Log Analytics workspace storage capacity alerts not triggering despite exceeding quota limits

Our Log Analytics workspace has exceeded its daily ingestion quota multiple times over the past week, but the alert rules we configured aren’t triggering any notifications. We’re at risk of losing critical log data because we’re not being notified when the workspace approaches capacity limits. The alert rule configuration looks correct in the portal - it’s set to monitor the Usage metric with a threshold of 90% of daily cap. The action group has email and SMS configured. We’ve verified the action group works by using the Test feature. But when I check the workspace metrics, I can see we’ve hit 95%+ usage on three separate days and received zero alerts. Has anyone else experienced alerts failing to fire for workspace capacity issues? This is a production environment and we need reliable alerting.

Based on the symptoms described, I can see several issues that need to be addressed for reliable workspace capacity alerting.

Alert Rule Configuration: The ‘Usage’ metric is not appropriate for monitoring workspace capacity. You need to create a log-based alert rule instead, as metric-based alerts for workspace capacity are unreliable in older 2019-era workspaces. Here’s the correct approach:

  1. Create a new alert rule of type ‘Log’ (not Metric)
  2. Use this KQL query to monitor actual ingestion:

// Quantity in the Usage table is reported in MB, so dividing by 1000 gives GB
Usage
| where IsBillable == true
| where TimeGenerated > ago(1h)
| summarize DataGB = sum(Quantity) / 1000
  3. Set the alert logic to trigger when DataGB * 24 (to project daily usage) exceeds 9 (90% of your 10 GB cap) - see the extended query sketched after this list
  4. Evaluation frequency: Every 15 minutes
  5. Look-back period: 1 hour
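If the alert condition in the portal can’t do the 24x projection for you, you can fold it into the query and simply alert when any row is returned. This is a minimal sketch assuming the 10 GB/day cap and 90% threshold discussed in this thread; DailyCapGB is just an illustrative name, so adjust the value and the final filter to your own settings:

// Projects the last hour of billable ingestion to a daily figure and returns
// a row only when the projection exceeds 90% of the assumed cap
let DailyCapGB = 10.0;
Usage
| where IsBillable == true
| where TimeGenerated > ago(1h)
| summarize HourlyGB = sum(Quantity) / 1000
| extend ProjectedDailyGB = HourlyGB * 24
| extend PercentOfCap = round(ProjectedDailyGB / DailyCapGB * 100, 1)
| where PercentOfCap >= 90

With this shape the alert logic becomes ‘number of results greater than 0’, so the threshold lives in one place instead of being split between the query and the alert rule.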

This query-based approach is more reliable because it directly examines ingestion data rather than relying on pre-aggregated metrics.

Action Group Setup: Verify these action group configurations:

  • Ensure the action group is in the same region as your workspace (or use a global action group)
  • Check that email addresses are confirmed - unconfirmed emails silently fail
  • Review action group throttling settings: by default, email/SMS won’t repeat within 24 hours for the same alert
  • Add a webhook action to a logging endpoint so you can audit all alert firings
  • Test the action group using the ‘Test Action Group’ feature, but note this bypasses throttling rules

Workspace Usage Metric Analysis: The workspace metrics you’re viewing in the portal may show historical usage but don’t directly correlate to alert rule evaluation. Metric-based alerts use a different data path than what you see in the Metrics explorer. To verify if alerts should have fired:

  1. Go to your alert rule > View alert history
  2. Check if any alerts show ‘Fired’ status with ‘Action group execution failed’
  3. Review the action group’s Activity Log for execution attempts (a query sketch for this follows below)
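If you also route the subscription Activity Log into the workspace with a diagnostic setting, you can audit alert activity with a query instead of clicking through the portal. This is a hedged sketch: it assumes the Activity Log export is in place, and the column names (CategoryValue, OperationNameValue, ActivityStatusValue) follow the newer AzureActivity schema, so verify them against your own table.

// Assumes the subscription Activity Log is exported to this workspace
AzureActivity
| where TimeGenerated > ago(7d)
| where CategoryValue == "Alert"
| project TimeGenerated, OperationNameValue, ActivityStatusValue, ResourceGroup, _ResourceId
| order by TimeGenerated desc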

Additional Recommendations:

  • Upgrade to a Pay-As-You-Go pricing tier if still on legacy per-GB pricing - it has better alerting support
  • Set up a secondary alert on the ‘Operation’ table to detect when the workspace actually stops ingestion due to the cap (first sketch after this list)
  • Create alerts at multiple thresholds: 75%, 90%, and 95% of daily cap
  • Use Azure Monitor Workbooks to visualize daily ingestion trends and identify which data sources are driving volume (second sketch after this list)
  • Consider implementing data collection rules to filter unnecessary data before it reaches the workspace
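For the cap-stop alert, a hedged starting point is to watch the Operation table for the health event the workspace writes when ingestion is suspended. The exact Detail text below is an assumption, so confirm it against what your workspace actually logs the next time the cap is hit:

// Surfaces workspace health events that suggest ingestion was stopped by the daily cap
Operation
| where TimeGenerated > ago(1d)
| where Detail contains "OverQuota" or Detail contains "Data collection stopped"
| project TimeGenerated, OperationCategory, Detail

And for the Workbook (or just an ad-hoc check), this breaks daily billable volume down by table so you can see which data sources are driving it:

// Daily billable ingestion per data type over the last 30 days
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize TotalGB = sum(Quantity) / 1000 by DataType, bin(TimeGenerated, 1d)
| order by TimeGenerated desc, TotalGB desc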

After implementing the log-based alert with the correct query, you should receive reliable notifications. The key difference is that log-based alerts query actual ingestion data every evaluation cycle, while metric-based alerts depend on metric aggregation pipelines that can have delays or gaps in older workspace versions.

Check what metric you’re actually using in the alert rule. There’s a common confusion between ‘Usage’ and ‘Data Ingestion Volume’. For workspace capacity alerts, you need to use the ‘Data Ingestion Volume’ metric, not ‘Usage’. Also, make sure you’re using the correct aggregation type - it should be ‘Total’ not ‘Average’. The evaluation frequency and period matter too. If you’re checking every 5 minutes with a 5-minute period, you might miss spikes. Try a longer evaluation period like 15 or 30 minutes.

Actually, there’s a better approach. Instead of monitoring Data Ingestion Volume directly, use the ‘Usage’ table in Log Analytics itself to create log-based alerts. This gives you much more flexibility and accuracy: you can query the Usage table to calculate the percentage of your daily cap and trigger alerts based on that. Metric-based alerts for workspace capacity have limitations and can be unreliable, especially in the older 2019 workspace version, while log-based alerts query actual ingestion data rather than relying on metric aggregation. Here’s the idea for a query to start with: take the Usage table records where IsBillable == true, sum the Quantity field, and compare the result against your daily cap (a sketch follows below). Set this to run every 15 minutes.
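A minimal sketch of that starting query, assuming the 10 GB/day cap mentioned in this thread (DailyCapGB is an illustrative name, adjust it to your actual setting). It reports today’s billable ingestion as a percentage of the cap and only returns a row once you cross 90%, so the alert condition is simply ‘number of results greater than 0’:

// Today's billable ingestion as a percentage of an assumed 10 GB/day cap
let DailyCapGB = 10.0;
Usage
| where TimeGenerated > startofday(now())
| where IsBillable == true
| summarize UsedGB = sum(Quantity) / 1000
| extend PercentOfDailyCap = round(UsedGB / DailyCapGB * 100, 1)
| where PercentOfDailyCap >= 90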

I just checked and yes, I’m using ‘Usage’ as the metric. Should I be looking at the workspace’s daily cap setting? Our cap is set to 10 GB/day. So if I switch to ‘Data Ingestion Volume’, what threshold value should I use? 9 GB to alert at 90%?

One thing to watch out for - if your workspace is in the legacy pricing tier (per-GB), the daily cap behavior is different from the newer Pay-As-You-Go tier. In legacy tiers, when you hit the cap, data ingestion stops but you might not get clear metrics about it. Also, check if your alert rule has any action group throttling configured. By default, action groups have throttling to prevent alert storms - email notifications won’t repeat for the same alert within 24 hours. If you’re hitting the cap daily, you might only get one alert and then subsequent triggers are suppressed.