# Alerts Firing Without Visible Threshold Breach

## Problem Description
Your alert fires, but when you check dashboards or explorer views, the metric appears to be below the configured threshold. This guide helps diagnose why alerts trigger when no breach is visible in the UI.

## Common Root Causes
### 1. Evaluation Window Mismatch

Issue: The alert evaluates over a different time window than your dashboard displays.

Solution:

```yaml
# Alert configuration
evaluation_window: 5m   # Alert checks the last 5 minutes
dashboard_view: 15m     # You're viewing the last 15 minutes

# Fix: Match your dashboard time range to the alert evaluation window
```
Verification Steps:
- Navigate to Alerts → Edit Alert
- Note the evaluation window (e.g., "for 5 minutes")
- Set dashboard to exact same time range
- Check if threshold breach becomes visible
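The effect is easy to reproduce. The sketch below is a minimal illustration (not SigNoz's actual evaluation code) with made-up per-minute samples: the same series breaches the threshold when averaged over the 5-minute alert window, but looks healthy when averaged over the 15-minute view you have open.

```python
# Hypothetical per-minute CPU samples, oldest first; the last 5 minutes spike.
samples = [40, 42, 41, 43, 40, 39, 41, 42, 40, 41, 88, 91, 93, 92, 90]
threshold = 85

# Alert: average over the last 5 minutes (the evaluation window)
alert_value = sum(samples[-5:]) / 5            # 90.8 -> breach

# Dashboard: average over the last 15 minutes (what you are looking at)
dashboard_value = sum(samples) / len(samples)  # ~57.5 -> looks healthy

print(f"alert window avg:     {alert_value:.1f} (fires: {alert_value > threshold})")
print(f"dashboard window avg: {dashboard_value:.1f} (fires: {dashboard_value > threshold})")
```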
### 2. Aggregation Method Conflicts

Issue: Using an incompatible aggregation for your metric type.

Common Mistakes:
- `count_distinct` on continuous values
- `p99` on sparse data (insufficient samples)
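To see why `p99` misbehaves on sparse data, consider a nearest-rank percentile (a simplified stand-in for whatever interpolation your backend actually uses): with only a handful of samples, p99 collapses to the maximum, so a single outlier becomes the alerted value.

```python
import math

def p99_nearest_rank(values):
    """Nearest-rank p99: with n samples, take element ceil(0.99 * n)."""
    ordered = sorted(values)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]

sparse = [120, 130, 125, 9000]            # 4 samples, one outlier
dense = [120, 130, 125] * 40 + [9000]     # 121 samples, same outlier

print(p99_nearest_rank(sparse))  # 9000 -> p99 IS the outlier
print(p99_nearest_rank(dense))   # 130  -> outlier sits beyond p99
```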
### 3. "At Least Once" vs "In Total" Evaluation

Issue: The alert condition evaluates differently than expected.

Behavior Differences:
- At least once: Fires if ANY datapoint in the window exceeds the threshold
- In total: Fires if the aggregated value over the entire window exceeds the threshold

Example:

```
# CPU usage datapoints over a 5-minute window: [70%, 75%, 80%, 85%, 70%]
threshold: 90%
at_least_once: DOES NOT FIRE (no single datapoint > 90%)
in_total: FIRES (sum: 70+75+80+85+70 = 380% > 90% threshold)
```
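The two match types can be sketched as plain predicates (an illustration of the semantics, not SigNoz's implementation):

```python
def at_least_once(datapoints, threshold):
    """Fires if ANY single datapoint exceeds the threshold."""
    return any(p > threshold for p in datapoints)

def in_total(datapoints, threshold):
    """Fires if the aggregate (here: sum) over the window exceeds the threshold."""
    return sum(datapoints) > threshold

window = [70, 75, 80, 85, 70]     # CPU % over a 5-minute window
print(at_least_once(window, 90))  # False -> no single datapoint > 90
print(in_total(window, 90))       # True  -> sum is 380, which exceeds 90
```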
### 4. Time Synchronization Issues

Issue: Metrics arrive with incorrect timestamps.

Diagnosis:

```bash
# Check clock drift on all monitored hosts
for host in $(cat hosts.txt); do
  echo "Host: $host"
  ssh "$host" 'timedatectl show | grep NTPSynchronized'
done
```

Fix:

```bash
# Enable NTP synchronization
sudo timedatectl set-ntp true
sudo systemctl restart chronyd  # or ntpd
```
### 5. Missing Data Points & Sparse Metrics

Issue: Gaps in data cause unexpected evaluations.

SigNoz Behavior:
- Missing data points are NOT interpolated by default
- Sparse metrics may not have enough samples for percentile calculations

Solutions:
- For sparse metrics: Use longer evaluation windows
- For missing data: Configure "No Data" alerts separately
- For percentiles: Ensure a minimum of 20 samples in the evaluation window
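One way to keep sparse windows from producing misleading results is to refuse to evaluate until a minimum sample count is met, surfacing "no data" as its own state. A minimal sketch (the three-state return value is my convention here, not a SigNoz API):

```python
def evaluate(datapoints, threshold, min_samples=20):
    """Return 'NO_DATA', 'FIRING', or 'OK' for one window of samples."""
    if len(datapoints) < min_samples:
        return "NO_DATA"  # route to a separate "No Data" alert instead
    value = sum(datapoints) / len(datapoints)
    return "FIRING" if value > threshold else "OK"

print(evaluate([95, 97], threshold=90))   # NO_DATA -> only 2 samples
print(evaluate([95] * 25, threshold=90))  # FIRING
print(evaluate([50] * 25, threshold=90))  # OK
```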
### 6. Too Few Data Points in the Evaluation Window

Issue: Too few samples in the evaluation window can lead to inaccurate or unexpected alert evaluations.

Problem:

```yaml
# Problematic configuration
scrape_interval: 120s
evaluation_window: 5m   # Only captures 2-3 samples

# Recommended
scrape_interval: 15s
evaluation_window: 5m   # Captures ~20 samples
```

Solutions:
- Increase the evaluation window: Use windows that capture at least 10-20 data points
- Decrease the collection interval: More frequent collection provides better data density. Note: this increases metric ingestion volume and potential billing costs.
- Rule of thumb: The evaluation window should be 5-10x the collection interval
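The rule of thumb reduces to simple arithmetic: samples per window = evaluation window / scrape interval. A quick helper (hypothetical, just for sanity-checking configs):

```python
def samples_per_window(scrape_interval_s, evaluation_window_s):
    """How many datapoints land in one evaluation window."""
    return evaluation_window_s // scrape_interval_s

print(samples_per_window(120, 300))  # 2  -> too few, evaluations get noisy
print(samples_per_window(15, 300))   # 20 -> comfortable density
```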
Last updated: August 13, 2025