Alerts Firing Without Visible Threshold Breach

Problem Description

Your alert fires, but when you check dashboards or explorer views, the metric appears to be below the configured threshold. This guide helps you diagnose why an alert triggers when the data does not seem to breach the threshold.

Common Root Causes

1. Evaluation Window Mismatch

Issue: The alert evaluates over a different time window than your dashboard displays.

Solution:

# Alert configuration
evaluation_window: 5m    # Alert checks last 5 minutes
dashboard_view: 15m      # You're viewing last 15 minutes

# Fix: Match your dashboard time range to alert evaluation window

Verification Steps:

  1. Navigate to Alerts → Edit Alert
  2. Note the evaluation window (e.g., "for 5 minutes")
  3. Set dashboard to exact same time range
  4. Check if threshold breach becomes visible
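
To see why the window matters: a short spike can push the 5-minute average over the threshold while the 15-minute average stays under it. A minimal sketch (the datapoints and threshold are invented for illustration, not taken from any real alert):

```python
# One datapoint per minute, oldest -> newest; a spike hit the last 5 minutes
last_15m = [40] * 10 + [40, 95, 95, 95, 40]
threshold = 60

avg_5m = sum(last_15m[-5:]) / 5   # what a 5m alert window evaluates
avg_15m = sum(last_15m) / 15      # what a 15m dashboard view shows

print(avg_5m)   # 73.0 -> above threshold: the alert fires
print(avg_15m)  # 51.0 -> below threshold: the dashboard looks fine
```

The same data, viewed over the wider range, hides the breach, which is why step 3 above asks you to match the dashboard range to the evaluation window.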

2. Aggregation Method Conflicts

Issue: Using an incompatible aggregation for your metric type.

Common Mistakes:

  • count_distinct on continuous values
  • p99 on sparse data (insufficient samples)
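
The p99 pitfall is easy to demonstrate: with only a handful of samples, p99 collapses to (roughly) the maximum, so a single outlier drives the alert. A minimal sketch using a nearest-rank percentile (the values are invented; this is an illustration, not the aggregation SigNoz uses internally):

```python
import math

def p99_nearest_rank(samples):
    # Nearest-rank percentile: rank = ceil(0.99 * n), 1-indexed.
    # With few samples this is simply the maximum value.
    ordered = sorted(samples)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]

sparse = [10, 12, 11, 250]           # 4 samples; 250 is a lone outlier
dense = [10, 12, 11] * 40 + [250]    # 121 samples, same lone outlier

print(p99_nearest_rank(sparse))  # 250 -> p99 IS the outlier
print(p99_nearest_rank(dense))   # 12  -> outlier correctly sits past p99
```

With 4 samples the outlier alone decides whether the alert fires; with 121 samples it does not, which is why sparse data needs longer windows (see section 5).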

3. "At Least Once" vs "In Total" Evaluation

Issue: Alert condition evaluates differently than expected.

Behavior Differences:

  • At least once: Fires if ANY datapoint exceeds threshold
  • In total: Fires if aggregated value over entire window exceeds threshold

Example:

# CPU usage datapoints over 5-minute window: [70%, 75%, 80%, 85%, 70%]
threshold: 90%

at_least_once: DOES NOT FIRE (no single datapoint > 90%)
in_total: FIRES (sum: 70+75+80+85+70 = 380% > 90% threshold)
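
The two match types from the example above can be sketched in a few lines of Python (the datapoints and threshold are taken from the example; the variable names are illustrative, not the SigNoz implementation):

```python
# CPU datapoints from the 5-minute window in the example above
datapoints = [70, 75, 80, 85, 70]
threshold = 90

# "At least once": fires if ANY single datapoint exceeds the threshold
at_least_once = any(v > threshold for v in datapoints)

# "In total": fires if the summed value over the whole window exceeds it
in_total = sum(datapoints) > threshold

print(at_least_once)  # False -> does not fire (no datapoint > 90)
print(in_total)       # True  -> fires (70+75+80+85+70 = 380 > 90)
```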

4. Time Synchronization Issues

Issue: Metrics arrive with incorrect timestamps.

Diagnosis:

# Check clock drift on all monitored hosts
while read -r host; do
  echo "Host: $host"
  ssh "$host" 'timedatectl show | grep NTPSynchronized'
done < hosts.txt

Fix:

# Enable NTP synchronization
sudo timedatectl set-ntp true
sudo systemctl restart chronyd  # or ntpd

5. Missing Data Points & Sparse Metrics

Issue: Gaps in data cause unexpected evaluations.

SigNoz Behavior:

  • Missing data points are NOT interpolated by default
  • Sparse metrics may not have enough samples for percentile calculations

Solutions:

  1. For sparse metrics: Use longer evaluation windows
  2. For missing data: Configure "No Data" alerts separately
  3. For percentiles: Ensure minimum 20 samples in evaluation window

6. Too Few Samples in the Evaluation Window

Issue: Too few data points in the evaluation window can lead to inaccurate or unexpected alert evaluations.

Problem:

# Problematic configuration
scrape_interval: 120s
evaluation_window: 5m  # Only captures 2-3 samples

# Recommended
scrape_interval: 15s
evaluation_window: 5m  # Captures ~20 samples

Solutions:

  1. Increase evaluation window: Use windows that capture at least 10-20 data points
  2. Decrease collection interval: More frequent collection provides better data density. Note: This increases metric ingestion volume and potential billing costs.
  3. Rule of thumb: Evaluation window should be 5-10x the collection interval
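
The rule of thumb above reduces to simple division; a short sketch using the interval and window values from the example config:

```python
def samples_in_window(scrape_interval_s, evaluation_window_s):
    # Approximate number of datapoints the alert evaluation will see
    return evaluation_window_s // scrape_interval_s

# Problematic: 120s scrape interval, 5m (300s) window
print(samples_in_window(120, 300))  # 2 -> too few for a stable evaluation

# Recommended: 15s scrape interval, 5m window
print(samples_in_window(15, 300))   # 20 -> enough samples for percentiles
```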

Last updated: August 13, 2025
