This page is relevant for both SigNoz Cloud and self-hosted SigNoz editions.

Time Aggregation Best Practices

Time aggregation determines how multiple data points within a collection/aggregation interval are combined before evaluation.

Common Time Aggregation Methods

| Aggregation | Use Case | Example |
|---|---|---|
| Max | Peak values, worst-case scenarios | Container restarts, max CPU spike |
| Min | Minimum thresholds, availability | Minimum available memory |
| Avg | General trends, smoothed metrics | Average response time |
| Sum | Total counts, cumulative metrics | Total requests, error count |
| Count | Number of occurrences | Event frequency |
| Count Distinct | Unique values | Unique users, distinct IPs |
| P50/P95/P99 | Latency percentiles | Response time distributions |
| Rate | Changes per time unit | Requests per second, errors per minute |
| Increase | Absolute growth over time period | Total value growth since previous measurement |

To learn more about aggregations in metrics, see the Metric types and aggregation documentation.
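Two of the aggregations above are easy to confuse. A minimal sketch (illustrative only, not SigNoz's implementation) of how rate and increase differ when applied to the same monotonically increasing counter:

```python
# Hypothetical sketch: "increase" is absolute growth over the window,
# "rate" is that growth normalized per second.

def increase(samples):
    """Absolute counter growth over the window (last - first)."""
    return samples[-1][1] - samples[0][1]

def rate(samples):
    """Per-second rate of change over the window."""
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed

# (timestamp_seconds, counter_value) samples collected 60s apart
samples = [(0, 100), (60, 160), (120, 250)]
print(increase(samples))  # 150 total new requests over the window
print(rate(samples))      # 1.25 requests per second
```

Use increase when you care about totals ("150 new errors in 2 minutes") and rate when you care about load ("1.25 requests/second").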

Aggregation Examples

For Container Restarts:

Metric: k8s.container.restarts
Aggregation: max (use running_diff to compute increments)
Evaluation: at least once
Formula: running_diff(k8s.container.restarts, cutoff_min=0)
Reason: Restart counters are cumulative and remain elevated after a restart, which can keep an alert firing continuously. Computing the difference between consecutive samples and dropping negative values (caused by counter resets) with cutoff_min=0 yields the number of new restarts per interval.
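The behavior of the formula above can be sketched in a few lines. This is an illustrative re-implementation, not SigNoz's actual code; it mirrors the names in the formula:

```python
# Minimal sketch of running_diff with cutoff_min=0 applied to a
# cumulative restart counter (illustrative, not SigNoz internals).

def running_diff(values, cutoff_min=None):
    """Difference between consecutive samples; results below
    cutoff_min (e.g. negatives from counter resets) are clamped."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if cutoff_min is not None:
        diffs = [max(d, cutoff_min) for d in diffs]
    return diffs

# Cumulative restart counts; the drop from 4 to 0 is a counter reset
restarts = [2, 2, 3, 4, 0, 1]
print(running_diff(restarts, cutoff_min=0))  # [0, 1, 1, 0, 1]
```

Without cutoff_min=0 the reset would produce a spurious -4, which is why the cutoff matters for alerting on counters.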

For Memory Usage:

Metric: system.memory.usage
Aggregation: avg or max (depending on use case)
Formula: (used - cached) / total * 100
Reason: Excludes cached memory, which the OS can reclaim on demand, giving a more accurate picture of memory actually in use
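The formula above is straightforward arithmetic; a small sketch with assumed field names (used, cached, total are assumptions about your metric's attributes):

```python
# Illustrative computation of the memory-usage formula above.

def memory_usage_percent(used, cached, total):
    """Percentage of memory in active use, excluding reclaimable cache."""
    return (used - cached) / total * 100

# 12 GiB used, of which 4 GiB is reclaimable page cache; 16 GiB total
print(memory_usage_percent(used=12, cached=4, total=16))  # 50.0
```

Counting the 4 GiB of cache would report 75% usage and could trigger false alerts on a healthy node.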

For Latency Monitoring:

Metric: http.server.duration
Aggregation: P95 or P99
Evaluation: on average (not in total)
Reason: Percentiles cannot be meaningfully summed; averaging the P95 across evaluation windows yields a representative tail latency
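A quick sketch of why "on average" is the right evaluation for percentiles. The nearest-rank percentile function here is illustrative, not how SigNoz computes P95:

```python
# Illustrative: per-window P95 values can be averaged, but their sum
# has no physical meaning as a latency.

def percentile(values, p):
    """Nearest-rank percentile of a list of latencies."""
    ordered = sorted(values)
    k = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[k]

# Request latencies (ms) grouped by evaluation window
windows = [
    [10, 12, 11, 95, 14],   # one slow request
    [11, 13, 12, 15, 14],
    [10, 11, 90, 13, 12],
]
p95s = [percentile(w, 95) for w in windows]
avg_p95 = sum(p95s) / len(p95s)   # typical tail latency across windows
total_p95 = sum(p95s)             # 200 "ms" that no request experienced
print(p95s)                # [95, 15, 90]
print(round(avg_p95, 1))   # 66.7
```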

For Throughput Analysis:

Metric: http.requests.total
Aggregation: sum
Evaluation: in total
Reason: Provides the total number of requests over the evaluation window, useful to monitor system load

For Error Rate Calculation:

Metric: system.errors
Aggregation: count or rate
Evaluation: at least once
Reason: Detects any occurrence of errors within the window, surfacing brief spikes that affect reliability
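The "at least once" evaluation used here differs from "on average" in a way that matters for short spikes. A hedged sketch of the two match policies (illustrative, not SigNoz's alert engine):

```python
# Illustrative comparison of "at least once" vs "on average" matching.

def at_least_once(points, threshold):
    """Fires if ANY point in the window crosses the threshold."""
    return any(p > threshold for p in points)

def on_average(points, threshold):
    """Fires only if the window's mean crosses the threshold."""
    return sum(points) / len(points) > threshold

errors_per_minute = [0, 0, 12, 0, 0]   # one brief error spike
print(at_least_once(errors_per_minute, threshold=5))  # True: spike caught
print(on_average(errors_per_minute, threshold=5))     # False: spike diluted
```

Averaging dilutes the spike to 2.4 errors/minute and misses it, which is why "at least once" suits error detection.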

Detecting Continuous Uptrends in Derived Metrics

You can detect continuous uptrends in derived metrics by using the running_diff function on individual queries rather than on formulas. This is useful for scenarios like monitoring RabbitMQ message queue trends where you want to alert on sustained increases rather than temporary spikes.

Setting Up Uptrend Detection

To detect a continuous uptrend in a derived metric (like the rate difference between messages published and acknowledged), follow these steps:

  1. Create your base queries:

    • Query A: rate(messages_published) (every 120s)
    • Query B: rate(messages_acked) (every 120s)
  2. Apply running difference to each query:

    • Apply running_diff directly on Query A with 120-second interval
    • Apply running_diff directly on Query B with 120-second interval
  3. Create the formula:

    • Use formula: A - B to get your derived rate difference
  4. Set up the alert condition:

    • Alert when the result is > 0 to detect upward trends

The running_diff function calculates R(t) - R(t-previous), allowing you to detect when each consecutive reading is higher than the previous one, which is exactly what you need for sustained uptrend detection.
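The four steps above can be sketched end to end. This is a simplified model under stated assumptions: the rate series for queries A and B are already sampled at the 120s step, and running_diff is the consecutive difference described above (illustrative, not SigNoz's query engine):

```python
# Sketch of the uptrend-detection pipeline: running_diff on each
# individual query, then the formula A - B, then the > 0 condition.

def running_diff(series):
    """R(t) - R(t - previous) for each consecutive pair of readings."""
    return [b - a for a, b in zip(series, series[1:])]

# Step 1: rate(messages_published) and rate(messages_acked), per 120s
query_a = [100, 110, 125, 145]   # publish rate keeps climbing
query_b = [100, 105, 108, 110]   # ack rate climbs more slowly

# Step 2: running_diff on each individual query (not on the formula)
diff_a = running_diff(query_a)
diff_b = running_diff(query_b)

# Step 3: the formula A - B on the differenced series
formula = [a - b for a, b in zip(diff_a, diff_b)]

# Step 4: alert when the result is > 0 (publish growth outpacing acks)
alerts = [v > 0 for v in formula]
print(formula)  # [5, 12, 18]
print(alerts)   # [True, True, True]
```

Every point being positive is what distinguishes a sustained uptrend from a temporary spike, which would produce a mix of positive and negative differences.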

Figure: Detecting sustained uptrends in RabbitMQ using running_diff on individual queries A and B

Last updated: February 24, 2026
