Log-based alerts

A Log-based alert allows you to define conditions based on log data, triggering alerts when these conditions are met. Here's a breakdown of the various sections and options available when configuring a Log-based alert:

Step 1: Define the Log Metric

In this step, you use the Logs Query Builder to apply filters and operations to your logs and define the condition that triggers the log-based alert. Some of the fields available in the Logs Query Builder include:

  • Logs: A field to filter the specific log data to monitor.

  • Aggregate Attribute: Allows you to select how the log data should be aggregated (e.g., "Count").

  • Group by: Provides options to group log data by various attributes, such as "service.name", "method" or custom attributes.

  • Legend Format: Lets you define the format for the legend in the visual representation of the alert.

  • Having: Apply conditions to filter the results further based on aggregate value.

Using Query Builder to perform operations on your logs

Step 2: Define Alert Conditions

In this step, you define the specific conditions for triggering the alert, as well as how often those conditions are checked. The condition configuration of an alert in SigNoz consists of six core parts:

Query

An alert can contain multiple queries and formulas to fetch the data you want to evaluate. However, only one of them can be used as the trigger for the alert condition.

For example:

  • A = Total request count
  • B = Total error count
  • C = B / A (Error rate)

You can use query C as the evaluation target to trigger alerts based on error rate.
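As a rough illustration of how a formula such as C combines two query results, the sketch below divides the two series point by point. The function name and the sample values are hypothetical and only meant to show the arithmetic; this is not how SigNoz evaluates formulas internally.

```python
from typing import List, Optional


def error_rate(total_requests: List[float], total_errors: List[float]) -> List[Optional[float]]:
    """Compute C = B / A for each time bucket; None where A is zero."""
    return [b / a if a else None for a, b in zip(total_requests, total_errors)]


# Hypothetical per-minute values for queries A and B.
a = [200, 250, 0, 400]   # A = total request count
b = [10, 5, 0, 40]       # B = total error count
c = error_rate(a, b)     # C = error rate, the series the alert would evaluate
print(c)
```

Only the series `c` would be compared against the threshold; `a` and `b` exist to feed the formula.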

Condition

This defines the logical condition to check against the selected query’s value.

| Operator | Description | Example Usage |
|---|---|---|
| Above | Triggers if the value is greater than | CPU usage Above 90 (%) |
| Below | Triggers if the value is less than | Apdex score Below 0.8 |
| Equal to | Triggers if the value is exactly equal | Request count Equal to 0 |
| Not equal to | Triggers if the value is not equal | Instance status Not Equal to 1 |

Match Type

Specifies how the condition must hold over the evaluation window. This allows for flexible evaluation logic.

| Match Type | Description | Example Use Case |
|---|---|---|
| at least once | Trigger if condition matches even once in the window | Detect spikes or brief failures |
| all the times | Trigger only if condition matches at all points in the window | Ensure stable violations before alerting |
| on average | Evaluate the average value in the window | Average latency Above 500ms |
| in total | Evaluate the total sum over the window | Total errors Above 100 |
| last | Only the last data point is evaluated | Used when only latest status matters |
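To make the interaction between the condition operators and the match types listed above more concrete, here is a minimal sketch in Python. It only models the semantics described on this page: `should_fire` and the sample window are hypothetical, and returning False for an empty window is an assumption rather than documented SigNoz behavior.

```python
import operator
from typing import Callable, Dict, List

# Condition operators mapped to Python comparisons.
CONDITIONS: Dict[str, Callable[[float, float], bool]] = {
    "above": operator.gt,
    "below": operator.lt,
    "equal to": operator.eq,
    "not equal to": operator.ne,
}


def should_fire(values: List[float], condition: str, threshold: float, match_type: str) -> bool:
    """Return True if the window of values satisfies the condition for the match type."""
    if not values:
        return False  # assumption: an empty window never fires in this sketch
    compare = CONDITIONS[condition]
    if match_type == "at least once":
        return any(compare(v, threshold) for v in values)
    if match_type == "all the times":
        return all(compare(v, threshold) for v in values)
    if match_type == "on average":
        return compare(sum(values) / len(values), threshold)
    if match_type == "in total":
        return compare(sum(values), threshold)
    if match_type == "last":
        return compare(values[-1], threshold)
    raise ValueError(f"unknown match type: {match_type}")


# Hypothetical latency values (ms) for five points in the evaluation window.
window = [420.0, 610.0, 480.0, 530.0, 450.0]
for match_type in ("at least once", "all the times", "on average", "in total", "last"):
    print(match_type, should_fire(window, "above", 500.0, match_type))
```

With this window, "at least once" fires because two points exceed 500, while "on average" does not because the mean is 498.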

Evaluation Window (For)

Specifies how long the condition must be true before the alert is triggered.

e.g. For 5 minutes = The condition must remain true continuously for 5 minutes before the alert is triggered.

This helps reduce false positives due to short-lived spikes.

Threshold

This is the value you are comparing the query result against.

e.g. If you choose Condition = Above and set Threshold = 500, the alert will fire when the query result exceeds 500.

Threshold Unit

Specifies the unit of the threshold, such as:

  • ms (milliseconds) for latency
  • % for CPU usage
  • Count for request totals

The unit helps interpret the threshold in the correct context and ensures values are scaled correctly when the threshold is compared with the query result.
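The scaling point can be illustrated with a small sketch: if the query reports latency in seconds while the threshold is entered in milliseconds, one of the two has to be converted before comparing. The conversion table and the `convert` helper below are hypothetical and cover only time units; they are not part of SigNoz.

```python
# Hypothetical conversion factors, expressed in nanoseconds per unit.
NANOSECONDS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a time value between units so both sides of a comparison match."""
    return value * NANOSECONDS_PER_UNIT[from_unit] / NANOSECONDS_PER_UNIT[to_unit]


query_value_seconds = 0.62   # query result, assumed to be reported in seconds
threshold_ms = 500           # threshold entered with unit "ms"

# Comparing the raw numbers ignores the units and gives the wrong answer.
naive = query_value_seconds > threshold_ms
# Converting the threshold to the query's unit gives the intended comparison.
scaled = query_value_seconds > convert(threshold_ms, "ms", "s")
print(naive, scaled)
```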

Advanced Options

In addition, there are 3 more advanced options:

Alert Frequency

  • How frequently SigNoz evaluates the alert condition.
  • Default is 1 min
  • e.g. If set to 1 min the alert will run once every minute.

Notification for missing data points

  • Triggers an alert if no data is received for the configured time period.
  • Useful for services where consistent data is expected.
  • E.g. If set to 5 minutes, and no data is received during that period, the alert will fire.

Minimum Data Points in Result Group

  • Ensures the alert condition is evaluated only when there's enough data for statistical significance.
  • Helps avoid false alerts due to missing or sparse data points.
  • E.g. If set to 3, the query must return at least 3 data points in the evaluation window for the alert to be considered.
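The sketch below shows one way to think about how the missing-data notification and the minimum data points setting gate an evaluation. It is a simplified model: `classify_window` and its return labels are hypothetical, and it assumes the list passed in already covers the configured time period.

```python
from typing import List


def classify_window(points: List[float], min_points: int, notify_on_missing: bool) -> str:
    """Decide what to do with one evaluation window of query results."""
    if not points:
        # No data at all: fire a missing-data alert only if that option is enabled.
        return "missing_data_alert" if notify_on_missing else "skip"
    if len(points) < min_points:
        # Too few points to evaluate the condition meaningfully.
        return "skip"
    return "evaluate"


print(classify_window([], min_points=3, notify_on_missing=True))
print(classify_window([12.0, 15.0], min_points=3, notify_on_missing=True))
print(classify_window([12.0, 15.0, 9.0], min_points=3, notify_on_missing=True))
```

Only windows classified as "evaluate" would go on to the condition and match type check.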

Step 3: Alert Configuration

In this step, you set the alert's metadata, including severity, name, and description:

Severity

Set the severity level for the alert (e.g., "Warning" or "Critical").

Alert Name

A field to name the alert for easy identification.

Alert Description

Add a detailed description for the alert, explaining its purpose and trigger conditions.

You can incorporate result attributes in the alert descriptions to make the alerts more informative:

Syntax: Use $<attribute-name> to insert attribute values. Attribute values can be any attribute used in group by.

Example: If you have a query that has the attribute service.name in the group by clause then to use it in the alert description, you will use $service.name.
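For intuition, the substitution can be modeled as a simple placeholder replacement. The sketch below is an illustration only: `render_description`, the regular expression, and the choice to leave unknown placeholders untouched are assumptions, not the SigNoz implementation.

```python
import re
from typing import Dict

# Matches $name where the name may contain dots, e.g. $service.name.
# A trailing dot at the end of a sentence is not treated as part of the name.
_PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z0-9_]+)*)")


def render_description(template: str, group_values: Dict[str, str]) -> str:
    """Replace $<attribute-name> with the value of that group-by attribute."""
    def substitute(match: "re.Match") -> str:
        name = match.group(1)
        # Assumption: placeholders without a matching attribute are left as-is.
        return group_values.get(name, match.group(0))

    return _PLACEHOLDER.sub(substitute, template)


message = render_description(
    "High error rate for $service.name.",
    {"service.name": "checkout"},  # hypothetical group-by value
)
print(message)
```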

Slack alert format

Advanced Slack formatting is supported if you use Slack as a notification channel.

Labels

A field to add static labels or tags for categorization. Add labels as key-value pairs: enter the key first (avoid spaces in the key), then set its value.

Notification channels

A field to choose the notification channels from those configured in the Alert Channel settings.

Test Notification

A button to test the alert to ensure that it works as expected.

Configure the alert
Setting the alert metadata

Examples

1. Alert when the percentage of Redis timeout error logs is greater than 7% in the last 5 minutes


Step 1: Write Query Builder query to define alert metric

logs builder query for redis timeout logs percentage
Redis timeout query

Here we write two queries to calculate the percentage of error logs. The first query counts the Redis timeout error logs, and the second counts all logs. We then add a formula to calculate the percentage.

error logs percentage chart
Error log percentage chart
Note: Remember to select the y-axis unit as Percent (0-100), because the threshold is applied as a percentage.


Step 2: Set alert conditions

redis timeout alert condition
Error logs percentage alert condition

The condition is set to trigger a notification if the per-minute error log percentage exceeds the threshold of 7% on average over the last five minutes.
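The arithmetic behind this example can be sketched as follows, using the 7% threshold from the example title. The per-minute counts are made-up sample data and the helper name is hypothetical; the snippet only mirrors the two queries, the percentage formula, and the "on average" check.

```python
from typing import List


def percentages(timeout_counts: List[int], total_counts: List[int]) -> List[float]:
    """Per-minute percentage of Redis timeout error logs among all logs."""
    return [100.0 * t / n for t, n in zip(timeout_counts, total_counts) if n > 0]


# Hypothetical per-minute counts for the last five minutes.
timeout_logs = [6, 9, 8, 10, 7]         # first query: Redis timeout error logs
total_logs = [100, 100, 100, 100, 100]  # second query: all logs

per_minute = percentages(timeout_logs, total_logs)
average = sum(per_minute) / len(per_minute)
fires = average > 7.0  # condition: Above, match type: on average, threshold: 7 (%)
print(per_minute, average, fires)
```

Here the five-minute average is 8%, so the alert would fire even though one of the individual minutes is below 7%.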

Step 3: Set alert configuration

redis timeout alert configuration
Error logs percentage alert configuration

Finally, set the alert severity to Warning, and add a name and a notification channel.

Last updated: June 6, 2024
