Skip to main content

Table schema definitions & examples for metrics

There are two tables in the database. One is for storing the samples/metrics and the other is for storing the time series data.

Schema for samples table:

Note: the schemas are not final. We might change it in the future.

CREATE TABLE signoz_metrics.distributed_samples_v2
(
`metric_name` LowCardinality(String),
`fingerprint` UInt64 CODEC(DoubleDelta, LZ4),
`timestamp_ms` Int64 CODEC(DoubleDelta, LZ4),
`value` Float64 CODEC(Gorilla, LZ4)
)
ENGINE = Distributed('cluster', 'signoz_metrics', 'samples_v2', cityHash64(metric_name, fingerprint))

Explanation of the columns:

  • metric_name: Name of the metric
  • fingerprint: Fingerprint of the metric. This is used to identify the metric uniquely. Currently, we are using the hash of the labels to generate the fingerprint.
  • timestamp_ms: Timestamp of the metric in milliseconds
  • value: Value of the metric

Example of a samples

┌─metric_name────────┬────────fingerprint─┬──timestamp_ms─┬─value─┐
│ signoz_calls_total │ 844117381117785689 │ 1698611461282 │ 25085 │
│ signoz_calls_total │ 844117381117785689 │ 1698611521282 │ 25140 │
│ signoz_calls_total │ 844117381117785689 │ 1698611581282 │ 25192 │
│ signoz_calls_total │ 844117381117785689 │ 1698611641282 │ 25248 │
│ signoz_calls_total │ 844117381117785689 │ 1698611701282 │ 25305 │
│ signoz_calls_total │ 844117381117785689 │ 1698611761282 │ 25361 │
│ signoz_calls_total │ 844117381117785689 │ 1698611821282 │ 25421 │
│ signoz_calls_total │ 844117381117785689 │ 1698611881282 │ 25475 │
│ signoz_calls_total │ 844117381117785689 │ 1698611941282 │ 25534 │
│ signoz_calls_total │ 844117381117785689 │ 1698612001282 │ 25588 │
└────────────────────┴────────────────────┴───────────────┴───────┘

Schema for time series table:

CREATE TABLE signoz_metrics.distributed_time_series_v2
(
`metric_name` LowCardinality(String),
`fingerprint` UInt64 CODEC(DoubleDelta, LZ4),
`timestamp_ms` Int64 CODEC(DoubleDelta, LZ4),
`labels` String CODEC(ZSTD(5)),
`temporality` LowCardinality(String) DEFAULT 'Unspecified' CODEC(ZSTD(5))
)
ENGINE = Distributed('cluster', 'signoz_metrics', 'time_series_v2', cityHash64(metric_name, fingerprint))

Explanation of the columns:

  • metric_name: Name of the metric
  • fingerprint: Fingerprint of the metric. This is used to identify the metric uniquely. Currently, we are using the hash of the labels to generate the fingerprint.
  • timestamp_ms: Timestamp of the metric when it was observed for the first time in milliseconds
  • labels: Labels of the metric; Stored as a JSON string
  • temporality: Temporality of the metric. This is used to identify the type of the metric. It can be one of the following values:
    • Unspecified: This is the default value.
    • Cumulative: This is used for monotonic counters.
    • Delta: This is used for non-monotonic counters.

Example of a time series

┌─metric_name────────┬────────fingerprint─┬──timestamp_ms─┬─labels─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─temporality─┐
│ signoz_calls_total │ 844117381117785689 │ 1698584462214 │ {"__name__":"signoz_calls_total","__temporality__":"Cumulative","deployment_environment":"default","operation":"/driver.DriverService/FindNearest","resource_signoz_collector_id":"1e183921-3de2-4afe-9729-78db8f2b65b5","service_name":"driver","service_namespace":"default","signoz_collector_id":"1e183921-3de2-4afe-9729-78db8f2b65b5","span_kind":"SPAN_KIND_SERVER","status_code":"STATUS_CODE_UNSET"} │ Cumulative │
└────────────────────┴────────────────────┴───────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┘

Querying the metrics

Querying the metrics is done in two steps. First, we query the time series table to get the fingerprints and labels of the metrics. Then, we use the fingerprints to query the samples table to get the actual samples. This is done to reduce the amount of data that needs to be scanned from the samples table. Also, the time series table is much smaller than the samples table.

We use the JOIN operation to join the two tables. The JOIN operation is done on the metric_name and fingerprint columns. The JOIN operation is done on the cluster level. This means that the JOIN operation is done on the data nodes and not on the aggregator nodes. This is done to reduce the amount of data that needs to be transferred between the nodes. This works because the data is sharded based on the metric_name and fingerprint columns hence all the data for a particular metric and fingerprint will be present on the same node.

Note: The following queries use GROUPING SETS to get the data for the overall metric. This is used when the limit is applied to the query. Please refer to the ClickHouse documentation for more information on GROUPING SETS.

Note: The queries should have a result column with the name value and a column with type DateTime for the graphs to work.

Example queries

Retrieving the fingerprints and labels of the metric

SELECT
fingerprint,
labels
FROM signoz_metrics.distributed_time_series_v2
WHERE (metric_name = 'signoz_calls_total') AND (temporality = 'Cumulative')
LIMIT 10

Retrieving the samples of the metric

SELECT
timestamp_ms,
value
FROM signoz_metrics.distributed_samples_v2
WHERE metric_name = 'signoz_calls_total'
LIMIT 10

Retrieving the label values of a metric

SELECT DISTINCT
fingerprint,
JSONExtractString(labels, 'service_name') AS service_name
FROM signoz_metrics.distributed_time_series_v2
WHERE (metric_name = 'signoz_calls_total') AND (temporality = 'Cumulative')
LIMIT 10

Example queries for the frontend service RED metrics

Note: The following queries can be modified to get the metrics for each service by adding grouping by the service name.

The following queries are used to generate the graphs for the frontend service RED metrics. The innermost join is used to get the raw samples joined with the time series table. Then, we use the runningDifference function to get the rate of change of the samples. Then, we use the outermost query to get the sum of the rate of change of the samples.

Request rate for a service

SELECT
ts,
sum(value) AS value
FROM
(
SELECT
ts,
if(runningDifference(ts) <= 0, nan, if(runningDifference(value) < 0, value / runningDifference(ts), runningDifference(value) / runningDifference(ts))) AS value
FROM
(
SELECT
fingerprint,
toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
max(value) AS value
FROM signoz_metrics.distributed_samples_v2
INNER JOIN
(
SELECT fingerprint
FROM signoz_metrics.time_series_v2
WHERE (metric_name = 'signoz_latency_count') AND (temporality IN ['Cumulative', 'Unspecified']) AND (JSONExtractString(labels, 'service_name') IN ['frontend']) AND (JSONExtractString(labels, 'operation') IN ['HTTP GET /dispatch'])
) AS filtered_time_series USING (fingerprint)
WHERE (metric_name = 'signoz_latency_count') AND (timestamp_ms >= {{.start_timestamp_ms}}) AND (timestamp_ms <= {{.end_timestamp_ms}})
GROUP BY
fingerprint,
ts
ORDER BY
fingerprint ASC,
ts ASC
)
WHERE isNaN(value) = 0
)
GROUP BY
GROUPING SETS (
(ts),
())
ORDER BY ts ASC

Error rate for a service

This is the query for the error rate for a service. The query is similar to the request rate query. The only difference is that we are filtering the samples based on the status code. Then, we divide the error samples by the total samples to get the error rate.

SELECT
A.ts AS ts,
(A.value * 100) / B.value AS value
FROM
(
SELECT
ts,
sum(value) AS value
FROM
(
SELECT
ts,
if(runningDifference(ts) <= 0, nan, if(runningDifference(value) < 0, value / runningDifference(ts), runningDifference(value) / runningDifference(ts))) AS value
FROM
(
SELECT
fingerprint,
toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
max(value) AS value
FROM signoz_metrics.distributed_samples_v2
INNER JOIN
(
SELECT fingerprint
FROM signoz_metrics.time_series_v2
WHERE (metric_name = 'signoz_calls_total') AND (temporality IN ['Cumulative', 'Unspecified']) AND (JSONExtractString(labels, 'service_name') IN ['redis']) AND (JSONExtractString(labels, 'status_code') IN ['STATUS_CODE_ERROR'])
) AS filtered_time_series USING (fingerprint)
WHERE (metric_name = 'signoz_calls_total') AND (timestamp_ms >= {{.start_timestamp_ms}}) AND (timestamp_ms <= {{.end_timestamp_ms}})
GROUP BY
fingerprint,
ts
ORDER BY
fingerprint ASC,
ts ASC
)
WHERE isNaN(value) = 0
)
GROUP BY
GROUPING SETS (
(ts),
())
ORDER BY ts ASC
) AS A
INNER JOIN
(
SELECT
ts,
sum(value) AS value
FROM
(
SELECT
ts,
if(runningDifference(ts) <= 0, nan, if(runningDifference(value) < 0, value / runningDifference(ts), runningDifference(value) / runningDifference(ts))) AS value
FROM
(
SELECT
fingerprint,
toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
max(value) AS value
FROM signoz_metrics.distributed_samples_v2
INNER JOIN
(
SELECT fingerprint
FROM signoz_metrics.time_series_v2
WHERE (metric_name = 'signoz_calls_total') AND (temporality IN ['Cumulative', 'Unspecified']) AND (JSONExtractString(labels, 'service_name') IN ['redis'])
) AS filtered_time_series USING (fingerprint)
WHERE (metric_name = 'signoz_calls_total') AND (timestamp_ms >= {{.start_timestamp_ms}}) AND (timestamp_ms <= {{.end_timestamp_ms}})
GROUP BY
fingerprint,
ts
ORDER BY
fingerprint ASC,
ts ASC
)
WHERE isNaN(value) = 0
)
GROUP BY
GROUPING SETS (
(ts),
())
ORDER BY ts ASC
) AS B ON A.ts = B.ts

99th percentile latency for a service

This is the query for the 99th percentile latency for a service. The query is similar to the request rate query. We are using the histogramQuantile function to get the 99th percentile latency. The histogramQuantile function is used to get the quantile value from a histogram. The histogram is represented as two arrays. One array contains the buckets and the other array contains the values. The histogramQuantile function takes the two arrays and the quantile value as input and returns the quantile value. The bucket bounds and values are obtained by taking the rate of change for each bucket.

SELECT
ts,
histogramQuantile(arrayMap(x -> toFloat64(x), groupArray(le)), groupArray(value), 0.99) AS value
FROM
(
SELECT
le,
ts,
sum(value) AS value
FROM
(
SELECT
le,
ts,
if(runningDifference(ts) <= 0, nan, if(runningDifference(value) < 0, value / runningDifference(ts), runningDifference(value) / runningDifference(ts))) AS value
FROM
(
SELECT
fingerprint,
le,
toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
max(value) AS value
FROM signoz_metrics.distributed_samples_v2
INNER JOIN
(
SELECT
JSONExtractString(labels, 'le') AS le,
fingerprint
FROM signoz_metrics.time_series_v2
WHERE (metric_name = 'signoz_latency_bucket') AND (temporality IN ['Cumulative', 'Unspecified']) AND (JSONExtractString(labels, 'service_name') = 'frontend')
) AS filtered_time_series USING (fingerprint)
WHERE (metric_name = 'signoz_latency_bucket') AND (timestamp_ms >= {{.start_timestamp_ms}}) AND (timestamp_ms <= {{.end_timestamp_ms}})
GROUP BY
fingerprint,
le,
ts
ORDER BY
fingerprint ASC,
le ASC,
ts ASC
)
WHERE isNaN(value) = 0
)
GROUP BY
GROUPING SETS (
(le, ts),
(le))
HAVING isNaN(value) = 0
ORDER BY
le ASC,
ts ASC
)
GROUP BY ts
ORDER BY ts ASC

Using variables in queries

SigNoz supports using variables in queries. This allows you to create a single dashboard for multiple services. For example, you can create a dashboard for the request rate for all the services. Then, you can use the service name as a variable in the query to get the request rate for a particular service. Please refer to the Variables section for more information on how to create variables.

Example queries using variables

Request rate for a service

The variable {{.service_name}} is used to get the service name from the variable. The variable is replaced with the service name when the query is executed.

SELECT
ts,
sum(value) AS value
FROM
(
SELECT
ts,
if(runningDifference(ts) <= 0, nan, if(runningDifference(value) < 0, value / runningDifference(ts), runningDifference(value) / runningDifference(ts))) AS value
FROM
(
SELECT
fingerprint,
toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
max(value) AS value
FROM signoz_metrics.distributed_samples_v2
INNER JOIN
(
SELECT fingerprint
FROM signoz_metrics.time_series_v2
WHERE (metric_name = 'signoz_latency_count') AND (temporality IN ['Cumulative', 'Unspecified']) AND (JSONExtractString(labels, 'service_name') IN {{.service_name}}) AND (JSONExtractString(labels, 'operation') IN ['HTTP GET /dispatch'])
) AS filtered_time_series USING (fingerprint)
WHERE (metric_name = 'signoz_latency_count') AND (timestamp_ms >= {{.start_timestamp_ms}}) AND (timestamp_ms <= {{.end_timestamp_ms}})
GROUP BY
fingerprint,
ts
ORDER BY
fingerprint ASC,
ts ASC
)
WHERE isNaN(value) = 0
)
GROUP BY
GROUPING SETS (
(ts),
())
ORDER BY ts ASC

Using the default variables

The following variables are available by default:

  • {{.start_timestamp_ms}} - This is the start time of the query in milliseconds
  • {{.end_timestamp_ms}} - This is the end time of the query in milliseconds