Prometheus labels are powerful tools for adding context to metrics, but they come with certain limitations. These constraints primarily revolve around cardinality — the number of unique time series generated by label combinations. Understanding these limitations is crucial for maintaining a performant monitoring system. Let's explore the key aspects of Prometheus label limitations and learn how to work within these constraints effectively.
What are Prometheus Labels and Their Importance?
Prometheus labels are key-value pairs attached to metrics that provide additional context. They allow you to identify specific instances, jobs, or environments associated with a metric. Labels contribute to data dimensionality, enabling flexible querying and aggregation of metrics.
For example, a metric http_requests_total might have labels like:
http_requests_total{method="GET", endpoint="/api/users", status="200"}
This labeling system offers several benefits:
- Granular filtering: You can query specific combinations of label values.
- Dynamic aggregation: Labels allow you to group and summarize data across different dimensions.
- Rich context: They provide detailed information about the origin and nature of each metric.
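To make filtering and aggregation concrete, here is a minimal Python sketch that models each time series as a metric name plus a label dictionary. The sample values and the helper names `select` and `sum_by` are invented for illustration; in a real deployment PromQL does this work for you.

```python
from collections import defaultdict

# Hypothetical samples: (metric name, labels, current value).
SERIES = [
    ("http_requests_total", {"method": "GET", "endpoint": "/api/users", "status": "200"}, 120.0),
    ("http_requests_total", {"method": "GET", "endpoint": "/api/users", "status": "500"}, 3.0),
    ("http_requests_total", {"method": "POST", "endpoint": "/api/users", "status": "200"}, 40.0),
]


def select(series, name, **matchers):
    """Keep series whose name and labels match exactly, like name{label="value"}."""
    return [
        (n, labels, value)
        for n, labels, value in series
        if n == name and all(labels.get(k) == v for k, v in matchers.items())
    ]


def sum_by(series, label):
    """Sum values grouped by one label, roughly like sum by (label) (...)."""
    totals = defaultdict(float)
    for _, labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)


get_requests = select(SERIES, "http_requests_total", method="GET")
by_status = sum_by(SERIES, "status")
print(len(get_requests))  # 2
print(by_status)          # {'200': 160.0, '500': 3.0}
```

Each extra label adds another dimension you can filter or group on, which is also why each extra label can multiply the number of series.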
How Labels Enhance Prometheus Metrics
Labels transform simple metrics into multi-dimensional data points. They act as metadata, turning a basic counter or gauge into a rich source of information. Here's how labels enhance your metrics:
- Flexible querying: Labels enable complex queries using PromQL (Prometheus Query Language).
- Efficient alerting: You can create precise alert conditions based on label values.
- Improved visualization: Labels allow for dynamic dashboard creation with drill-down capabilities.
Best practices for naming and structuring labels include:
- Use lowercase letters, numbers, and underscores for label names.
- Choose descriptive yet concise label names.
- Ensure consistency in label naming across your metrics.
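One way to enforce these conventions is a small check in CI or code review tooling. The sketch below is an assumption about how such a check might look, not a Prometheus feature: the function name `is_conventional_label_name` is hypothetical, and the pattern is deliberately stricter than the classic Prometheus label-name syntax, which also permits uppercase letters.

```python
import re

# Classic Prometheus label names match [a-zA-Z_][a-zA-Z0-9_]*; this stricter
# pattern encodes the lowercase convention suggested above.
LABEL_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def is_conventional_label_name(name: str) -> bool:
    """Return True for lowercase snake_case names that are not reserved."""
    if name.startswith("__"):  # double-underscore names are reserved for internal use
        return False
    return LABEL_NAME.fullmatch(name) is not None


for candidate in ["status_code", "statusCode", "__name__", "2xx", "env"]:
    print(candidate, is_conventional_label_name(candidate))
```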
What are the Limitations of Prometheus Labels?
While labels are incredibly useful, they come with limitations — primarily related to cardinality. Cardinality refers to the number of unique time series created by different label value combinations. High cardinality can significantly impact Prometheus performance.
Key limitations include:
- Memory usage: Each unique combination of labels creates a new time series, consuming memory.
- Query performance: High cardinality can slow down query execution.
- Storage requirements: More time series mean more data to store and manage.
Prometheus has no hard limit on the number of time series, but as a rough guideline, keep the total on a single server below tens of millions; the practical ceiling depends mostly on available memory. Exceeding it leads to increased resource consumption and slower query responses.
Understanding Cardinality Limits
Cardinality limits exist in Prometheus due to its design as a single-node time series database. Each unique combination of metric name and label values creates a distinct time series, which Prometheus must track, store, and query.
High cardinality scenarios often arise from:
- Using highly specific labels (e.g., user IDs, session IDs)
- Combining multiple high-cardinality labels
- Frequent changes in label values
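The arithmetic behind these scenarios is simple multiplication: in the worst case, the series count for one metric is the product of the number of distinct values of each label. The sketch below uses made-up label counts to show how a single identifier-style label changes the picture; the function name `worst_case_series` is hypothetical.

```python
from math import prod


def worst_case_series(label_value_counts: dict) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_value_counts.values())


# Assumed counts of distinct values per label, for illustration only.
base = {"method": 5, "endpoint": 20, "status": 10}
with_user = {**base, "user_id": 10_000}

print(worst_case_series(base))       # 1000
print(worst_case_series(with_user))  # 10000000
```

Real deployments rarely hit the full product because not every combination occurs, but the bound shows why combining several high-cardinality labels is risky.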
To estimate cardinality in your Prometheus deployment, use the following PromQL query:
sum(count by(__name__, job)({__name__=~".+"}))
This query gives you an idea of the total number of time series in your system.
How to Manage High Cardinality in Prometheus?
Managing high cardinality is crucial for maintaining Prometheus performance. Here are some strategies to reduce label cardinality:
- Metric relabeling: Use Prometheus' relabeling features to drop or modify high-cardinality labels before ingestion.
- Aggregation: Pre-aggregate metrics to reduce the number of unique time series.
- Bucketing: Use histograms or summaries to represent distributions efficiently.
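Aggregation and bucketing can both be sketched in a few lines of Python. The example below is illustrative only: the sample data and the helpers `aggregate_without` and `bucket_counts` are hypothetical, and the bucket counts here are per-bucket rather than cumulative as in a Prometheus histogram.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical per-user request counts: one series per user_id.
per_user = [
    ({"endpoint": "/api/users", "user_id": "u1"}, 5),
    ({"endpoint": "/api/users", "user_id": "u2"}, 7),
    ({"endpoint": "/api/orders", "user_id": "u1"}, 2),
]


def aggregate_without(samples, label):
    """Sum values over all series that differ only in `label`."""
    totals = Counter()
    for labels, value in samples:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != label))
        totals[key] += value
    return dict(totals)


def bucket_counts(observations, upper_bounds):
    """Count observations per bucket (non-cumulative); the last slot is overflow."""
    counts = [0] * (len(upper_bounds) + 1)
    for obs in observations:
        # bisect_left treats each upper bound as inclusive.
        counts[bisect_left(upper_bounds, obs)] += 1
    return counts


aggregated = aggregate_without(per_user, "user_id")
latency_buckets = bucket_counts([0.03, 0.2, 0.2, 1.7], [0.1, 0.5, 1.0])
print(aggregated)
print(latency_buckets)  # [1, 2, 0, 1]
```

Three per-user series collapse into two per-endpoint series, and four raw latency values become a fixed number of bucket counters no matter how many requests arrive.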
Implementing Cardinality Control Measures
To identify high-cardinality metrics, use this PromQL query:
topk(10, count by (__name__, job)({__name__=~".+"}))
This query shows the 10 metric name and job combinations with the most time series.
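If the grouping logic in the query is unfamiliar, the following Python sketch mirrors it on a handful of made-up label sets. It is only a mental model of count by and topk; the data and the function name `series_per_metric` are hypothetical.

```python
from collections import Counter

# Hypothetical label sets, one per active series.
active_series = [
    {"__name__": "http_requests_total", "job": "api", "status": "200"},
    {"__name__": "http_requests_total", "job": "api", "status": "500"},
    {"__name__": "http_requests_total", "job": "api", "status": "404"},
    {"__name__": "up", "job": "api"},
    {"__name__": "up", "job": "db"},
]


def series_per_metric(series):
    """Count series per (metric name, job), like count by (__name__, job)."""
    return Counter((s["__name__"], s.get("job", "")) for s in series)


counts = series_per_metric(active_series)
total = sum(counts.values())   # analogous to wrapping the counts in sum(...)
top = counts.most_common(1)    # analogous to topk(1, ...)
print(total)  # 5
print(top)    # [(('http_requests_total', 'api'), 3)]
```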
Once identified, you can optimize label usage:
- Remove unnecessary high-cardinality labels.
- Use label aggregation to reduce granularity where appropriate.
- Implement metric relabeling to modify or drop problematic labels.
Example of metric relabeling in Prometheus configuration:
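scrape_configs:
  - job_name: 'high_cardinality_job'
    metric_relabel_configs:
      # Remove the user_id label from every scraped series.
      - regex: 'user_id'
        action: labeldrop
This configuration drops the user_id label, which might be causing high cardinality. Note that action: drop with source_labels: [user_id] would instead discard every series that carries the label. After removing a label, make sure the remaining labels still identify each series uniquely.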
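To preview what removing a label does to the series count, you can simulate it offline. The Python sketch below is loosely modelled on the labeldrop relabel action, which removes labels whose names match a fully anchored regex. The sample series and the helper names are hypothetical, and this is not how Prometheus implements relabeling internally.

```python
import re


def labeldrop(labels: dict, regex: str) -> dict:
    """Remove labels whose name fully matches regex (relabel regexes are anchored)."""
    pattern = re.compile(regex)
    return {k: v for k, v in labels.items() if not pattern.fullmatch(k)}


def distinct_series(label_sets):
    """Number of unique label sets."""
    return len({tuple(sorted(ls.items())) for ls in label_sets})


# Hypothetical scraped series for illustration.
scraped = [
    {"__name__": "logins_total", "region": "eu", "user_id": "1"},
    {"__name__": "logins_total", "region": "eu", "user_id": "2"},
    {"__name__": "logins_total", "region": "us", "user_id": "3"},
]
after = [labeldrop(ls, "user_id") for ls in scraped]
print(distinct_series(scraped), distinct_series(after))  # 3 2
```

Here two series end up with identical label sets after the drop. In a real scrape such collisions are a problem, so it is usually better to aggregate at the source or stop emitting the label in the first place.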
Advanced Techniques for Scaling Prometheus
For environments with inherently high cardinality, consider these advanced scaling techniques:
- Horizontal scaling: Shard scrape targets across multiple Prometheus instances and use federation to pull selected or aggregated series into a central server.
- Remote storage: Implement long-term storage solutions to offload data from Prometheus.
- Thanos or Cortex: These projects extend Prometheus to handle larger-scale deployments.
Key Takeaways
- Prometheus labels are powerful but come with cardinality limitations.
- Keep total time series under tens of millions for optimal performance.
- Regularly audit and optimize label usage to prevent cardinality explosion.
- Implement aggregation, relabeling, and proper metric design to manage cardinality.
FAQs
What is the maximum number of label values Prometheus can handle?
Prometheus doesn't have a hard limit on label values, but it's recommended to keep the total number of time series below tens of millions for optimal performance.
How do I identify high-cardinality metrics in my Prometheus setup?
Use the topk(10, count by (__name__, job)({__name__=~".+"})) query from the section on cardinality control measures to list the metrics with the most time series. Run it regularly so that cardinality growth is caught before it causes performance issues.
Can I use customer names or unique IDs as label values?
It's generally not recommended to use highly specific identifiers like customer names or unique IDs as label values. This can lead to cardinality explosion. Instead, consider using broader categories or aggregating this data.
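A common way to get broader categories is to normalize values in the application before they become label values. The sketch below shows two such normalizations in Python. The function names and the `:id` placeholder are assumptions for illustration, not part of any Prometheus client library.

```python
import re

# Matches a purely numeric path segment such as "/42".
_NUMERIC_SEGMENT = re.compile(r"/\d+(?=/|$)")


def normalize_path(path: str) -> str:
    """Replace numeric path segments with a placeholder: /users/42 -> /users/:id."""
    return _NUMERIC_SEGMENT.sub("/:id", path)


def status_class(code: int) -> str:
    """Collapse an HTTP status code into its class: 404 -> 4xx."""
    return f"{code // 100}xx"


print(normalize_path("/api/users/42/orders/7"))  # /api/users/:id/orders/:id
print(status_class(404))                         # 4xx
```

With this approach the endpoint label has one value per route instead of one per user, and the per-user detail can go to logs or traces where high cardinality is less costly.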
What are the alternatives to Prometheus for high-cardinality environments?
For environments with extremely high cardinality, consider time series databases designed for such workloads, like InfluxDB or TimescaleDB. Alternatively, you can use SigNoz, which provides advanced observability features while handling high-cardinality data efficiently.
SigNoz offers a scalable solution for monitoring high-cardinality environments, providing advanced querying capabilities and efficient data management. It's compatible with Prometheus metrics and can serve as an excellent alternative or complement to your existing Prometheus setup.