How to Monitor Kubernetes CPU Usage with Prometheus - A Guide

Kubernetes has revolutionized container orchestration, but managing resources like CPU usage in a dynamic environment can be challenging. This guide shows you how to calculate and monitor container CPU usage in Kubernetes using Prometheus.

Understanding Kubernetes CPU Usage and Monitoring

CPU usage in Kubernetes represents the amount of computational power containers consume. Monitoring this usage is crucial for maintaining optimal performance, preventing resource bottlenecks, and ensuring efficient resource allocation.

Prometheus, an open-source monitoring system, plays a vital role in the Kubernetes ecosystem. It collects and stores time-series data, making it ideal for tracking CPU metrics. Key CPU usage metrics in Kubernetes include:

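container_cpu_usage_seconds_total (per container, exposed by cAdvisor through the kubelet)
node_cpu_seconds_total (per node, exposed by node exporter)
kube_pod_container_resource_requests and kube_pod_container_resource_limits (configured requests and limits, exposed by kube-state-metrics)

Together, these metrics cover CPU consumption at the container and node levels. Pod-level usage is obtained by aggregating the container metric by pod.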

CPU Metrics in Kubernetes

The container_cpu_usage_seconds_total metric is fundamental for calculating CPU usage. It is a counter that records the cumulative CPU time, in seconds, consumed by a container since it started, so it is normally queried through rate() rather than read directly.

Kubernetes expresses CPU resources in cores. A single core is equivalent to:

1 AWS vCPU
1 GCP Core
1 Azure vCore
1 Hyperthread on a bare-metal processor

CPU requests and limits differ from actual usage:

Requests: The guaranteed CPU resources for a container
Limits: The maximum CPU a container can use
Actual usage: The real-time CPU consumption

Monitoring at container, pod, and cluster levels provides a comprehensive view of resource utilization and helps identify potential issues.
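
The relationship between requests, limits, and actual usage can be illustrated with a short Python sketch. It relies on the fact that Kubernetes also accepts CPU quantities in millicores (the "m" suffix, where 1000m equals one core). The request, limit, and usage values below are hypothetical, and the helper names are made up for this example.

```python
from decimal import Decimal

def parse_cpu_quantity(quantity: str) -> Decimal:
    """Convert a Kubernetes CPU quantity such as "250m" or "1.5" to cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # The "m" suffix means millicores: 1000m == 1 core.
        return Decimal(quantity[:-1]) / Decimal(1000)
    return Decimal(quantity)

def usage_ratio(usage_cores: Decimal, reference: str) -> Decimal:
    """Return usage as a fraction of a request or limit quantity."""
    return usage_cores / parse_cpu_quantity(reference)

# Hypothetical container spec values and a hypothetical measured usage.
request, limit = "250m", "500m"
usage = Decimal("0.2")  # cores

print(parse_cpu_quantity(request))   # 0.25
print(usage_ratio(usage, request))   # 0.8
print(usage_ratio(usage, limit))     # 0.4
```

In this example the container uses 80% of its request but only 40% of its limit, so it still has room to burst before throttling begins.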

Setting Up Prometheus for Kubernetes CPU Monitoring

To set up Prometheus for Kubernetes CPU monitoring:

Deploy Prometheus in your Kubernetes cluster:

kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml

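Configure Prometheus to scrape Kubernetes CPU metrics by creating a Prometheus resource. The example below selects ServiceMonitor and PrometheusRule objects labeled team: frontend, and assumes the monitoring namespace and the prometheus service account already exist: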

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi

Save the manifest above as prometheus-config.yaml, then apply it:

kubectl apply -f prometheus-config.yaml

Best practices for Prometheus deployment in production:

Use persistent storage for long-term data retention
Implement proper access controls and authentication
Set up alerting for critical metrics
Regularly update Prometheus to the latest stable version

Calculating Container CPU Usage with Prometheus

To calculate CPU usage, apply the rate() function to the container_cpu_usage_seconds_total metric. The formula is:

sum(rate(container_cpu_usage_seconds_total{container!=""}[1m]))

rate() returns the CPU seconds consumed per second over the last minute, which equals the average number of cores in use. The container!="" matcher drops the pod-level and node-level cgroup series that cAdvisor also exports, which would otherwise be counted more than once. To break CPU usage down across multiple containers or pods, filter and group by labels:

sum(rate(container_cpu_usage_seconds_total{pod=~"app-.*", container!=""}[1m])) by (pod)

To convert raw CPU usage to a percentage of available resources:

(sum(rate(container_cpu_usage_seconds_total{container!=""}[1m])) / sum(machine_cpu_cores)) * 100
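
To make the arithmetic behind these formulas concrete, here is a minimal Python sketch that approximates what rate() does with raw counter samples and then converts the result to a percentage. It is a simplification: the real rate() also extrapolates to the edges of the time window. The sample values and the 4-core node are hypothetical.

```python
def counter_rate(samples):
    """Approximate PromQL rate(): per-second increase of a counter.

    samples: list of (unix_timestamp, counter_value) pairs, oldest first.
    A drop in value is treated as a counter reset (restart from zero).
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts near 0, so the new value
        # itself is the increase since the reset.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None

# Hypothetical CPU-seconds counter scraped every 15 seconds for one minute.
samples = [(0, 100.0), (15, 107.5), (30, 115.0), (45, 122.5), (60, 130.0)]
cores_used = counter_rate(samples)      # 30 CPU-seconds over 60 s
node_cores = 4                          # hypothetical machine_cpu_cores value

print(cores_used)                       # 0.5
print(cores_used / node_cores * 100)    # 12.5
```

A container that accumulates 30 CPU-seconds in 60 seconds of wall-clock time is using half a core on average, which is 12.5% of a 4-core node.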

PromQL Queries for CPU Usage Insights

Here are some useful PromQL queries for CPU usage insights:

Overall cluster CPU utilization:

sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) / sum(machine_cpu_cores) * 100

Top CPU-consuming pods:

topk(10, sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod))

CPU usage vs. requests:

sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod) / sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod)

Detecting CPU throttling:

sum(rate(container_cpu_cfs_throttled_seconds_total{}[5m])) by (pod) > 0
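
Queries like these can also be run programmatically through the Prometheus HTTP API, which serves instant queries at /api/v1/query. The following Python sketch builds the request URL and parses an instant-vector response. The base URL and the sample response body are hypothetical placeholders, and no network call is made here.

```python
import json
from urllib.parse import urlencode

def instant_query_url(base_url, promql):
    """Build the URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"

def pods_by_cpu(response_body):
    """Parse an instant-vector response into (pod, cores) pairs, highest first."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    rows = [
        # Each sample value is a [timestamp, "string value"] pair.
        (series["metric"].get("pod", "<none>"), float(series["value"][1]))
        for series in payload["data"]["result"]
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)

QUERY = 'topk(10, sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod))'

# Hypothetical response body, shaped like an instant-vector query result.
SAMPLE_RESPONSE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [
            {"metric": {"pod": "app-a"}, "value": [1700000000, "0.25"]},
            {"metric": {"pod": "app-b"}, "value": [1700000000, "0.75"]}
          ]}}
"""

print(instant_query_url("http://prometheus.example:9090", QUERY))
for pod, cores in pods_by_cpu(SAMPLE_RESPONSE):
    print(f"{pod}: {cores:.2f} cores")
```

To use this against a real cluster, fetch the URL with any HTTP client and pass the response body to pods_by_cpu.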

Visualizing CPU Usage with Grafana

To visualize CPU usage with Grafana:

Integrate Grafana with Prometheus:
- Add Prometheus as a data source in Grafana
- Configure the Prometheus server URL
Create CPU usage dashboards:
- Use the queries mentioned above
- Add time series graphs, gauges, and tables
Set up alerts for CPU usage thresholds:
- Define alert rules based on CPU utilization percentages
- Configure notification channels (email, Slack, etc.)

Visualizing CPU Usage with SigNoz

To visualize CPU usage with SigNoz:

Set up SigNoz:
- SigNoz Cloud is the easiest way to run SigNoz. Sign up for a free account to get 30 days of unlimited access to all features.
- Alternatively, install and self-host SigNoz, which is open source and has 24,000+ GitHub stars. Self-hosting instructions are available in the SigNoz documentation.
Integrate SigNoz with your Kubernetes cluster:
- Deploy SigNoz using Helm or Kubernetes manifests
- Configure SigNoz to scrape Prometheus metrics from your cluster
Create CPU usage dashboards:
- Use the PromQL queries mentioned earlier
- Add various visualization types such as time series graphs, gauges, and tables
- Leverage SigNoz's built-in Kubernetes dashboards for quick insights
Set up alerts for CPU usage thresholds:
- Define alert rules based on CPU utilization percentages
- Configure notification channels (email, Slack, PagerDuty, etc.)

SigNoz offers several advantages for Kubernetes CPU monitoring:

Native support for Kubernetes environments
Unified platform for metrics, traces, and logs
Custom dashboard creation
Advanced querying capabilities using PromQL
Built-in alerting features

Optimizing Kubernetes Deployments Based on CPU Metrics

Analyze CPU usage patterns to right-size container resources:

If actual usage consistently exceeds requests, increase the request value
If usage is significantly below limits, consider lowering them
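
One way to turn observed usage into a request value is to take a high percentile of recent usage and add some headroom. The Python sketch below illustrates that idea. The 95th percentile and the 20% headroom are assumptions chosen for this example, not recommendations from Kubernetes or Prometheus, and the usage samples are hypothetical.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1..100)."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, at least 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def suggest_request_millicores(usage_millicores, pct=95, headroom_pct=20):
    """Suggest a CPU request: the chosen percentile of usage plus headroom."""
    base = percentile(usage_millicores, pct)
    return math.ceil(base * (100 + headroom_pct) / 100)

# Hypothetical per-interval usage for one container, in millicores.
usage = [100, 120, 110, 300, 130, 120, 110, 140, 120, 100]

print(percentile(usage, 50))               # 120
print(suggest_request_millicores(usage))   # 360
```

With only ten samples the single spike dominates the 95th percentile. In practice a longer observation window gives a more representative result.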

Implement Horizontal Pod Autoscaling (HPA) based on CPU metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # percent of the pods' CPU requests
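
The core scaling rule the autoscaler applies is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to the configured minimum and maximum. The Python sketch below reproduces that rule with the values from the manifest above (target 50%, 2 to 10 replicas). It is a simplified model: the real controller also accounts for unready pods, missing metrics, and stabilization windows. The 10% tolerance is the documented default, and the function name is made up for this example.

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA rule: scale replicas by the usage-to-target ratio."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band around 1.0, the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# Target of 50% CPU utilization with 2 to 10 replicas, as in the manifest.
print(desired_replicas(4, 80, 50, 2, 10))    # 7  (scale up)
print(desired_replicas(4, 52, 50, 2, 10))    # 4  (within tolerance)
print(desired_replicas(4, 10, 50, 2, 10))    # 2  (clamped to minReplicas)
print(desired_replicas(8, 100, 50, 2, 10))   # 10 (clamped to maxReplicas)
```

Because utilization is measured relative to CPU requests, the autoscaler only works for containers that declare a CPU request.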

To identify and resolve CPU bottlenecks:

Look for consistently high CPU usage or frequent throttling
Consider optimizing application code or increasing resources

Balance CPU usage across nodes by:

Using node affinity rules to distribute workloads
Implementing cluster autoscaler to add nodes during high demand

Key Takeaways

Prometheus effectively monitors Kubernetes CPU usage
Understanding Kubernetes metrics is crucial for accurate CPU usage calculation
PromQL queries enable detailed analysis of CPU utilization patterns
Visualizing CPU metrics with Grafana aids in resource optimization
Regular monitoring and analysis of CPU usage optimize Kubernetes performance

FAQs

What is the difference between CPU requests and limits in Kubernetes?

CPU requests guarantee a minimum amount of CPU resources, while limits cap the maximum CPU a container can use. Requests help with scheduling, ensuring nodes have enough resources. Limits prevent containers from consuming excessive CPU.

How often should I collect CPU metrics in a Kubernetes environment?

Collect CPU metrics every 15-30 seconds for real-time monitoring. For long-term analysis, aggregate data over longer intervals (e.g., 5 minutes) to reduce storage requirements while maintaining useful insights.
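
The trade-off can be put in numbers with a small Python sketch. It computes how many samples each time series produces per day and a minimum rate() window for a given scrape interval. The factor of four scrape intervals is a commonly used rule of thumb that is assumed here, not a requirement stated by this guide.

```python
SECONDS_PER_DAY = 86_400

def samples_per_day(scrape_interval_s):
    """Number of samples one time series produces per day."""
    return SECONDS_PER_DAY // scrape_interval_s

def min_rate_window_s(scrape_interval_s, factor=4):
    """Rule-of-thumb rate() window: several scrape intervals, so the
    window still holds enough samples if one scrape is missed."""
    return scrape_interval_s * factor

for interval in (15, 30):
    print(interval, samples_per_day(interval), min_rate_window_s(interval))
# 15 5760 60
# 30 2880 120
```

Under this assumption, a 15-second interval is compatible with the [1m] windows used in the queries above, while a 30-second interval is better paired with [2m] or longer.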

Can Prometheus monitor CPU usage of individual containers within a pod?

Yes, Prometheus can monitor CPU usage of individual containers within a pod. Use the container label in your queries to differentiate between containers in the same pod.

How do I handle CPU usage spikes in my Kubernetes cluster?

To handle CPU usage spikes:

Set up alerts for sudden increases in CPU usage
Implement Horizontal Pod Autoscaling to automatically scale during high load
Use Vertical Pod Autoscaling to adjust resource requests and limits
Analyze the cause of spikes and optimize application code if necessary

Enhance Your Monitoring with SigNoz

While Prometheus offers powerful monitoring capabilities, managing retention and scaling can become challenging as your infrastructure grows. SigNoz provides a comprehensive monitoring solution that builds upon Prometheus' strengths while addressing its limitations.

With SigNoz, you can:

Scale your monitoring infrastructure effortlessly
Access advanced querying and visualization capabilities
Benefit from integrated tracing and logging alongside metrics
Get high performance with the ClickHouse database
Take advantage of SigNoz's exception monitoring capabilities
