Kubernetes has revolutionized container orchestration, but managing resources like CPU usage in a dynamic environment can be challenging. This guide shows you how to calculate and monitor container CPU usage in Kubernetes using Prometheus.
Understanding Kubernetes CPU Usage and Monitoring
CPU usage in Kubernetes represents the amount of computational power containers consume. Monitoring this usage is crucial for maintaining optimal performance, preventing resource bottlenecks, and ensuring efficient resource allocation.
Prometheus, an open-source monitoring system, plays a vital role in the Kubernetes ecosystem. It collects and stores time-series data, making it ideal for tracking CPU metrics. Key CPU usage metrics in Kubernetes include:
- container_cpu_usage_seconds_total
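- node_cpu_seconds_total (node-level CPU time, exposed by node_exporter)
- kube_pod_container_resource_requests and kube_pod_container_resource_limits (configured requests and limits, exposed by kube-state-metrics)
Together, these metrics give insight into CPU consumption at the container, pod, and node level. Pod-level usage is obtained by summing the container metric by its pod label.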
CPU Metrics in Kubernetes
The container_cpu_usage_seconds_total metric is fundamental for calculating CPU usage. It is a counter that records the cumulative CPU time, in seconds, consumed by a container since it started.
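To make "cumulative" concrete, the sketch below derives the average number of cores in use from two readings of the counter. The function name and the sample readings are hypothetical, and this is a simplification: Prometheus's own rate() also handles counter resets and extrapolation.

```python
def average_cores_used(t0: float, v0: float, t1: float, v1: float) -> float:
    """Average CPU cores in use between two counter samples.

    t0, t1: sample timestamps in seconds.
    v0, v1: container_cpu_usage_seconds_total readings (CPU-seconds).
    """
    if t1 <= t0:
        raise ValueError("t1 must be later than t0")
    # CPU-seconds consumed divided by wall-clock seconds elapsed gives cores.
    return (v1 - v0) / (t1 - t0)


# Hypothetical readings taken 60 seconds apart: 15 CPU-seconds were consumed.
print(average_cores_used(0.0, 120.0, 60.0, 135.0))  # 0.25
```

A result of 0.25 means the container kept a quarter of one core busy, on average, during that minute.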
Kubernetes expresses CPU resources in cores. A single core is equivalent to:
- 1 AWS vCPU
- 1 GCP Core
- 1 Azure vCore
- 1 Hyperthread on a bare-metal processor
CPU requests and limits differ from actual usage:
- Requests: The guaranteed CPU resources for a container
- Limits: The maximum CPU a container can use
- Actual usage: The real-time CPU consumption
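The difference between the three values can be illustrated with a short sketch. It assumes requests and limits are written as Kubernetes CPU quantities, either whole or fractional cores ("2", "0.5") or millicores ("500m"). The function names and the sample values are hypothetical.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "500m" or "2" to cores."""
    if quantity.endswith("m"):
        # The "m" suffix means millicores: 1000m equals one core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def describe_usage(usage_cores: float, request: str, limit: str) -> str:
    """Classify measured usage relative to the configured request and limit."""
    req, lim = parse_cpu(request), parse_cpu(limit)
    if usage_cores >= lim:
        return "at limit (likely throttled)"
    if usage_cores > req:
        return "above request"
    return "within request"


# Hypothetical container: request 250m, limit 500m, measured usage 0.3 cores.
print(describe_usage(0.3, "250m", "500m"))  # above request
```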
Monitoring at container, pod, and cluster levels provides a comprehensive view of resource utilization and helps identify potential issues.
Setting Up Prometheus for Kubernetes CPU Monitoring
To set up Prometheus for Kubernetes CPU monitoring:
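Deploy the Prometheus Operator in your Kubernetes cluster. kubectl create is used instead of kubectl apply because the bundle's CRDs are too large for the annotation that client-side apply adds:
kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml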
Configure Prometheus to scrape Kubernetes CPU metrics:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
Apply the configuration:
kubectl apply -f prometheus-config.yaml
Best practices for Prometheus deployment in production:
- Use persistent storage for long-term data retention
- Implement proper access controls and authentication
- Set up alerting for critical metrics
- Regularly update Prometheus to the latest stable version
Calculating Container CPU Usage with Prometheus
To calculate CPU usage, use the rate() function with the container_cpu_usage_seconds_total metric. The formula is:
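sum(rate(container_cpu_usage_seconds_total{container!=""}[1m]))
This returns the average number of CPU cores in use over the last minute, that is, CPU-seconds consumed per second of wall-clock time. The container!="" selector drops the pod-level and node-level cgroup series that cAdvisor also exports, which would otherwise be counted twice. To break usage down by pod, or to restrict it to certain pods, use labels:
sum(rate(container_cpu_usage_seconds_total{pod=~"app-.*", container!=""}[1m])) by (pod)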
To convert raw CPU usage to a percentage of the cluster's total cores:
(sum(rate(container_cpu_usage_seconds_total{container!=""}[1m])) / sum(machine_cpu_cores)) * 100
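The aggregation and percentage steps above are plain arithmetic, which the following sketch mirrors outside PromQL. It assumes per-container rates (in cores) have already been computed. The pod names, container names, rates, and the 4-core cluster size are made up for illustration.

```python
from collections import defaultdict


def sum_by_pod(rates):
    """Sum per-container rates by pod, like `sum(...) by (pod)`.

    rates: iterable of (pod, container, cores) tuples.
    """
    totals = defaultdict(float)
    for pod, _container, cores in rates:
        totals[pod] += cores
    return dict(totals)


def percent_of_capacity(used_cores: float, total_cores: float) -> float:
    """Express used cores as a percentage of total cores."""
    return used_cores / total_cores * 100


# Hypothetical per-container rates in cores.
rates = [
    ("app-1", "web", 0.25),
    ("app-1", "sidecar", 0.25),
    ("app-2", "web", 0.5),
]
per_pod = sum_by_pod(rates)
print(per_pod)  # {'app-1': 0.5, 'app-2': 0.5}
print(percent_of_capacity(sum(per_pod.values()), 4))  # 25.0
```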
PromQL Queries for CPU Usage Insights
Here are some useful PromQL queries for CPU usage insights:
Overall cluster CPU utilization:
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) / sum(machine_cpu_cores) * 100
Top CPU-consuming pods:
topk(10, sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod))
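CPU usage vs. requests (kube-state-metrics v2 and later expose requests as kube_pod_container_resource_requests with a resource label):
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod) / sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod)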
Detecting CPU throttling:
sum(rate(container_cpu_cfs_throttled_seconds_total{}[5m])) by (pod) > 0
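Queries like these can also be run programmatically through the Prometheus HTTP API's /api/v1/query endpoint. The sketch below is one minimal way to do that with the Python standard library. The helper names, the base URL, and the sample payload are assumptions for illustration, and the sample is only shaped like an instant-vector response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: dict) -> dict:
    """Map each series' pod label to its value in an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return {
        series["metric"].get("pod", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def run_query(base_url: str, promql: str) -> dict:
    """Send the query to a reachable Prometheus server and parse the result."""
    with urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(json.load(resp))


# Hypothetical response body, used here instead of a live server.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"pod": "app-1"}, "value": [1700000000, "0.42"]}],
    },
}
print(parse_vector(sample))  # {'app-1': 0.42}
```

With a reachable server, run_query("http://localhost:9090", query) would return the same kind of dictionary. The address is a placeholder for your own Prometheus endpoint.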
Visualizing CPU Usage with Grafana
To visualize CPU usage with Grafana:
- Integrate Grafana with Prometheus:
  - Add Prometheus as a data source in Grafana
  - Configure the Prometheus server URL
- Create CPU usage dashboards:
  - Use the queries mentioned above
  - Add time series graphs, gauges, and tables
- Set up alerts for CPU usage thresholds:
  - Define alert rules based on CPU utilization percentages
  - Configure notification channels (email, Slack, etc.)
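Threshold alerts are usually less noisy when they fire only after the threshold has been exceeded for a sustained period. The sketch below shows that idea in isolation. It is not Grafana's alerting engine, and the function name, the 90% threshold, and the sample series are hypothetical.

```python
def should_alert(samples, threshold_percent, min_consecutive):
    """Return True when the latest `min_consecutive` samples all exceed the threshold.

    samples: CPU utilization percentages, oldest first.
    """
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    if len(samples) < min_consecutive:
        return False
    return all(s > threshold_percent for s in samples[-min_consecutive:])


# Sustained high usage fires; a brief dip within the window does not.
print(should_alert([70, 85, 91, 93, 95], 90, 3))  # True
print(should_alert([70, 95, 60, 93, 95], 90, 3))  # False
```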
Visualizing CPU Usage with SigNoz
To visualize CPU usage with SigNoz:
Set Up SigNoz
SigNoz Cloud is the easiest way to run SigNoz. Sign up for a free account to get 30 days of unlimited access to all features. Because SigNoz is open source, you can also install and self-host it; instructions are available in the SigNoz documentation. The open-source project has 18,000+ GitHub stars.
Integrate SigNoz with your Kubernetes cluster:
- Deploy SigNoz using Helm or Kubernetes manifests
- Configure SigNoz to scrape Prometheus metrics from your cluster
Create CPU usage dashboards:
- Use the PromQL queries mentioned earlier
- Add various visualization types such as time series graphs, gauges, and tables
- Leverage SigNoz's built-in Kubernetes dashboards for quick insights
Set up alerts for CPU usage thresholds:
- Define alert rules based on CPU utilization percentages
- Configure notification channels (email, Slack, PagerDuty, etc.)
SigNoz offers several advantages for Kubernetes CPU monitoring:
- Native support for Kubernetes environments
- Unified platform for metrics, traces, and logs
- Custom dashboard creation
- Advanced querying capabilities using PromQL
- Built-in alerting features
Optimizing Kubernetes Deployments Based on CPU Metrics
Analyze CPU usage patterns to right-size container resources:
- If actual usage consistently exceeds requests, increase the request value
- If usage is significantly below limits, consider lowering them
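One possible way to turn observed usage into a suggested request is to take a high percentile of the samples and add headroom. The sketch below does that in millicores. The 95th percentile, the 20% headroom, the function names, and the sample data are assumptions for illustration, not a Kubernetes recommendation.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 1 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_request_millicores(usage_millicores, pct=95, headroom_percent=20):
    """Suggest a CPU request: the chosen percentile of usage plus headroom."""
    base = percentile(usage_millicores, pct)
    return math.ceil(base * (100 + headroom_percent) / 100)


# Hypothetical usage samples in millicores, including one spike to 300m.
usage = [100, 120, 150, 110, 300, 140, 130, 120, 160, 110]
print(suggest_request_millicores(usage))  # 360
```

A lower percentile ignores rare spikes and yields a smaller request, and a higher one is more conservative.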
Implement Horizontal Pod Autoscaling (HPA) based on CPU metrics:
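# autoscaling/v2 is the stable HPA API; v2beta1 was removed in Kubernetes 1.25.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          # Scale to keep average CPU at 50% of the pods' CPU requests.
          type: Utilization
          averageUtilization: 50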
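The replica count the autoscaler aims for follows the formula in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). The sketch below applies that formula with the min/max bounds from the manifest above and the default 10% tolerance. It is a simplification that ignores pod readiness, missing metrics, and stabilization windows, and the function name is hypothetical.

```python
import math


def desired_replicas(current, current_util, target_util, min_r, max_r, tolerance=0.1):
    """Approximate the HPA replica calculation for a utilization target."""
    ratio = current_util / target_util
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    # Clamp the computed count to the configured bounds.
    return max(min_r, min(max_r, math.ceil(current * ratio)))


# 4 replicas averaging 80% CPU against a 50% target, bounded to 2..10.
print(desired_replicas(4, 80, 50, 2, 10))  # 7
```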
To identify and resolve CPU bottlenecks:
- Look for consistently high CPU usage or frequent throttling
- Consider optimizing application code or increasing resources
Balance CPU usage across nodes by:
- Using node affinity rules to distribute workloads
- Implementing cluster autoscaler to add nodes during high demand
Key Takeaways
- Prometheus effectively monitors Kubernetes CPU usage
- Understanding Kubernetes metrics is crucial for accurate CPU usage calculation
- PromQL queries enable detailed analysis of CPU utilization patterns
- Visualizing CPU metrics with Grafana or SigNoz aids in resource optimization
- Regular monitoring and analysis of CPU usage optimize Kubernetes performance
FAQs
What is the difference between CPU requests and limits in Kubernetes?
CPU requests guarantee a minimum amount of CPU resources, while limits cap the maximum CPU a container can use. Requests help with scheduling, ensuring nodes have enough resources. Limits prevent containers from consuming excessive CPU.
How often should I collect CPU metrics in a Kubernetes environment?
Collect CPU metrics every 15-30 seconds for real-time monitoring. For long-term analysis, aggregate data over longer intervals (e.g., 5 minutes) to reduce storage requirements while maintaining useful insights.
Can Prometheus monitor CPU usage of individual containers within a pod?
Yes, Prometheus can monitor CPU usage of individual containers within a pod. Use the container label in your queries to differentiate between containers in the same pod.
How do I handle CPU usage spikes in my Kubernetes cluster?
To handle CPU usage spikes:
- Set up alerts for sudden increases in CPU usage
- Implement Horizontal Pod Autoscaling to automatically scale during high load
- Use Vertical Pod Autoscaling to adjust resource requests and limits
- Analyze the cause of spikes and optimize application code if necessary
Enhance Your Monitoring with SigNoz
While Prometheus offers powerful monitoring capabilities, managing retention and scaling can become challenging as your infrastructure grows. SigNoz provides a comprehensive monitoring solution that builds upon Prometheus' strengths while addressing its limitations.
With SigNoz, you can:
- Scale your monitoring infrastructure effortlessly
- Access advanced querying and visualization capabilities
- Benefit from integrated tracing and logging alongside metrics
- Get high performance with the ClickHouse database
- Take advantage of SigNoz's exception monitoring capabilities