Kubernetes has revolutionized container orchestration, but managing resources like CPU usage in a dynamic environment can be challenging. This guide shows you how to calculate and monitor container CPU usage in Kubernetes using Prometheus.

Understanding Kubernetes CPU Usage and Monitoring

CPU usage in Kubernetes represents the amount of computational power containers consume. Monitoring this usage is crucial for maintaining optimal performance, preventing resource bottlenecks, and ensuring efficient resource allocation.

Prometheus, an open-source monitoring system, plays a vital role in the Kubernetes ecosystem. It collects and stores time-series data, making it ideal for tracking CPU metrics. Key CPU usage metrics in Kubernetes include:

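  • container_cpu_usage_seconds_total (per-container CPU time, exposed by cAdvisor through the kubelet)
  • node_cpu_seconds_total (per-node CPU time by mode, exposed by node_exporter)
  • kube_pod_container_resource_requests and kube_pod_container_resource_limits (configured requests and limits, exposed by kube-state-metrics)

Together, these metrics cover CPU consumption at the container and node levels. Pod-level usage comes from aggregating the container metric by its pod label.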

CPU Metrics in Kubernetes

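The container_cpu_usage_seconds_total metric is fundamental for calculating CPU usage. It is a counter that records the cumulative CPU time, in seconds, consumed by a container since it started. Its value only increases, and it resets to zero when the container restarts.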

Kubernetes expresses CPU resources in cores. A single core is equivalent to:

  • 1 AWS vCPU
  • 1 GCP Core
  • 1 Azure vCore
  • 1 Hyperthread on a bare-metal processor
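
Kubernetes manifests usually write fractions of a core in millicores, where the suffix m means one thousandth of a core (500m is half a core). As a minimal sketch of that convention, the hypothetical helper below converts a CPU quantity string to cores. The function name is illustrative and not part of any Kubernetes client library.

```python
from decimal import Decimal


def cpu_quantity_to_cores(quantity: str) -> Decimal:
    """Convert a Kubernetes CPU quantity string to cores.

    Handles plain numbers ("2", "0.5") and the millicore suffix ("500m").
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # "500m" means 500 millicores, i.e. 0.5 of one core.
        return Decimal(quantity[:-1]) / Decimal(1000)
    return Decimal(quantity)


if __name__ == "__main__":
    for q in ("500m", "2", "0.25", "1500m"):
        print(q, "->", cpu_quantity_to_cores(q), "cores")
```

Decimal is used instead of float so that values such as 100m convert to exactly 0.1.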

CPU requests and limits differ from actual usage:

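  • Requests: The CPU reserved for a container; the scheduler uses this value to place the pod on a node
  • Limits: The maximum CPU a container can use; usage above the limit is throttled
  • Actual usage: The CPU the container is consuming at a given moment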

Monitoring at container, pod, and cluster levels provides a comprehensive view of resource utilization and helps identify potential issues.

Setting Up Prometheus for Kubernetes CPU Monitoring

To set up Prometheus for Kubernetes CPU monitoring:

  1. Deploy Prometheus in your Kubernetes cluster:

    kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml
    
    
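  2. Define a Prometheus custom resource and save it as prometheus-config.yaml. This instance only scrapes targets from ServiceMonitor objects labeled team: frontend, so the ServiceMonitor that scrapes the kubelet's cAdvisor endpoint needs that label too: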

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: prometheus
      namespace: monitoring
    spec:
      serviceAccountName: prometheus
      serviceMonitorSelector:
        matchLabels:
          team: frontend
      ruleSelector:
        matchLabels:
          team: frontend
      resources:
        requests:
          memory: 400Mi
    
    
  3. Apply the configuration:

    kubectl apply -f prometheus-config.yaml
    
    

Best practices for Prometheus deployment in production:

  • Use persistent storage for long-term data retention
  • Implement proper access controls and authentication
  • Set up alerting for critical metrics
  • Regularly update Prometheus to the latest stable version

Calculating Container CPU Usage with Prometheus

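To calculate CPU usage, apply the rate() function to the container_cpu_usage_seconds_total metric. The formula is:

sum(rate(container_cpu_usage_seconds_total{container!=""}[1m]))

rate() returns the per-second increase of the counter, averaged over the last minute. Because the counter is measured in CPU-seconds, the result is the number of CPU cores in use. The container!="" matcher drops the pod-level and node-level cgroup series that cAdvisor also reports, which would otherwise be counted twice. To break usage down by pod, or to restrict it to a set of pods, use labels:

sum(rate(container_cpu_usage_seconds_total{pod=~"app-.*", container!=""}[1m])) by (pod)

To convert raw CPU usage to a percentage of the cores available in the cluster:

(sum(rate(container_cpu_usage_seconds_total{container!=""}[1m])) / sum(machine_cpu_cores)) * 100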
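
To make the rate() step in this section concrete, here is a simplified sketch of how a per-second rate can be derived from raw counter samples. It is an approximation for illustration only: the function name and sample values are made up, and the real rate() in Prometheus additionally extrapolates to the edges of the query window.

```python
from typing import Sequence, Tuple

# (unix timestamp in seconds, counter value in CPU-seconds)
Sample = Tuple[float, float]


def simple_rate(samples: Sequence[Sample]) -> float:
    """Approximate the per-second increase of a counter over the samples.

    A value lower than the previous one is treated as a counter reset
    (for example after a container restart), i.e. a restart from zero.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter restarted at 0, so the whole
        # current value counts as new CPU time.
        increase += (curr - prev) if curr >= prev else curr
    return increase / elapsed


if __name__ == "__main__":
    # Illustrative samples taken every 15 seconds over one minute.
    window = [(0, 100.0), (15, 107.5), (30, 115.0), (45, 122.5), (60, 130.0)]
    print(f"{simple_rate(window):.2f} cores")
```

In this example the counter grows by 30 CPU-seconds in 60 seconds of wall-clock time, so the container used half a core on average.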

PromQL Queries for CPU Usage Insights

Here are some useful PromQL queries for CPU usage insights:

  1. Overall cluster CPU utilization:

    sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) / sum(machine_cpu_cores) * 100
    
    
  2. Top CPU-consuming pods:

    topk(10, sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod))
    
    
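  3. CPU usage vs. requests (kube-state-metrics v2 and later expose requests as kube_pod_container_resource_requests with a resource label; earlier versions used kube_pod_container_resource_requests_cpu_cores):

    sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod) / sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod)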
    
    
  4. Detecting CPU throttling:

    sum(rate(container_cpu_cfs_throttled_seconds_total{}[5m])) by (pod) > 0
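
Queries like these can also be run from a script through the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the server address in the usage note is an assumption (for example, a local port-forward to the Prometheus service).

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample value is a pair: [unix_timestamp, "value-as-string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(base_url: str, promql: str, timeout: float = 10.0):
    """Run an instant query against a live Prometheus server."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

With Prometheus reachable at an assumed address such as http://localhost:9090, calling instant_query with the "top CPU-consuming pods" query returns a list of label sets and core counts that a script can print or compare against a threshold.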
    
    

Visualizing CPU Usage with Grafana

To visualize CPU usage with Grafana:

  1. Integrate Grafana with Prometheus:
    • Add Prometheus as a data source in Grafana
    • Configure the Prometheus server URL
  2. Create CPU usage dashboards:
    • Use the queries mentioned above
    • Add time series graphs, gauges, and tables
  3. Set up alerts for CPU usage thresholds:
    • Define alert rules based on CPU utilization percentages
    • Configure notification channels (email, Slack, etc.)

Visualizing CPU Usage with SigNoz

To visualize CPU usage with SigNoz:

  1. Set up SigNoz

    SigNoz Cloud is the easiest way to run SigNoz. Sign up for a free account and get 30 days of unlimited access to all features. You can also install and self-host SigNoz yourself since it is open-source. With 18,000+ GitHub stars, open-source SigNoz is loved by developers. Instructions for self-hosting are available in the SigNoz documentation.

  2. Integrate SigNoz with your Kubernetes cluster:

    • Deploy SigNoz using Helm or Kubernetes manifests
    • Configure SigNoz to scrape Prometheus metrics from your cluster
  3. Create CPU usage dashboards:

    • Use the PromQL queries mentioned earlier
    • Add various visualization types such as time series graphs, gauges, and tables
    • Leverage SigNoz's built-in Kubernetes dashboards for quick insights
  4. Set up alerts for CPU usage thresholds:

    • Define alert rules based on CPU utilization percentages
    • Configure notification channels (email, Slack, PagerDuty, etc.)

SigNoz offers several advantages for Kubernetes CPU monitoring:

  • Native support for Kubernetes environments
  • Unified platform for metrics, traces, and logs
  • Custom dashboard creation
  • Advanced querying capabilities using PromQL
  • Built-in alerting features

Optimizing Kubernetes Deployments Based on CPU Metrics

Analyze CPU usage patterns to right-size container resources:

  • If actual usage consistently exceeds requests, increase the request value
  • If usage is significantly below limits, consider lowering them
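
One way to apply these right-sizing rules is to compare a high percentile of observed usage with the configured request. The sketch below is only an illustration: the function name, the 20% headroom, and the 25% tolerance band are assumptions chosen for the example, not recommendations from Kubernetes or Prometheus.

```python
from statistics import quantiles
from typing import Sequence


def suggest_cpu_request(
    usage_cores: Sequence[float],
    current_request: float,
    headroom: float = 1.2,    # assumed safety margin on top of observed usage
    tolerance: float = 0.25,  # assumed band in which no change is suggested
) -> str:
    """Compare the 95th percentile of observed usage with the current request."""
    if len(usage_cores) < 2:
        raise ValueError("need at least two usage samples")
    # With n=20 there are 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the result within the observed range.
    p95 = quantiles(usage_cores, n=20, method="inclusive")[-1]
    target = p95 * headroom
    if target > current_request * (1 + tolerance):
        return f"increase request to ~{target:.2f} cores"
    if target < current_request * (1 - tolerance):
        return f"decrease request to ~{target:.2f} cores"
    return "keep current request"


if __name__ == "__main__":
    # Illustrative per-pod usage samples in cores, e.g. from the rate() query.
    samples = [0.5] * 10
    print(suggest_cpu_request(samples, current_request=0.25))
```

The usage samples would typically come from the per-pod rate() query over a representative period, such as a full business week.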

Implement Horizontal Pod Autoscaling (HPA) based on CPU metrics:

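# autoscaling/v2 is the stable HPA API; v2beta1 was removed in Kubernetes 1.25.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        # Target average CPU usage as a percentage of the pods' CPU requests.
        averageUtilization: 50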
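
The Kubernetes documentation describes the HPA scaling rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). As a rough sketch of how the manifest above would behave, the function below applies that rule with the 50% target and the 2 to 10 replica bounds. The function itself is hypothetical, and the 10% tolerance reflects the controller's documented default, which cluster operators can change.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,  # average CPU usage as a percentage of requests
    target_utilization: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Apply the HPA scaling formula and clamp the result to the bounds."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count unchanged.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 90% of their CPU requests against a 50% target.
    print(desired_replicas(4, 90, 50, min_replicas=2, max_replicas=10))
```

In the example, 4 replicas at 90% utilization give a ratio of 1.8, so the autoscaler would aim for ceil(7.2) = 8 replicas.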

To identify and resolve CPU bottlenecks:

  • Look for consistently high CPU usage or frequent throttling
  • Consider optimizing application code or increasing resources

Balance CPU usage across nodes by:

  • Using node affinity, pod anti-affinity, or topology spread constraints to distribute workloads
  • Implementing cluster autoscaler to add nodes during high demand

Key Takeaways

  • Prometheus effectively monitors Kubernetes CPU usage
  • Understanding Kubernetes metrics is crucial for accurate CPU usage calculation
  • PromQL queries enable detailed analysis of CPU utilization patterns
  • Visualizing CPU metrics with Grafana or SigNoz aids in resource optimization
  • Regular monitoring and analysis of CPU usage optimize Kubernetes performance

FAQs

What is the difference between CPU requests and limits in Kubernetes?

CPU requests guarantee a minimum amount of CPU resources, while limits cap the maximum CPU a container can use. Requests help with scheduling, ensuring nodes have enough resources. Limits prevent containers from consuming excessive CPU.

How often should I collect CPU metrics in a Kubernetes environment?

Collect CPU metrics every 15-30 seconds for real-time monitoring. For long-term analysis, aggregate data over longer intervals (e.g., 5 minutes) to reduce storage requirements while maintaining useful insights.

Can Prometheus monitor CPU usage of individual containers within a pod?

Yes, Prometheus can monitor CPU usage of individual containers within a pod. Use the container label in your queries to differentiate between containers in the same pod.

How do I handle CPU usage spikes in my Kubernetes cluster?

To handle CPU usage spikes:

  1. Set up alerts for sudden increases in CPU usage
  2. Implement Horizontal Pod Autoscaling to automatically scale during high load
  3. Use Vertical Pod Autoscaling to adjust resource requests and limits
  4. Analyze the cause of spikes and optimize application code if necessary

Enhance Your Monitoring with SigNoz

While Prometheus offers powerful monitoring capabilities, managing retention and scaling can become challenging as your infrastructure grows. SigNoz provides a comprehensive monitoring solution that builds upon Prometheus' strengths while addressing its limitations.

With SigNoz, you can:

  • Scale your monitoring infrastructure effortlessly
  • Access advanced querying and visualization capabilities
  • Benefit from integrated tracing and logging alongside metrics
  • Get high performance with the ClickHouse database
  • Take advantage of SigNoz's exception monitoring capabilities
