July 17, 2025 · 14 min read

Kubernetes Observability with OpenTelemetry | A Complete Setup Guide

Author: Elizabeth Mathew

Kubernetes provides a wealth of telemetry data, from container metrics and application traces to cluster events and logs. OpenTelemetry [OTel] offers a vendor-neutral, end-to-end solution for collecting and exporting this telemetry in a standardised format.

In this blog, we’ll walk through a Kubernetes observability setup on Minikube using OpenTelemetry, but you can easily extend it to your own Kubernetes clusters. We’ll deploy an example instrumented application, then set up the OpenTelemetry Collector in two modes - DaemonSet and Deployment - using the official Helm charts. This two-pronged approach captures service-level traces and metrics as well as cluster-level metrics and events, giving you full visibility into your Kubernetes environment.

This walkthrough aims to give you a solid practical foundation and a clear blueprint to instrument your own Kubernetes cluster using OpenTelemetry.

Let’s get started.

Minikube and the OpenTelemetry Demo App

Info

If you already have a Kubernetes setup, jump to the next section to see how to get telemetry data from it directly. If you want to experiment, play around, or explore in a demo environment, read on.

For our demo, we’ll deploy the OpenTelemetry Astronomy Shop demo application, a microservices app instrumented with OpenTelemetry [it generates traces, metrics and logs], in a Minikube cluster. The OTel project provides a convenient Helm chart for this demo.

⚠️ Warning

The full demo needs ~6 GB of memory, so ensure your Minikube has sufficient resources. If your minikube driver is Docker Desktop, you can increase your memory and cores from the settings.

Prerequisites

Ensure you have Minikube running [a single-node Kubernetes cluster] and have kubectl and helm installed.
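
Before going further, it can help to confirm the toolchain is in place [the exact version output will differ on your machine]:

# Verify the local toolchain
minikube version
kubectl version --client
helm version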

Setup

Let’s start Minikube with enough resources for the demo.

minikube start --memory 8192 --cpus 6

Next, add the OpenTelemetry Helm repository:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

You can install the OTel demo with,

# Create a namespace for the demo [optional]
kubectl create namespace otel-demo

# Install the OpenTelemetry Demo microservices app
helm install my-otel-demo open-telemetry/opentelemetry-demo -n otel-demo

This will spin up various services [frontend, backend, load generator, etc]. The demo app is pre-instrumented to emit telemetry. By default, it includes its own OTel Collector. However, we also want to showcase our own collectors monitoring the cluster. Therefore, we will treat the demo app as just another set of workloads; our collectors will scrape and receive telemetry from it, just like any other application.

Info

We already have a pod running an OTel Collector, which comes bundled with the demo. It collects telemetry from the application, and you can configure the demo’s values.yaml to send that telemetry to any backend of your choice.
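
For reference, here is a minimal sketch of such an override, assuming your version of the demo chart exposes the bundled collector under the opentelemetry-collector key [check the chart’s values.yaml, since keys and defaults change between releases; <OBSERVABILITY_BACKEND> is a placeholder for your backend’s OTLP endpoint]:

# otel-demo-values.yaml [sketch; structure depends on the demo chart version]
opentelemetry-collector:
  config:
    exporters:
      otlp:
        endpoint: <OBSERVABILITY_BACKEND>
    service:
      pipelines:
        traces:
          exporters: [otlp]
        metrics:
          exporters: [otlp]
        logs:
          exporters: [otlp]

You would apply it with helm upgrade my-otel-demo open-telemetry/opentelemetry-demo -n otel-demo --values otel-demo-values.yaml [or pass --values at install time].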

At this point, our Minikube cluster has a running demo application. Now we’ll set up two OpenTelemetry Collectors:

  • One running on each node [DaemonSet] to act as an agent
  • And one running as a single-instance gateway [Deployment] for cluster-level data.

Using both ensures we capture all layers of telemetry.

OpenTelemetry Collector as a DaemonSet [Agent]

First, we deploy the collector as a DaemonSet. This means an OTel Collector pod will run on each node of the cluster [in Minikube’s case, just one node = one pod]. This “agent” collector will focus on node and pod-level metrics, container logs, and will serve as a local OTLP endpoint for application telemetry.

In OTel terms, the DaemonSet collector will include components like,

1/ an OTLP receiver [to get app traces/metrics from local pods],

2/ the kubelet stats receiver [for CPU/memory usage of pods and nodes],

3/ the filelog receiver [for logs from pods on that node],

4/ and a Kubernetes attributes processor [to tag all data with k8s metadata]

The official Helm chart makes it easy to enable these pieces via presets. Let's prepare a values file for the Helm chart [call it otel-collector-daemonset.yaml].

mode: daemonset

image:
  repository: otel/opentelemetry-collector-k8s

presets:
  kubernetesAttributes:
    enabled: true
  kubeletMetrics:
    enabled: true
  logsCollection:
    enabled: true

config:
  exporters:
    otlp:
      endpoint: <OBSERVABILITY_BACKEND>   # replace with your backend's OTLP endpoint

  connectors:
    spanmetrics:   # derives request rate/latency metrics from incoming spans

  service:
    pipelines:
      traces:
        exporters: [spanmetrics, otlp]
      metrics:
        receivers: [otlp, spanmetrics]
        exporters: [otlp]
      logs:
        exporters: [otlp]

📝 Note

What Are Presets in a Helm Chart?

Helm charts may include presets, which are convenient, pre-configured building blocks for common use cases. When enabled in your values.yaml, these presets automatically wire up the necessary receivers, processors, and exporters in the collector’s pipeline.
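
As an illustration of what a preset saves you, enabling kubeletMetrics corresponds roughly to configuring the kubeletstats receiver by hand, something like the sketch below [the chart’s actual generated config and defaults may differ between versions; K8S_NODE_NAME is an environment variable the chart injects via the downward API]:

receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
    endpoint: ${env:K8S_NODE_NAME}:10250   # each agent scrapes its own node's kubelet
    insecure_skip_verify: true

service:
  pipelines:
    metrics:
      receivers: [kubeletstats]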

Now, we install the DaemonSet collector using Helm:

helm install otel-collector open-telemetry/opentelemetry-collector \
    -n otel-demo --values otel-collector-daemonset.yaml  

This installs the OTel Collector with the name otel-collector in the specified namespace. Verify that a DaemonSet pod is running:

kubectl get pods -n otel-demo   

You should see one pod [named something like otel-collector-*****] for the single Minikube node. If you had multiple nodes, you’d see one per node.

Great.

We now have the OTel agent running, which automatically scrapes host metrics and listens for app telemetry on each node.
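
To peek at what the agent is doing, you can tail its own logs [the label selector below relies on the chart’s standard app.kubernetes.io/instance label, which should match the release name otel-collector; adjust it if your release is labelled differently]:

# Confirm the DaemonSet exists and tail the agent's logs
kubectl get daemonset -n otel-demo
kubectl logs -n otel-demo -l app.kubernetes.io/instance=otel-collector --tail=50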

Functionality of the Collector DaemonSet

We set mode: daemonset to get one collector per node. The Helm chart handles creating the proper Kubernetes DaemonSet object, including the RBAC permissions, host ports, and volume mounts needed to access the host’s logs and the kubelet API.

  • The kubernetesAttributes processor is enabled to auto-annotate metrics, traces, and logs with Kubernetes context [e.g. k8s.pod.name, k8s.node.name, etc.] for correlation. This is crucial for connecting application telemetry with platform/infra telemetry.

  • The kubeletMetrics preset enables the kubeletstats receiver. This receiver pulls resource usage metrics from each node’s Kubelet API. It provides metrics like container memory usage, pod CPU usage, and node network errors, among others.

    For example, it will report how much CPU and memory each pod/container is using, which is vital for performance monitoring and right-sizing.

  • The logsCollection preset enables a file log tailer. This will read logs from /var/log/pods/... on each node [where the container stdout/stderr is stored by Kubernetes]. This means you can collect application logs and node logs without running a separate log agent. The logs are ingested into the collector’s log pipeline, also annotated with pod metadata [thanks to the attributes processor, mentioned before].

By default, the Helm chart also leaves an OTLP receiver open [on the collector’s own pod IP] for traces and metrics.
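
To make use of that local OTLP endpoint, a common pattern is for application pods to discover their node’s IP through the downward API and point the standard OTLP environment variable at it. A sketch of a container spec fragment [illustrative; it assumes the agent’s OTLP gRPC port 4317 is reachable via the host port the chart sets up]:

# Fragment of a container spec [illustrative]
env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP         # IP of the node the pod is scheduled on
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://$(NODE_IP):4317"      # node-local collector agent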

Kubernetes Logging

The logsCollection preset in our DaemonSet collector is designed to collect logs from /var/log/pods/... on each node, capturing container stdout/stderr. This is a common and effective method for integrating application logs into your observability pipeline. However, it's essential to understand the broader context of Kubernetes logging.

Kubernetes Logging Best Practices

Effective logging in Kubernetes involves more than just collecting stdout/stderr. Here are some best practices:

  1. Log to Standard Output/Error: This is the most fundamental practice. Containerised applications should write all logs to stdout and stderr. Kubernetes automatically captures these streams and hands them off to the container runtime [e.g., containerd, Docker], which then writes them to a file on the node [typically in /var/log/pods/<pod-uid>/<container-name>/<log-id>.log ]. This is exactly what the filelog receiver in OpenTelemetry collects.

  2. Structured Logging: Whenever possible, applications should emit logs in a structured format, such as JSON. This makes logs much easier to parse, query, and analyse programmatically [see the collector-side parsing sketch after this list]. Instead of plain text, a JSON log might look like,

    {"timestamp": "...", "level": "INFO", "service": "frontend", "message": "User logged in", "user_id": "123", "ip_address": "192.168.1.10"}
    
  3. Appropriate Log Levels: Use standard log levels [DEBUG, INFO, WARN, ERROR, FATAL] consistently. This allows for filtering and severity-based alerting.

  4. Enrich Logs with Context: While OpenTelemetry's kubernetesAttributes processor automatically adds Kubernetes metadata [pod name, namespace, container ID], applications themselves can add valuable business-level context [e.g., user_id, transaction_id]. This bridges the gap between infrastructure logs and application-specific insights.

  5. Centralised Logging System: Ship logs to a centralised system [like an OpenTelemetry backend, Elasticsearch, Loki, etc.] for aggregation, storage, search, and analysis. Relying solely on kubectl logs is not scalable for production environments.

  6. Log Rotation and Retention: Implement proper log rotation to prevent nodes from running out of disk space. Define clear retention policies for your centralised logging solution based on compliance and operational needs.

  7. Resource Limits for Logging Agents: Ensure your logging agents [like the OTel Collector DaemonSet] have appropriate CPU and memory limits to prevent them from consuming excessive resources or being evicted.

  8. Avoid Logging Sensitive Information: Never log sensitive data such as passwords, API keys, or personally identifiable information [PII].
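
Following up on point 2, if your applications write JSON to stdout, the collector can lift those fields into log attributes. A standalone sketch of a filelog receiver doing this [shown independently of the logsCollection preset, whose generated configuration differs]:

receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]
    operators:
      - type: container       # strips the container runtime's log wrapper
      - type: json_parser     # parses the JSON body emitted by the application
        parse_from: body
        parse_to: attributes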

Limitations of kubectl logs

While kubectl logs is a convenient command for inspecting container logs, it has significant limitations for production-grade observability:

  1. Ephemeral Nature: kubectl logs retrieves logs from the container runtime on the node. If a pod restarts, moves to another node, or is deleted, its past logs may be lost unless they were streamed to a persistent storage.
  2. No Centralised View: You can only view logs for one pod or container at a time. It doesn't provide a consolidated view across multiple pods, deployments, or the entire cluster. Correlating events across different services becomes extremely challenging.
  3. Limited Search and Filtering: kubectl logs offers basic filtering [e.g., --since, --tail, or -f to follow]. However, it lacks advanced search capabilities, regex matching, or structured query language support that centralised logging systems provide.
  4. Performance Issues on High Volume: For applications generating a high volume of logs, kubectl logs -f can be overwhelming and may even put a strain on the Kubernetes API server.
  5. No Aggregation or Analytics: kubectl logs is purely for viewing raw log lines. It cannot perform aggregations, generate dashboards, detect anomalies, or provide insights into log patterns.
  6. Requires Direct Cluster Access: To use kubectl logs, you need direct kubectl access to the cluster, which might not be granted to all team members [e.g., developers needing to troubleshoot their application logs]. Centralised systems allow role-based access to log data.
  7. Scalability Challenges: As your cluster grows in size and complexity, manually sifting through logs using kubectl across hundreds or thousands of pods becomes impossible.

These limitations underscore the necessity of a dedicated logging solution, such as the one built with the OpenTelemetry Collector, to manage and gain insights from your Kubernetes logs effectively.

OpenTelemetry Collector as a Deployment [Cluster-Level Collector]

Now we set up a second collector instance, this time as a single Deployment [one pod] that will act as a central gateway for cluster-level data. This collector will be responsible for metrics and data that represent the entire cluster’s state, rather than a single node.

What does the cluster-level collector gather?

In our setup, we will enable the Kubernetes Cluster Receiver, which fetches high-level metrics about the cluster, and the Kubernetes Objects [Events] Receiver, which listens for Kubernetes events. These components give us insight into the overall health and activity of the cluster.

To configure the Deployment-mode collector via Helm, we create otel-collector-deployment.yaml:

mode: deployment

image:
  repository: otel/opentelemetry-collector-k8s

replicaCount: 1

presets:
  clusterMetrics:
    enabled: true
  kubernetesEvents:
    enabled: true

config:
  exporters:
    otlp:
      endpoint: <OBSERVABILITY_BACKEND>

  connectors:
    spanmetrics:

  service:
    pipelines:
      traces:
        exporters: [spanmetrics, otlp]
      metrics:
        receivers: [otlp, spanmetrics]
        exporters: [otlp] 
      logs:
        exporters: [otlp]

Next, install the cluster-level collector with Helm:

helm install otel-collector-cluster open-telemetry/opentelemetry-collector \
   -n otel-demo --values otel-collector-deployment.yaml

This will create a Deployment [named otel-collector-cluster]. After a minute, verify the pod is running:

kubectl get pods -n otel-demo 

You should see one pod [e.g. otel-collector-cluster-xxxxxxxx ] actively scraping cluster state metrics and listening for events from the Kubernetes API server.

Functionality of the Collector Deployment

Setting mode: deployment tells the Helm chart to create a normal Deployment [default 1 replica]. We explicitly set replicaCount: 1 to reinforce that we only want one pod [to avoid duplicate collection].

Cluster Metrics & Attributes

Enabling clusterMetrics adds the k8s Cluster Receiver to the metrics pipeline, preconfigured to gather cluster stats. The Helm chart takes care of cluster-wide RBAC permissions [it will need to query the Kubernetes API for object states] and other config details. Once running, this will feed metrics like k8s.node.count, k8s.pod.count_by_phase, k8s.deployment.desired vs. k8s.deployment.available, and even things like total container restarts across the cluster.

| Metric Name | Description | Attributes |
|---|---|---|
| k8s.node.count | Total number of nodes in the cluster | status |
| k8s.pod.count_by_phase | Number of pods in each phase | phase |
| k8s.deployment.desired | Desired number of replicas for a deployment | deployment.name, namespace |
| k8s.deployment.available | Number of available replicas for a deployment | deployment.name, namespace |
| k8s.daemonset.desired | Desired number of pods for a DaemonSet | daemonset.name, namespace |
| k8s.daemonset.available | Number of available pods for a DaemonSet | daemonset.name, namespace |
| k8s.statefulset.desired | Desired number of replicas for a StatefulSet | statefulset.name, namespace |
| k8s.statefulset.available | Number of available replicas for a StatefulSet | statefulset.name, namespace |
| k8s.container.restarts_total | Total number of container restarts | container.name, pod.name, namespace |
| k8s.namespace.count | Total number of namespaces | - |
| k8s.cronjob.last_schedule_time | Last time a CronJob was successfully scheduled | cronjob.name, namespace |
| k8s.job.active_pods | Number of active pods for a Job | job.name, namespace |
| k8s.hpa.current_replicas | Current number of replicas managed by an HPA | hpa.name, namespace |
| k8s.hpa.desired_replicas | Desired number of replicas managed by an HPA | hpa.name, namespace |

Kubernetes Events

Kubernetes events are objects that provide insight into what is happening inside the cluster. They are generated by the Kubernetes control plane components [the API server, scheduler, and controller manager] and by nodes [kubelet]. Enabling kubernetesEvents adds the k8s Objects receiver to the collector’s logs pipeline. The receiver can watch any object type, but the chart configures it to capture only events [not every object change] to keep data volume reasonable, so all Kubernetes events will flow through the logs pipeline.
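
For illustration, this events-only behaviour corresponds roughly to a k8sobjects receiver configured like the sketch below [the preset’s exact generated config may vary by chart version]:

receivers:
  k8sobjects:
    objects:
      - name: events
        mode: watch           # stream events as they occur rather than polling

service:
  pipelines:
    logs:
      receivers: [k8sobjects]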

[Image: Some Kubernetes metrics as visualised by SigNoz]
[Image: Infra metrics collected from the Minikube cluster, visualised in SigNoz]

Why Is Kubernetes Events Monitoring Important?

Monitoring Kubernetes events is crucial for several reasons:

  1. Troubleshooting and Debugging: Events are often the first place to look when a pod isn't starting, a deployment isn't scaling, or a service is behaving unexpectedly. They provide immediate clues about underlying issues. For example, a FailedScheduling event tells you why a pod couldn't be placed on a node [e.g., insufficient resources, taints/tolerations].
  2. Operational Awareness: Events offer a real-time stream of cluster activity, allowing operators to understand the state changes within their environment. This helps in proactive identification of potential problems before they impact services.
  3. Security Auditing: Certain events, especially those related to API server activities or configuration changes, can be vital for security auditing and compliance.
  4. Performance Analysis: Events like Killing a container or Evicted pods can indicate resource contention or misconfigurations that affect application performance.
  5. Alerting: While not directly used for metrics, events can trigger alerts for critical issues, such as image pull failures, readiness probe failures, or persistent volume mounting errors.

By collecting and analysing Kubernetes events through the OTel Collector, you gain invaluable context for your metrics and traces, enabling faster root cause analysis and a more complete picture of your cluster's health.

Key Takeaways

  • OpenTelemetry provides end-to-end observability for Kubernetes, capturing traces, metrics, and logs in a standardised, vendor-neutral format.
  • Minikube provides a lightweight Kubernetes environment for safely experimenting with observability concepts before applying them to production clusters.
  • Deploying collectors in two modes ensures full coverage:
    • DaemonSet for per-node data [logs, kubelet stats, local app telemetry]
    • Deployment for cluster-wide data [Kubernetes metrics and events]
  • Helm presets like kubeletMetrics, logsCollection, and clusterMetrics make it easy to configure receivers and processors without writing full config blocks manually.
  • Though demoed on Minikube, this architecture mirrors real-world production setups and can be extended to any Kubernetes environment with minimal changes.
