Kubernetes Cluster Monitoring with OpenTelemetry | Complete Tutorial

Favour Daniel

Monitoring Kubernetes cluster metrics ensures your containerized infrastructure operates as it should. By tracking essential indicators like CPU utilization, memory consumption, and pod/node statuses, you gain insights to proactively address issues, optimize resources, and maintain overall health. In this tutorial, you will configure OpenTelemetry Collector to collect Kubernetes cluster metrics and send them to SigNoz for monitoring and visualization.

If you want to jump straight into the implementation, skip ahead to the Prerequisites section.

What is a Kubernetes cluster?

A Kubernetes cluster is a set of nodes (physical or virtual machines) that run containerized applications orchestrated by Kubernetes. Kubernetes, often abbreviated as K8s, is an open-source container orchestration platform designed to automate the deployment, scaling, and management of containerized applications. It provides a framework for efficiently running multiple containers across a cluster of machines.

What is OpenTelemetry?

OpenTelemetry is a set of APIs, SDKs, libraries, and integrations that aims to standardize the generation, collection, and management of telemetry data (logs, metrics, and traces). It is backed by the Cloud Native Computing Foundation and is the leading open-source project in the observability domain.

The data you collect with OpenTelemetry is vendor-agnostic and can be exported in many formats. Telemetry data has become critical in observing the state of distributed systems. With microservices and polyglot architectures, there was a need for a global standard, and OpenTelemetry aims to fill that space.

What is OpenTelemetry Collector?

OpenTelemetry Collector is a stand-alone service provided by OpenTelemetry. It is a flexibly configurable telemetry-processing system that gathers and processes observability data, such as traces, metrics, and logs, from different parts of a software system. It then sends this data to chosen destinations, allowing for centralized analysis and monitoring. The collector simplifies the task of collecting and exporting telemetry data in cloud-native environments.

How does OpenTelemetry Collector collect data?

Data collection in OpenTelemetry Collector is handled by receivers. Receivers are configured in YAML under the top-level receivers key. For the configuration to be valid, at least one receiver must be enabled.

Below is an example of an otlp receiver:

receivers:
  otlp:
    protocols:
      grpc:
      http:

The OTLP receiver accepts data through gRPC or HTTP in the OTLP format. There are advanced configurations that you can enable via the YAML file.

Here’s a sample configuration for an otlp receiver:

receivers:
  otlp:
    protocols:
      http:
        endpoint: "localhost:4318"
        cors:
          allowed_origins:
            - http://test.com
            # Origins can have wildcards with *, use * by itself to match any origin.
            - https://*.example.com
          allowed_headers:
            - Example-Header
          max_age: 7200

You can find more details on advanced configurations in the OTLP receiver documentation.

Once a receiver is configured, it needs to be enabled to start the data flow. This involves setting up pipelines within a service. A pipeline acts as a streamlined pathway for data, outlining how it should be processed and where it should go. A pipeline comprises the following:

  1. Receivers: These are entry points for data into the OpenTelemetry Collector, responsible for collecting data from various sources and feeding it into the pipeline.
  2. Processors: After data is received, processors manipulate, filter, or enhance the data as needed before it proceeds further in the pipeline. They provide a way to customize the data according to specific requirements.
  3. Exporters: After processing, the data is ready for export. Exporters define the destination for the data, whether it's an external monitoring system, storage, or another service. They format the data appropriately for the chosen output.

Below is an example pipeline configuration:

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch]
      exporters: [otlp, prometheus]

Here’s a breakdown of the above metrics pipeline:

  • Receivers: This pipeline receives metrics from two receivers, otlp and prometheus. The otlp receiver accepts metrics pushed over gRPC or HTTP, while the prometheus receiver scrapes metrics from Prometheus-compatible endpoints.
  • Processors: Metrics pass through the batch processor, which groups them into batches before export to reduce the number of outgoing requests.
  • Exporters: Processed metrics are exported through both the otlp and prometheus exporters. The otlp exporter sends data to the endpoint specified in its configuration, and the prometheus exporter exposes the metrics in Prometheus format on an endpoint that a Prometheus server can scrape.
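
A pipeline only names components. Each name must also be defined in the matching top-level section, otherwise the Collector rejects the configuration at startup. As a minimal sketch, a complete file backing the pipeline above might look like the following. The scrape job, its target, the backend endpoint, and the batch settings are placeholder assumptions rather than values from this tutorial.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:
  prometheus:
    config:
      scrape_configs:
        - job_name: example-app            # hypothetical job name
          scrape_interval: 30s
          static_configs:
            - targets: ["localhost:9090"]  # hypothetical scrape target

processors:
  batch:
    timeout: 5s             # illustrative value: flush at least every 5 seconds
    send_batch_size: 1000   # illustrative value: flush once 1000 items are queued

exporters:
  otlp:
    endpoint: "otel-backend.example.com:4317"  # hypothetical OTLP backend
  prometheus:
    endpoint: "0.0.0.0:8889"  # address where metrics are exposed for scraping

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch]
      exporters: [otlp, prometheus]
```

A component that is defined but not listed in any pipeline is simply not used, so the service section is what actually switches components on.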

Collecting Kubernetes Cluster Metrics with OpenTelemetry Collector

In this section, you will learn how to collect Kubernetes cluster metrics with the OpenTelemetry Collector and how to visualize the collected metrics in SigNoz.

Prerequisites

Setting up SigNoz

You need a backend to which you can send the collected data for monitoring and visualization. SigNoz is an OpenTelemetry-native APM that is well-suited for visualizing OpenTelemetry data.

SigNoz cloud is the easiest way to run SigNoz. You can sign up here for a free account and get 30 days of unlimited access to all features.

You can also install and self-host SigNoz yourself. Check out the docs for installing self-hosted SigNoz.

Creating manifest files

In a Kubernetes environment, manifest files are utilized for deploying various Kubernetes resources. Several manifest files will be created to deploy the OpenTelemetry Collector within a Kubernetes cluster:

  • configmap
  • service account
  • cluster role
  • cluster role binding
  • deployment

These files serve as a declarative configuration, defining the desired state of the resources such as deployments, services, and config maps, facilitating the efficient deployment and management of the OpenTelemetry Collector components in a Kubernetes environment.

Configmap

A ConfigMap is an API resource that provides a way to store configuration data in key-value pairs. ConfigMaps are often used to store non-sensitive configuration information, such as configuration files, environment variables, or any configuration data that your application needs.

Here, a configmap will be used for the OpenTelemetry Collector setup. In your terminal, create a configmap.yml file and paste the below content:

apiVersion: v1
kind: ConfigMap
metadata:
  name: otelcontribcol
  labels:
    app: otelcontribcol
data:
  config.yaml: |
    receivers:
      k8s_cluster:
        collection_interval: 10s
    exporters:
      debug:
      otlp:
        endpoint: "ingest.{region}.signoz.cloud:443"
        tls:
          insecure: false
        timeout: 20s # Adjust the timeout value as needed
        headers:
          "signoz-access-token": "<SIGNOZ_INGESTION_KEY>"
    service:
      pipelines:
        metrics:
          receivers: [k8s_cluster]
          exporters: [debug, otlp]
        logs/entity_events:
          receivers: [k8s_cluster]
          exporters: [debug, otlp]

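This configuration sets up the OpenTelemetry Collector to collect Kubernetes cluster metrics and entity events. The k8s_cluster receiver feeds two pipelines, metrics and logs/entity_events. Both pipelines export through the debug exporter, which prints to the Collector's own log, and through the otlp exporter, which sends the data to the specified endpoint and authenticates with the access token passed in the signoz-access-token header.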

Replace {region} with the region for your SigNoz cloud account and <SIGNOZ_INGESTION_KEY> with the ingestion key for your account. You can find these settings in the SigNoz dashboard under Settings > Ingestion Settings.
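
ConfigMaps are meant for non-sensitive data, so you may prefer not to paste the ingestion key directly into config.yaml. One possible alternative, sketched below, keeps the key in a Kubernetes Secret and lets the Collector read it from an environment variable through its ${env:...} substitution syntax. The Secret name signoz-ingestion and the variable name SIGNOZ_INGESTION_KEY are assumptions chosen for illustration. The three documents are fragments to merge into the respective files, not a manifest to apply as-is.

```yaml
# Hypothetical Secret holding the ingestion key (the name is an assumption).
apiVersion: v1
kind: Secret
metadata:
  name: signoz-ingestion
type: Opaque
stringData:
  SIGNOZ_INGESTION_KEY: "<SIGNOZ_INGESTION_KEY>"
---
# Fragment of config.yaml: the header value is resolved from the environment.
exporters:
  otlp:
    endpoint: "ingest.{region}.signoz.cloud:443"
    tls:
      insecure: false
    headers:
      "signoz-access-token": "${env:SIGNOZ_INGESTION_KEY}"
---
# Fragment of the Deployment's container spec: expose the Secret as an env var.
env:
  - name: SIGNOZ_INGESTION_KEY
    valueFrom:
      secretKeyRef:
        name: signoz-ingestion
        key: SIGNOZ_INGESTION_KEY
```

With this layout the ConfigMap can be shared or committed without exposing the key, at the cost of one extra object to manage.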

You can find ingestion details in the SigNoz dashboard

Create the configmap:

kubectl apply -f configmap.yml

Service Account

A ServiceAccount is a Kubernetes object that provides an identity for processes running in a Pod. It defines the permissions and access scope for the processes within the cluster.

In your terminal, create a serviceaccount.yml file and paste the below content:

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: otelcontribcol
  name: otelcontribcol

The above configuration defines a ServiceAccount named "otelcontribcol". It provides an identity for the Collector's pod, and other Kubernetes objects, such as Deployments or Pods, can reference it so that the permissions bound to it apply to their processes.

Create the service account:

kubectl apply -f serviceaccount.yml

Cluster Role

A ClusterRole is a cluster-wide set of permissions. It defines which actions (verbs) are allowed on which resources, and it takes effect once it is bound to a user, group, or service account.

In your terminal, create a clusterrole.yml file and paste the below content:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otelcontribcol
  labels:
    app: otelcontribcol
rules:
  - apiGroups:
      - ""
    resources:
      - events
      - namespaces
      - namespaces/status
      - nodes
      - nodes/spec
      - pods
      - pods/status
      - replicationcontrollers
      - replicationcontrollers/status
      - resourcequotas
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - daemonsets
      - deployments
      - replicasets
      - statefulsets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - daemonsets
      - deployments
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - batch
    resources:
      - jobs
      - cronjobs
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - get
      - list
      - watch

Create the cluster role:

kubectl apply -f clusterrole.yml

Cluster Role Binding

A ClusterRoleBinding binds a ClusterRole to a user, group, or service account within the entire cluster. It grants the defined set of permissions to the specified identity.

In your terminal, create a clusterrolebinding.yml file and paste the below content:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otelcontribcol
  labels:
    app: otelcontribcol
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otelcontribcol
subjects:
  - kind: ServiceAccount
    name: otelcontribcol
    namespace: default

In the provided configuration, the ClusterRoleBinding establishes a connection between the previously defined ClusterRole and the ServiceAccount. This binding ensures that the permissions and access rights specified in the ClusterRole are associated with the ServiceAccount, allowing it to perform actions within the cluster as defined by the role.
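
The binding above assumes the ServiceAccount lives in the default namespace. If you deploy the Collector elsewhere, the subjects entry has to name that namespace; otherwise the role is bound to a ServiceAccount that does not exist there, and the Collector's requests to the Kubernetes API are denied. A sketch, assuming a hypothetical namespace called observability:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otelcontribcol
  namespace: observability   # hypothetical namespace
  labels:
    app: otelcontribcol
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otelcontribcol
  labels:
    app: otelcontribcol
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otelcontribcol
subjects:
  - kind: ServiceAccount
    name: otelcontribcol
    namespace: observability   # must match the ServiceAccount's namespace
```

The ConfigMap and Deployment would need the same namespace in their metadata, since a pod can only use a ServiceAccount and ConfigMap from its own namespace.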

Create the cluster role binding:

kubectl apply -f clusterrolebinding.yml

Deployment

A Deployment is a Kubernetes resource that defines how to create, update, and scale a set of Pods. It ensures the desired number of replicas are running and manages rolling updates.

In your terminal, create a deployment.yml file and paste the below content:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otelcontribcol
  labels:
    app: otelcontribcol
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otelcontribcol
  template:
    metadata:
      labels:
        app: otelcontribcol
    spec:
      serviceAccountName: otelcontribcol
      containers:
        - name: otelcontribcol
          image: otel/opentelemetry-collector-contrib
          args: ["--config", "/etc/config/config.yaml"]
          volumeMounts:
            - name: config
              mountPath: /etc/config
          imagePullPolicy: IfNotPresent
      volumes:
        - name: config
          configMap:
            name: otelcontribcol

The provided Deployment configuration creates an instance of the "otelcontribcol" application within the cluster. The deployment specifies one replica, associates it with the ServiceAccount created earlier, and mounts the ConfigMap created earlier as a volume that supplies the OpenTelemetry Collector's configuration file.
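
The manifest above references the image without a tag, so the version you run depends on when the image is pulled. If you want reproducible rollouts, you might pin the tag and set resource requests and limits. The fragment below is only a sketch: the tag 0.89.0 is the version that appears in the log output shown later in this tutorial, and the CPU and memory figures are placeholder assumptions to tune for your cluster.

```yaml
# Fragment of deployment.yml (spec.template.spec.containers)
containers:
  - name: otelcontribcol
    image: otel/opentelemetry-collector-contrib:0.89.0  # pinned tag instead of the implicit latest
    args: ["--config", "/etc/config/config.yaml"]
    resources:
      requests:
        cpu: 100m       # placeholder value
        memory: 128Mi   # placeholder value
      limits:
        cpu: 500m       # placeholder value
        memory: 512Mi   # placeholder value
    volumeMounts:
      - name: config
        mountPath: /etc/config
    imagePullPolicy: IfNotPresent
```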

Create the deployment:

kubectl apply -f deployment.yml

The deployment will create a replica set, which in turn creates pods to match the desired number of replicas. Fetch the logs of the pod to confirm that the OpenTelemetry Collector is working (your pod name will differ from the one below). The log output should look similar to this:

Users-MacBook-Pro-2:k8s demo$ kubectl logs otelcontribcol-764d65d6d5-mkv97
2023-11-27T23:32:56.927Z info service@v0.89.0/telemetry.go:85 Setting up own telemetry...
2023-11-27T23:32:56.929Z info service@v0.89.0/telemetry.go:202 Serving Prometheus metrics {"address": ":8888", "level": "Basic"}
2023-11-27T23:32:56.929Z info exporter@v0.89.0/exporter.go:275 Development component. May change in the future. {"kind": "exporter", "data_type": "metrics", "name": "debug"}
2023-11-27T23:32:56.930Z info exporter@v0.89.0/exporter.go:275 Development component. May change in the future. {"kind": "exporter", "data_type": "logs", "name": "debug"}
2023-11-27T23:32:56.935Z info service@v0.89.0/service.go:143 Starting otelcol-contrib... {"Version": "0.89.0", "NumCPU": 12}
2023-11-27T23:32:56.935Z info extensions/extensions.go:34 Starting extensions...
2023-11-27T23:32:56.972Z info k8sclusterreceiver@v0.89.0/receiver.go:53 Starting shared informers and wait for initial cache sync. {"kind": "receiver", "name": "k8s_cluster", "data_type": "logs"}
2023-11-27T23:32:56.973Z info service@v0.89.0/service.go:169 Everything is ready. Begin running and processing data.
2023-11-27T23:32:57.075Z info k8sclusterreceiver@v0.89.0/receiver.go:74 Completed syncing shared informer caches. {"kind": "receiver", "name": "k8s_cluster", "data_type": "logs"}

You can find more information in the documentation for the OpenTelemetry Kubernetes cluster receiver.

Monitoring Kubernetes cluster metrics with SigNoz dashboard

Once the collector service has been started successfully, navigate to your SigNoz Cloud account and access the "Dashboard" tab. Click on the “New Dashboard” button to create a new dashboard.

SigNoz dashboard

To give the dashboard a name, click on “Configure.”

Configuring dashboard

Enter your preferred dashboard name in the "Name" input box and save the changes.

Dashboard Naming

Now, you can create various panels for your dashboard. There are three visualization options to display your data: Time Series, Value, and Table formats. Choose the format that best suits your preferences, depending on the metric you want to monitor. You can opt for the "Time Series" visualization for the initial metric.

Dashboard visualization options

In the "Query Builder" tab, enter "k8s," and you should see various Kubernetes metrics. This confirms that the OpenTelemetry Collector is successfully collecting the Kubernetes cluster metrics and forwarding them to SigNoz for monitoring and visualization.

Collected Kubernetes cluster metrics for visualization

You can query the collected metrics using the query builder and create panels for your dashboard.

Monitoring dashboard for the Kubernetes cluster

Visit the SigNoz documentation to learn more about creating dashboards and running queries.

Besides just setting up dashboards to monitor your Kubernetes cluster metrics, you can create alerts for the different metrics you query. Click on the drop-down of the panel from your dashboard, and then click on “Create Alerts”.

Create alerts on important Kubernetes cluster metrics

It will take you to the alerts page; from there, you can create the alerts.

Reference: Metrics and Attributes for Kubernetes Cluster supported by OpenTelemetry

The following metrics and resource attributes for the Kubernetes cluster can be collected by the OpenTelemetry Collector.

Metrics

These metrics are enabled by default. You can use them to monitor how your Kubernetes cluster and its resources are performing and to spot when something is not right.

Key Terms for Metrics & Attributes

  • Metric Name: The name of the metric is a unique identifier that distinguishes it from other metrics. It helps in referencing and organizing various metrics on SigNoz as well.
  • Metric Type: The metric type indicates the kind of data that the metric measures. Some common metric types include gauge, counter, sum, and histogram.
  • Value Type: The value type indicates the type of data that is used to represent the value of the metric. Some common value types are integer and double.
  • Unit: The unit specifies the measurement unit associated with the metric, such as bytes (By) or a count of objects like {pod}. It helps in interpreting and comparing metric values. Some metrics have no unit.

| Metrics | Description | Metric Name | Metric Type | Value Type | Unit |
|---|---|---|---|---|---|
| Container CPU Limit | Maximum CPU limit assigned to a container | k8s.container.cpu_limit | Gauge | Double | {cpu} |
| Container CPU Request | CPU resources requested by a container | k8s.container.cpu_request | Gauge | Double | {cpu} |
| Container Ephemeral Storage Limit | Maximum ephemeral storage limit for a container | k8s.container.ephemeralstorage_limit | Gauge | Int | By |
| Container Ephemeral Storage Request | Ephemeral storage requested by a container | k8s.container.ephemeralstorage_request | Gauge | Int | By |
| Container Memory Limit | Maximum memory limit assigned to a container | k8s.container.memory_limit | Gauge | Int | By |
| Container Memory Request | Memory resources requested by a container | k8s.container.memory_request | Gauge | Int | By |
| Container Ready | Indicates if a container is ready | k8s.container.ready | Gauge | Int | |
| Container Restarts | Number of restarts for a container | k8s.container.restarts | Gauge | Int | {restart} |
| Container Storage Limit | Maximum storage limit for a container | k8s.container.storage_limit | Gauge | Int | By |
| Container Storage Request | Storage resources requested by a container | k8s.container.storage_request | Gauge | Int | By |
| CronJob Active Jobs | Number of active jobs for a CronJob | k8s.cronjob.active_jobs | Gauge | Int | {job} |
| DaemonSet Current Scheduled Nodes | Number of nodes currently scheduled by a DaemonSet | k8s.daemonset.current_scheduled_nodes | Gauge | Int | {node} |
| DaemonSet Desired Scheduled Nodes | Desired number of nodes to be scheduled by a DaemonSet | k8s.daemonset.desired_scheduled_nodes | Gauge | Int | {node} |
| DaemonSet Misscheduled Nodes | Number of nodes misscheduled by a DaemonSet | k8s.daemonset.misscheduled_nodes | Gauge | Int | {node} |
| DaemonSet Ready Nodes | Number of nodes ready in a DaemonSet | k8s.daemonset.ready_nodes | Gauge | Int | {node} |
| Deployment Available | Number of available pods in a Deployment | k8s.deployment.available | Gauge | Int | {pod} |
| Deployment Desired | Desired number of pods in a Deployment | k8s.deployment.desired | Gauge | Int | {pod} |
| Horizontal Pod Autoscaler (HPA) Current Replica | Current number of replicas in an HPA | k8s.hpa.current_replicas | Gauge | Int | {pod} |
| HPA Desired Replicas | Desired number of replicas in an HPA | k8s.hpa.desired_replicas | Gauge | Int | {pod} |
| HPA Max Replicas | Maximum number of replicas in an HPA | k8s.hpa.max_replicas | Gauge | Int | {pod} |
| HPA Min Replicas | Minimum number of replicas in an HPA | k8s.hpa.min_replicas | Gauge | Int | {pod} |
| Job Active Pods | Number of active pods for a Job | k8s.job.active_pods | Gauge | Int | {pod} |
| Job Desired Successful Pods | Desired number of successfully completed pods for a Job | k8s.job.desired_successful_pods | Gauge | Int | {pod} |
| Job Failed Pods | Number of failed pods for a Job | k8s.job.failed_pods | Gauge | Int | {pod} |
| Job Max Parallel Pods | Maximum parallel pods for a Job | k8s.job.max_parallel_pods | Gauge | Int | {pod} |
| Job Successful Pods | Number of successfully completed pods for a Job | k8s.job.successful_pods | Gauge | Int | {pod} |
| Namespace Phase | Phase of the Kubernetes namespace | k8s.namespace.phase | Gauge | Int | |
| Pod Phase | Phase of a Kubernetes pod | k8s.pod.phase | Gauge | Int | |
| ReplicaSet Available | Number of available pods in a ReplicaSet | k8s.replicaset.available | Gauge | Int | {pod} |
| ReplicaSet Desired | Desired number of pods in a ReplicaSet | k8s.replicaset.desired | Gauge | Int | {pod} |
| Replication Controller Available | Number of available pods in a Replication Controller | k8s.replication_controller.available | Gauge | Int | {pod} |
| Replication Controller Desired | Desired number of pods in a Replication Controller | k8s.replication_controller.desired | Gauge | Int | {pod} |
| Resource Quota Hard Limit | Hard resource limit defined in a Resource Quota | k8s.resource_quota.hard_limit | Gauge | Int | {resource} |
| Resource Quota Used | Used resource in a Resource Quota | k8s.resource_quota.used | Gauge | Int | {resource} |
| StatefulSet Current Pods | Number of current pods in a StatefulSet | k8s.statefulset.current_pods | Gauge | Int | {pod} |
| StatefulSet Desired Pods | Desired number of pods in a StatefulSet | k8s.statefulset.desired_pods | Gauge | Int | {pod} |
| StatefulSet Ready Pods | Number of ready pods in a StatefulSet | k8s.statefulset.ready_pods | Gauge | Int | {pod} |
| StatefulSet Updated Pod | Number of updated pods in a StatefulSet | k8s.statefulset.updated_pods | Gauge | Int | {pod} |

You can visit the Kubernetes cluster receiver GitHub repo to learn more about these metrics.
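
Individual metrics can be switched on or off under the receiver's metrics key, which can help if you want to reduce the volume of data sent to your backend. The following sketch disables two of the storage metrics. The choice of metrics is purely illustrative, and the set of available metric names depends on your Collector version, so check the receiver documentation for the release you run.

```yaml
receivers:
  k8s_cluster:
    collection_interval: 10s
    metrics:
      # Illustrative choice: stop emitting two metrics that are on by default.
      k8s.container.storage_limit:
        enabled: false
      k8s.container.storage_request:
        enabled: false
```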

Resource Attributes

Resource attributes are a set of key-value pairs that provide additional context about the source of a metric. They are used to identify and classify metrics, and to associate them with specific resources or entities within a system.

The table below lists the resource attributes for a Kubernetes cluster and whether each one is enabled by default.

| Name | Description | Values | Enabled |
|---|---|---|---|
| container.id | The container id. | Any Str | true |
| container.image.name | The container image name | Any Str | true |
| container.image.tag | The container image tag | Any Str | true |
| k8s.container.name | The k8s container name | Any Str | true |
| k8s.cronjob.name | The k8s CronJob name | Any Str | true |
| k8s.cronjob.uid | The k8s CronJob uid. | Any Str | true |
| k8s.daemonset.name | The k8s daemonset name. | Any Str | true |
| k8s.daemonset.uid | The k8s daemonset uid. | Any Str | true |
| k8s.deployment.name | The name of the Deployment. | Any Str | true |
| k8s.deployment.uid | The UID of the Deployment. | Any Str | true |
| k8s.hpa.name | The k8s hpa name. | Any Str | true |
| k8s.hpa.uid | The k8s hpa uid. | Any Str | true |
| k8s.job.name | The k8s job name. | Any Str | true |
| k8s.job.uid | The k8s job uid. | Any Str | true |
| k8s.kubelet.version | The version of Kubelet running on the node. | Any Str | false |
| k8s.kubeproxy.version | The version of Kube Proxy running on the node. | Any Str | false |
| k8s.namespace.name | The k8s namespace name. | Any Str | true |
| k8s.namespace.uid | The k8s namespace uid. | Any Str | true |
| k8s.node.name | The k8s node name. | Any Str | true |
| k8s.node.uid | The k8s node uid. | Any Str | true |
| k8s.pod.name | The k8s pod name. | Any Str | true |
| k8s.pod.qos_class | The k8s pod qos class name. One of Guaranteed, Burstable, BestEffort. | Any Str | false |
| k8s.pod.uid | The k8s pod uid. | Any Str | true |
| k8s.replicaset.name | The k8s replicaset name | Any Str | true |
| k8s.replicaset.uid | The k8s replicaset uid | Any Str | true |
| k8s.replicationcontroller.name | The k8s replicationcontroller name. | Any Str | true |
| k8s.replicationcontroller.uid | The k8s replicationcontroller uid. | Any Str | true |
| k8s.resourcequota.name | The k8s resourcequota name. | Any Str | true |
| k8s.resourcequota.uid | The k8s resourcequota uid. | Any Str | true |
| k8s.statefulset.name | The k8s statefulset name. | Any Str | true |
| k8s.statefulset.uid | The k8s statefulset uid. | Any Str | true |

You can see these resource attributes in the OpenTelemetry Collector Contrib repo for the Kubernetes cluster receiver.
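
Attributes that are off by default can be turned on under the receiver's resource_attributes key. The sketch below enables two of them. Treat it as an assumption-laden example rather than a recommended setup: attribute names and their defaults change between Collector releases, so confirm them against the receiver documentation for your version.

```yaml
receivers:
  k8s_cluster:
    collection_interval: 10s
    resource_attributes:
      # Both attributes are listed as disabled by default in this tutorial.
      k8s.kubelet.version:
        enabled: true
      k8s.pod.qos_class:
        enabled: true
```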

Conclusion

In this tutorial, you configured the OpenTelemetry Collector to collect metrics from a Kubernetes cluster and export them to SigNoz for monitoring and visualization.

Visit our complete guide on OpenTelemetry Collector to learn more about it.

OpenTelemetry is becoming a global standard for open-source observability, offering advantages such as a unified standard for all telemetry signals and avoiding vendor lock-in. With OpenTelemetry, instrumenting your applications to collect logs, metrics, and traces becomes seamless, and you can monitor and visualize your telemetry data with SigNoz.

SigNoz is an open-source OpenTelemetry-native APM that can be used as a single backend for all your observability needs.

