
Monitoring Docker Containers Using OpenTelemetry [Full Tutorial]

18 min read
Abhishek Kothari

Monitoring Docker container metrics is essential for understanding the performance and health of your containers. OpenTelemetry collector can collect Docker container metrics and send it to a backend of your choice. In this tutorial, you will install an OpenTelemetry Collector to collect Docker container metrics and send it to SigNoz, an OpenTelemetry-native APM for monitoring and visualization.



If you want to jump straight into implementation, start with the Prerequisites section.

Dockerization has become a popular way of making application workloads portable. Containers free developers from server-level dependencies and simplify testing and deployment of the applications themselves. With the adoption of cloud-native technologies, the use of Docker has naturally grown, and with it the need to monitor Docker containers running on all kinds of compute infrastructure.

Why monitor Docker container metrics?

Monitoring Docker container metrics can be crucial in various scenarios to avoid performance issues and assist developers in troubleshooting. A container may start consuming an excessive amount of resources (CPU or memory), impacting other containers or the host system.

By monitoring CPU and memory usage, you can detect resource saturation early. This allows you to adjust resource allocation, optimize the application, or scale out your environment before users experience significant slowdowns or outages.
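For a quick one-off look at these numbers, Docker's built-in stats command prints a snapshot of CPU, memory, network, and block I/O usage per running container; the OpenTelemetry-based setup in this tutorial collects the same kind of data continuously and ships it to a backend:

docker stats --no-stream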

Some of the key reasons why monitoring Docker containers is important are as follows:

  • Resource Optimization: It helps in allocating resources efficiently and scaling the containers as per the demand.
  • Performance Management: By understanding the resource utilization and demand, you can tune the performance of applications running inside the containers.
  • Troubleshooting: It enables quick identification and resolution of issues, reducing downtime and improving reliability.
  • Cost Management: In cloud environments, efficient resource usage can lead to significant cost savings.

We can use OpenTelemetry and a backend that supports OpenTelemetry-based data to monitor Docker containers efficiently. OpenTelemetry is quietly becoming the open-source standard for generating and collecting telemetry data.

A Brief Overview of OpenTelemetry

OpenTelemetry is a set of APIs, SDKs, libraries, and integrations aiming to standardize the generation, collection, and management of telemetry data (logs, metrics, and traces). It is backed by the Cloud Native Computing Foundation and is the leading open-source project in the observability domain.

The data you collect with OpenTelemetry is vendor-agnostic and can be exported in many formats. OpenTelemetry provides a tool called OpenTelemetry collector that we will use to collect Docker container metrics.

What is OpenTelemetry Collector?

OpenTelemetry Collector is a stand-alone service provided by OpenTelemetry. It can be used as a telemetry-processing system with flexible configuration options to collect and manage telemetry data.

It can understand different data formats and send them to different backends, making it a versatile tool for building observability solutions.

Read our complete guide on OpenTelemetry Collector

How does OpenTelemetry Collector collect data?

A receiver is how data gets into the OpenTelemetry Collector. Receivers are configured via YAML under the top-level receivers tag. There must be at least one enabled receiver for a configuration to be considered valid.

Here’s an example of an otlp receiver:

receivers:
  otlp:
    protocols:
      grpc:
      http:

An OTLP receiver can receive data via gRPC or HTTP using the OTLP format. There are advanced configurations that you can enable via the YAML file.

Here’s a sample configuration for an otlp receiver.

receivers:
  otlp:
    protocols:
      http:
        endpoint: "localhost:4318"
        cors:
          allowed_origins:
            - http://test.com
            # Origins can have wildcards with *, use * by itself to match any origin.
            - https://*.example.com
          allowed_headers:
            - Example-Header
          max_age: 7200

You can find more details on advanced configurations here.

After configuring a receiver, you must enable it. Receivers are enabled via pipelines within the service section. A pipeline consists of a set of receivers, processors, and exporters.

The following is an example pipeline configuration:

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      exporters: [otlp, prometheus]
    traces:
      receivers: [otlp, jaeger]
      processors: [batch]
      exporters: [otlp, zipkin]

Prerequisites

To gather metrics from Docker containers, we first need Docker installed. Once that is done, we can run a few simple containers and gather metrics related to them. This section guides you through a quick container setup using Docker. You can skip the setup if you already have Docker running on your system and a few containers started.

The official Docker installation documentation can help you install Docker for your operating system.

Once Docker is installed, start a few containers using the commands below:

docker run -d -p 8080:80 nginx:latest
docker run -d -p 8081:80 httpd:latest
docker run -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=mysecretpassword mysql:latest

The above commands will start 3 containers on your system to allow us to gather some metrics when we start the OpenTelemetry collector. Next, let us start with the setup of OpenTelemetry Collector. It is assumed that you are setting up the OpenTelemetry collector on the same machine where you are running the Docker containers.
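Before setting up the collector, you can quickly confirm that the three containers are up and running:

docker ps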

Setting up OpenTelemetry Collector

The OpenTelemetry Collector offers various deployment options to suit different environments and preferences. It can be deployed using Docker, Kubernetes, Nomad, or directly on Linux systems. You can find all the installation options here.

We are going to discuss the manual installation here and resolve any hiccups that come up along the way.

Step 1 - Downloading OpenTelemetry Collector

Download the appropriate binary package for your Linux or macOS distribution from the OpenTelemetry Collector releases page. We are using the latest version available at the time of writing this tutorial.

For macOS (arm64):

curl --proto '=https' --tlsv1.2 -fOL https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.88.0/otelcol-contrib_0.88.0_darwin_arm64.tar.gz
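If you are on Linux (amd64) instead, the same release page includes a matching archive that follows the same naming pattern; the command below assumes the same v0.88.0 release:

curl --proto '=https' --tlsv1.2 -fOL https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.88.0/otelcol-contrib_0.88.0_linux_amd64.tar.gz

If you download a different archive, adjust the file name in the extraction step below accordingly.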

Step 2 - Extracting the package

Create a new directory named otelcol-contrib and then extract the contents of the otelcol-contrib_0.88.0_darwin_arm64.tar.gz archive into this newly created directory with the following command:

mkdir otelcol-contrib && tar xvzf otelcol-contrib_0.88.0_darwin_arm64.tar.gz -C otelcol-contrib
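As a quick sanity check, you can confirm that the binary was extracted correctly and print its version (the binary inside the contrib archive is named otelcol-contrib):

./otelcol-contrib/otelcol-contrib --version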

Step 3 - Setting up the configuration file

Create a config.yaml file in the otelcol-contrib folder. This configuration file enables the collector to connect to the Docker socket and holds other settings, such as how frequently you want to collect container metrics. The docker_stats receiver communicates directly with the Docker socket, which provides the metrics and other details needed for monitoring.

Note: The configuration file should be created in the same directory where you unpacked the otelcol-contrib binary. If you have installed the binary globally, you can create the file at any path.

receivers:
  otlp:
    protocols:
      grpc:
      http:
  docker_stats:
    endpoint: unix:///var/run/docker.sock
    collection_interval: 30s
    timeout: 10s
    api_version: 1.24
    metrics:
      container.uptime:
        enabled: true
      container.restarts:
        enabled: true
      container.network.io.usage.rx_errors:
        enabled: true
      container.network.io.usage.tx_errors:
        enabled: true
      container.network.io.usage.rx_packets:
        enabled: true
      container.network.io.usage.tx_packets:
        enabled: true
processors:
  batch:
    send_batch_size: 1000
    timeout: 10s
  resourcedetection:
    detectors: [env, system]
    timeout: 2s
    system:
      hostname_sources: [os]
exporters:
  otlp:
    endpoint: "ingest.{region}.signoz.cloud:443"
    tls:
      insecure: false
    headers:
      signoz-access-token: "{signoz-token}"
  logging:
    verbosity: normal

service:
  pipelines:
    metrics:
      receivers: [otlp, docker_stats]
      processors: [resourcedetection, batch]
      exporters: [otlp]

You will need to replace {region} and {signoz-token} in the above file with the region of your SigNoz Cloud instance and the ingestion token obtained from SigNoz Cloud → Settings → Ingestion Settings.
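For illustration, assuming the US ingestion region and a placeholder token value abc123 (use your own region and token), the exporter section would look like this:

exporters:
  otlp:
    # Replace "us" with your SigNoz Cloud region and abc123 with your ingestion token
    endpoint: "ingest.us.signoz.cloud:443"
    tls:
      insecure: false
    headers:
      signoz-access-token: "abc123"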

Find ingestion settings in SigNoz dashboard

The above configuration additionally contains a resourcedetection processor that helps identify the host attributes better. The Docker socket path is the same on UNIX-based systems; for any other systems, you can refer to this documentation to know more.
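If the collector later fails to collect container metrics, a common cause is that the user running it cannot read the Docker socket. A quick diagnostic (not part of the setup itself) is to check the socket's ownership and permissions:

ls -l /var/run/docker.sock

On Linux, the user running the collector typically needs to be root or a member of the group that owns the socket (usually the docker group).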

Step 4 - Running the collector service

Every Collector release includes an otelcol executable that you can run. Since we’re done with configurations, we can now run the collector service with the following command.

From the otelcol-contrib directory, run the following command:

./otelcol-contrib --config ./config.yaml

If you want to run it in the background:

./otelcol-contrib --config ./config.yaml &> otelcol-output.log & echo "$!" > otel-pid
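You can verify that the background process is still alive using the PID saved in the otel-pid file:

ps -p "$(< otel-pid)"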

Step 5 - Debugging the output

If you ran the collector in the background, its output is redirected to otelcol-output.log, as set up above. You can follow the logs with:

tail -f -n 50 otelcol-output.log

The -n 50 option shows the last 50 lines of otelcol-output.log, and -f keeps following the file as new lines are written.

You can stop the collector service with the following command:

kill "$(< otel-pid)"

You should start seeing the metrics in your SigNoz Cloud UI in about 30 seconds. You can import the dashboard JSON (link to be added) into your SigNoz environment quite easily to monitor your Docker containers.

Monitoring with SigNoz Dashboard

Once the above setup is done, you will be able to access the metrics in the SigNoz dashboard. You can go to the Dashboards tab and try adding a new panel. You can learn how to create dashboards in SigNoz here.

Docker container metrics collected by OpenTelemetry collector

You can easily create charts with the query builder in SigNoz. Here are the steps to add a new panel to the dashboard.

Creating a dashboard panel for average memory usage per container

You can build a complete dashboard around various metrics emitted. Here’s a look at a sample dashboard we built out using the metrics collected. You can get started quickly with this dashboard by using the JSON here.

Dashboard for monitoring Docker Container Metrics in SigNoz

You can also create alerts on any metric. Learn how to create alerts here.

Create alerts on any Docker container metric and get notified in a notification channel of your choice

Reference: Docker container metrics and labels collected by OpenTelemetry Collector

| Name | Description | Availability (cgroup v1/v2) | Type |
| --- | --- | --- | --- |
| container.blockio.io_service_bytes_recursive | Number of bytes transferred to/from the disk by the group and descendant groups | Both | Sum |
| container.cpu.usage.kernelmode | Time spent by tasks of the cgroup in kernel mode (Linux); time spent by all container processes in kernel mode (Windows) | Both | Sum |
| container.cpu.usage.total | Total consumed CPU time | Both | Sum |
| container.cpu.usage.usermode | Time spent by tasks of the cgroup in user mode (Linux); time spent by all container processes in user mode (Windows) | Both | Sum |
| container.cpu.utilization | Percentage usage of CPU | Both | Gauge |
| container.memory.file | Total memory used | cgroup v2 | Sum |
| container.memory.percent | Percentage of memory used | cgroup v1 | Gauge |
| container.memory.total_cache | Total cache memory used by the processes of the cgroup | Both | Sum |
| container.memory.usage.limit | Memory limit of the container (if specified) | Both | Sum |
| container.memory.usage.total | Memory usage of the container, excluding cache | Both | Sum |
| container.network.io.usage.rx_bytes | Bytes received by the container | Both | Sum |
| container.network.io.usage.rx_dropped | Incoming packets dropped by the container | Both | Sum |
| container.network.io.usage.tx_bytes | Bytes transmitted by the container | Both | Sum |
| container.network.io.usage.tx_dropped | Outgoing packets that got dropped | Both | Sum |

Optional Metrics

The following metrics are not emitted by default. Each of them can be enabled by applying the following configuration:

metrics:
  <metric_name>:
    enabled: true
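For example, to additionally enable container.cpu.limit and container.memory.rss (chosen here just for illustration), the docker_stats receiver section would look like this:

receivers:
  docker_stats:
    endpoint: unix:///var/run/docker.sock
    metrics:
      # optional metrics are off by default and must be enabled explicitly
      container.cpu.limit:
        enabled: true
      container.memory.rss:
        enabled: true

The table below lists the optional metrics: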
| Name | Description | Availability (cgroup v1/v2) | Type |
| --- | --- | --- | --- |
| container.blockio.io_merged_recursive | Number of bios/requests merged into requests belonging to this cgroup and its descendant cgroups | cgroup v1 | Sum |
| container.blockio.io_queued_recursive | Number of requests queued up for this cgroup and its descendant cgroups | cgroup v1 | Sum |
| container.blockio.io_service_time_recursive | Total amount of time in nanoseconds between request dispatch and request completion for the IOs done by this cgroup and descendant cgroups | cgroup v1 | Sum |
| container.blockio.io_serviced_recursive | Number of IOs (bio) issued to the disk by the group and descendant groups | cgroup v1 | Sum |
| container.blockio.io_time_recursive | Disk time allocated to cgroup (and descendant cgroups) per device in milliseconds | cgroup v1 | Sum |
| container.blockio.io_wait_time_recursive | Total amount of time the IOs for this cgroup (and descendant cgroups) spent waiting in the scheduler queues for service | cgroup v1 | Sum |
| container.blockio.sectors_recursive | Number of sectors transferred to/from disk by the group and descendant groups | cgroup v1 | Sum |
| container.cpu.limit | CPU limit set for the container | Both | Sum |
| container.cpu.shares | CPU shares set for the container | Both | Sum |
| container.cpu.throttling_data.periods | Number of periods with throttling active | Both | Sum |
| container.cpu.throttling_data.throttled_periods | Number of periods when the container hits its throttling limit | Both | Sum |
| container.cpu.throttling_data.throttled_time | Aggregate time the container was throttled | Both | Sum |
| container.cpu.usage.percpu | Per-core CPU usage by the container | cgroup v1 | Sum |
| container.cpu.usage.system | System CPU usage, as reported by Docker | Both | Sum |
| container.memory.active_anon | The amount of anonymous memory that has been identified as active by the kernel | Both | Sum |
| container.memory.active_file | Cache memory that has been identified as active by the kernel | Both | Sum |
| container.memory.anon | Amount of memory used in anonymous mappings such as brk(), sbrk(), and mmap(MAP_ANONYMOUS) | cgroup v2 | Sum |
| container.memory.cache | The amount of memory used by the processes of this control group that can be associated precisely with a block on a block device | cgroup v1 | Sum |
| container.memory.dirty | Bytes that are waiting to get written back to the disk, from this cgroup | cgroup v1 | Sum |
| container.memory.hierarchical_memory_limit | The maximum amount of physical memory that can be used by the processes of this control group | cgroup v1 | Sum |
| container.memory.hierarchical_memsw_limit | The maximum amount of RAM + swap that can be used by the processes of this control group | cgroup v1 | Sum |
| container.memory.inactive_anon | The amount of anonymous memory that has been identified as inactive by the kernel | Both | Sum |
| container.memory.inactive_file | Cache memory that has been identified as inactive by the kernel | Both | Sum |
| container.memory.mapped_file | Indicates the amount of memory mapped by the processes in the control group | cgroup v1 | Sum |
| container.memory.pgfault | Indicates the number of times that a process of the cgroup triggered a page fault | Both | Sum |
| container.memory.pgmajfault | Indicates the number of times that a process of the cgroup triggered a major fault | Both | Sum |
| container.memory.pgpgin | Number of pages read from disk by the cgroup | cgroup v1 | Sum |
| container.memory.pgpgout | Number of pages written to disk by the cgroup | cgroup v1 | Sum |
| container.memory.rss | The amount of memory that doesn't correspond to anything on disk: stacks, heaps, and anonymous memory maps | cgroup v1 | Sum |
| container.memory.rss_huge | Number of bytes of anonymous transparent hugepages in this cgroup | cgroup v1 | Sum |
| container.memory.total_active_anon | The amount of anonymous memory that has been identified as active by the kernel. Includes descendant cgroups | cgroup v1 | Sum |
| container.memory.total_active_file | Cache memory that has been identified as active by the kernel. Includes descendant cgroups | cgroup v1 | Sum |
| container.memory.total_dirty | Bytes that are waiting to get written back to the disk, from this cgroup and descendants | cgroup v1 | Sum |
| container.memory.total_inactive_anon | The amount of anonymous memory that has been identified as inactive by the kernel. Includes descendant cgroups | cgroup v1 | Sum |
| container.memory.total_inactive_file | Cache memory that has been identified as inactive by the kernel. Includes descendant cgroups | cgroup v1 | Sum |
| container.memory.total_mapped_file | Indicates the amount of memory mapped by the processes in the control group and descendant groups | cgroup v1 | Sum |
| container.memory.total_pgfault | Indicates the number of times that a process of the cgroup (or descendant cgroups) triggered a page fault | cgroup v1 | Sum |
| container.memory.total_pgmajfault | Indicates the number of times that a process of the cgroup (or descendant cgroups) triggered a major fault | cgroup v1 | Sum |
| container.memory.total_pgpgin | Number of pages read from disk by the cgroup and descendant groups | cgroup v1 | Sum |
| container.memory.total_pgpgout | Number of pages written to disk by the cgroup and descendant groups | cgroup v1 | Sum |
| container.memory.total_rss | The amount of memory that doesn't correspond to anything on disk: stacks, heaps, and anonymous memory maps. Includes descendant cgroups | cgroup v1 | Sum |
| container.memory.total_rss_huge | Number of bytes of anonymous transparent hugepages in this cgroup and descendant cgroups | cgroup v1 | Sum |
| container.memory.total_unevictable | The amount of memory that cannot be reclaimed. Includes descendant cgroups | cgroup v1 | Sum |
| container.memory.total_writeback | Number of bytes of file/anon cache that are queued for syncing to disk in this cgroup and descendants | cgroup v1 | Sum |
| container.memory.unevictable | The amount of memory that cannot be reclaimed | Both | Sum |
| container.memory.usage.max | Maximum memory usage | Both | Sum |
| container.memory.writeback | Number of bytes of file/anon cache that are queued for syncing to disk in this cgroup | cgroup v1 | Sum |
| container.network.io.usage.rx_errors | Errors on received network packets | Both | Sum |
| container.network.io.usage.rx_packets | Packets received by the container | Both | Sum |
| container.network.io.usage.tx_errors | Errors on transmitted network packets | Both | Sum |
| container.network.io.usage.tx_packets | Packets transmitted by the container | Both | Sum |
| container.pids.count | Total container PIDs | Both | Sum |
| container.pids.limit | PIDs limit for the container | Both | Sum |
| container.restarts | Number of restarts for the container | Both | Sum |
| container.uptime | Time elapsed since container start time | Both | Gauge |

Attributes

The resource attributes collected alongside the metrics are as follows (attributes with Enabled set to false are not collected by default):

| Name | Description | Values | Enabled |
| --- | --- | --- | --- |
| container.command_line | The full command executed by the container | Any Str | false |
| container.hostname | The hostname of the container | Any Str | true |
| container.id | The ID of the container | Any Str | true |
| container.image.id | The ID of the container image | Any Str | false |
| container.image.name | The name of the Docker image in use by the container | Any Str | true |
| container.name | The name of the container | Any Str | true |
| container.runtime | The runtime of the container. For this receiver, it will always be 'docker' | Any Str | true |
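Attributes that are disabled by default (such as container.command_line) can usually be enabled in the receiver configuration in a way similar to optional metrics. The sketch below follows the resource_attributes pattern used by contrib receivers; verify it against the docker_stats receiver documentation for your collector version:

receivers:
  docker_stats:
    # assumption: resource attributes are toggled under resource_attributes,
    # as in other contrib receivers generated from metadata definitions
    resource_attributes:
      container.command_line:
        enabled: true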

Conclusion

In this tutorial, you installed an OpenTelemetry collector to collect Docker container metrics and sent the collected data to SigNoz for monitoring and visualization.

Visit our complete guide on OpenTelemetry Collector to learn more about it. OpenTelemetry is quietly becoming the world standard for open-source observability, and by using it, you can have advantages like a single standard for all telemetry signals, no vendor lock-in, etc.

SigNoz is an open-source OpenTelemetry-native APM that can be used as a single backend for all your observability needs.


Further Reading

Complete Guide on OpenTelemetry Collector

An OpenTelemetry-native APM