NVIDIA GPU metrics with DCGM Exporter

Applies to: SigNoz Cloud and self-hosted SigNoz.

This document explains how to monitor NVIDIA GPUs using the DCGM Exporter and SigNoz. The exporter collects GPU metrics and exposes them for Prometheus-style scraping.

Prerequisites

  • NVIDIA GPU(s) with drivers installed
  • NVIDIA Container Toolkit (for Docker deployments)

Setup

Step 1: Run NVIDIA DCGM Exporter

NVIDIA's official dcgm-exporter serves GPU metrics over HTTP on port 9400 at the /metrics path.

Docker (single node quickstart):

docker run -d \
  --gpus all \
  --cap-add SYS_ADMIN \
  --rm \
  -p 9400:9400 \
  nvcr.io/nvidia/k8s/dcgm-exporter:4.4.2-4.7.1-ubuntu22.04

Verify it's running:

curl localhost:9400/metrics | head

You should see metrics like DCGM_FI_DEV_SM_CLOCK, DCGM_FI_DEV_MEM_CLOCK, etc.
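To check more than the first few lines, the exporter's output can be reduced to the distinct DCGM metric names it exposes. The sketch below is illustrative only: list_dcgm_metrics is a hypothetical helper name (not part of dcgm-exporter), and it assumes the standard Prometheus text format on stdin.

```shell
# Hypothetical helper: print the distinct DCGM metric names found in
# Prometheus text-format input read from stdin.
list_dcgm_metrics() {
  # Drop "# HELP" / "# TYPE" comment lines, strip labels and values so only
  # the metric name remains, keep DCGM_* names, and de-duplicate them.
  grep -v '^#' \
    | sed -E 's/^([A-Za-z_:][A-Za-z0-9_:]*).*/\1/' \
    | grep '^DCGM_' \
    | sort -u
}
```

For example, `curl -s localhost:9400/metrics | list_dcgm_metrics` should print names such as DCGM_FI_DEV_SM_CLOCK when the exporter is healthy. Empty output suggests the exporter is up but not reporting GPU fields.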

Kubernetes: NVIDIA recommends installing the exporter with its Helm chart.
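A minimal sketch of the Helm installation follows. The repository alias gpu-helm-charts and the chart URL are the ones given in NVIDIA's dcgm-exporter README at the time of writing and may change, so check the current README before relying on them. The function name install_dcgm_exporter and the DRY_RUN variable are hypothetical conveniences for this example.

```shell
# Hypothetical wrapper around the Helm steps from NVIDIA's dcgm-exporter README.
# Set DRY_RUN=1 to print the commands instead of running them.
install_dcgm_exporter() {
  _run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    _run="echo"
  fi
  # Register NVIDIA's chart repository (alias and URL assumed from the README).
  $_run helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
  # Refresh the local chart index.
  $_run helm repo update
  # Install the chart with an auto-generated release name.
  $_run helm install --generate-name gpu-helm-charts/dcgm-exporter
}
```

Running `DRY_RUN=1 install_dcgm_exporter` prints the three helm commands for review. Calling the function without DRY_RUN executes them against the current kubectl context.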

Step 2: Set Up the OTel Collector

Refer to the SigNoz OTel Collector installation documentation to set up the collector.

Step 3: Configure the Prometheus Receiver

Add a scrape job for the DCGM exporter in your OTel Collector config:

config.yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "dcgm-exporter"
          scrape_interval: 15s
          scrape_timeout: 10s
          static_configs:
            - targets: ["<gpu-node-host>:9400"]

Configuration parameters:

  • <gpu-node-host>: Hostname or IP of the node running the DCGM exporter

If the OTel Collector runs on the same host as the exporter (non-containerized), use localhost:9400 as the target. In containerized environments, use the container name or service name instead.

Step 4: Enable the Pipeline

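Add the receiver to your metrics pipeline. The batch processor and otlp exporter listed here must also be defined under processors and exporters in the same file: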

config.yaml
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp]

Visualizing GPU Metrics

Once configured, verify ingestion in the Metrics Explorer. Search for metrics starting with DCGM_.

To monitor your GPUs with the pre-configured NVIDIA DCGM dashboard, import its JSON from the SigNoz UI:

Dashboards → + New dashboard → Import JSON

Troubleshooting

Common Issues

  1. No metrics appearing in SigNoz

    • Verify that the DCGM exporter is running and that its /metrics endpoint is reachable from the collector host
    • Ensure NVIDIA drivers are properly installed

  2. Container fails to start

    • Verify that the NVIDIA Container Toolkit is installed
    • Check that GPUs are visible with nvidia-smi
    • Ensure the --gpus all flag is passed to docker run
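
As a first pass over the checks above, it can help to confirm that the relevant tools are installed at all. The sketch below is an assumption-laden convenience, not an official diagnostic: check_gpu_prereqs is a hypothetical name, and it only tests whether nvidia-smi, docker, and nvidia-ctk (the NVIDIA Container Toolkit CLI) are on PATH, not whether they are configured correctly.

```shell
# Hypothetical helper: report whether each tool used in the troubleshooting
# steps is available on PATH. It does not validate driver or runtime setup.
check_gpu_prereqs() {
  for _tool in nvidia-smi docker nvidia-ctk; do
    if command -v "$_tool" >/dev/null 2>&1; then
      echo "$_tool: found"
    else
      echo "$_tool: missing"
    fi
  done
}
```

Call `check_gpu_prereqs` on the GPU node. If every tool is reported as found, run nvidia-smi directly to confirm the GPUs are listed, then re-check the docker run flags from Step 1.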

Last updated: March 03, 2026
