SigNoz Cloud - This page is relevant for SigNoz Cloud editions.
Self-Host - This page is relevant for self-hosted SigNoz editions.

Incident Specific Dashboard Spin-Up

The checkout service is throwing errors. Your error rate alert fired and you need visibility into what's happening right now. Instead of manually parsing through all your telemetry data, you want a focused dashboard that shows exactly what's broken.

You open your AI assistant, connected to SigNoz via the MCP server, and ask it to build an incident-specific dashboard.

Prerequisites

Step 1: Create the Incident Dashboard from a Natural Language Prompt

Our checkout service error alert is firing. Create an incident dashboard for the last 1 hour with:

Checkout error rate, latency (p95/p99), and throughput vs last week's baseline
Top error messages and failing traces
Health of key dependency services
Infrastructure: CPU, memory, pod restarts

Your assistant will analyze the checkout service, identify its dependencies, gather the relevant metrics, and build a comprehensive incident dashboard:

✓ Service confirmation: Checkout service detected
  - Error rate: 75% (27 errors / 36 calls in last hour)
  - Dependencies identified: payment, cart, product-catalog, currency, shipping, email

✓ Root cause identified: Payment service
  - Error rate: 52% (26 errors / 50 calls)
  - Top failing operations:
    - oteldemo.CheckoutService/PlaceOrder: 27 failures
    - oteldemo.PaymentService/Charge: 26 failures
  - Pattern: Nearly identical error counts suggest every checkout that reaches payment fails

✓ Dashboard created: "Checkout Service — Incident Response (Last 1h)"
  - Dashboard ID: 019dad99-26b3-75aa-a76d-9060062cbadb

Dashboard organized into four sections:

Section 1: Golden Signals vs 7-day Baseline
- Error Rate % (current vs same window 7 days ago)
- Throughput (req/s) with baseline overlay
- P95 Latency with baseline comparison
- P99 Latency with baseline comparison

Section 2: Top Errors & Failing Traces
- Table of failing operations grouped by span name
- List of recent error traces with traceID and status messages for one-click drill-down

Section 3: Dependency Health
- Error rate % for each dependency: payment, cart, product-catalog, currency, shipping, email
- P95 latency broken out by each downstream service

Section 4: Infrastructure
- CPU utilization for checkout container (grouped by container.id)
- Memory usage for checkout container (grouped by container.id)
- Note: Pod restart metrics not available (requires k8s.pod.* metrics from k8s-infra collector)

Open the dashboard in SigNoz under Dashboards → "Checkout Service — Incident Response (Last 1h)"

The dashboard is now live and provides a complete incident view.
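
The "current vs same window 7 days ago" comparison in Section 1 comes down to shifting the incident window back by one week. Here is a minimal sketch of that arithmetic, assuming a one-hour lookback. The function name `baseline_window` and the example timestamp are illustrative only; the page does not say how SigNoz or the MCP server computes the overlay internally.

```python
from datetime import datetime, timedelta, timezone


def baseline_window(end, lookback=timedelta(hours=1), offset=timedelta(days=7)):
    """Return ((cur_start, cur_end), (base_start, base_end)).

    The baseline is a window of the same length, shifted back by `offset`.
    """
    current = (end - lookback, end)
    baseline = (current[0] - offset, current[1] - offset)
    return current, baseline


# Hypothetical incident time, chosen for illustration.
end = datetime(2026, 4, 20, 12, 0, tzinfo=timezone.utc)
current, baseline = baseline_window(end)

print("current :", current[0].isoformat(), "->", current[1].isoformat())
print("baseline:", baseline[0].isoformat(), "->", baseline[1].isoformat())
```

Keeping both windows the same length means rates such as req/s and error % can be compared directly without rescaling.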

Final Summary

You now have a fully functional incident dashboard, created from a single natural-language prompt.

Figure: Incident Service Dashboard, overview and detailed views.

The dashboard points to the payment service as the likely root cause, with an elevated error rate and high latency.
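
The root-cause reading can be checked by hand from the counts the assistant reported in Step 1. The sketch below recomputes the two error rates and compares payment failures to checkout failures. The helper `error_rate` is illustrative, and a ratio close to 1 is only consistent with the payment hypothesis; it does not prove it.

```python
def error_rate(errors, calls):
    """Error rate as a percentage of total calls."""
    if calls == 0:
        return 0.0
    return 100.0 * errors / calls


# Counts copied from the assistant output in Step 1.
checkout = {"errors": 27, "calls": 36}
payment = {"errors": 26, "calls": 50}

checkout_rate = error_rate(**checkout)
payment_rate = error_rate(**payment)

# Ratio of payment failures to checkout failures.
explained = payment["errors"] / checkout["errors"]

print(f"checkout error rate: {checkout_rate:.0f}%")   # 75%
print(f"payment error rate:  {payment_rate:.0f}%")    # 52%
print(f"payment failures / checkout failures: {explained:.2f}")  # 0.96
```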

Under the Hood

During this workflow, the MCP server called these tools:

| Step | MCP Tool | What It Did |
| --- | --- | --- |
| 1 | signoz_list_services | Verified the checkout service exists and retrieved initial error rate statistics |
| 1 | signoz_get_service_top_operations | Identified checkout service dependencies (payment, cart, product-catalog, currency, shipping, email) and top failing operations |
| 1 | signoz_aggregate_traces | Retrieved error rates, latency percentiles (p95/p99), throughput metrics, and compared against 7-day baseline |
| 1 | signoz_create_dashboard | Created the incident dashboard with four sections covering golden signals, errors, dependency health, and infrastructure |
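
For readers curious what such tool invocations look like on the wire, MCP clients call tools through a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds one request per tool listed above. The tool names come from this page, but every argument key and value (`service`, `timeRange`, `title`) is a hypothetical placeholder rather than the SigNoz MCP server's actual schema, and in practice your AI assistant builds these requests for you.

```python
import json
from itertools import count

_ids = count(1)  # simple incrementing JSON-RPC request ids


def tool_call_request(name, arguments=None):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Tool names are the four listed above; argument keys are hypothetical.
calls = [
    tool_call_request("signoz_list_services", {"timeRange": "1h"}),
    tool_call_request("signoz_get_service_top_operations", {"service": "checkout"}),
    tool_call_request("signoz_aggregate_traces", {"service": "checkout"}),
    tool_call_request("signoz_create_dashboard", {"title": "Incident dashboard"}),
]

for call in calls:
    print(json.dumps(call))
```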

If you need help with the steps in this topic, please reach out to us on SigNoz Community Slack.

If you are a SigNoz Cloud user, please use the in-product chat support at the bottom-right corner of your SigNoz instance, or contact us at cloud-support@signoz.io.

Last updated: April 20, 2026
