A Slack message from support:
Customer says their checkout failed with a 500 error. Here is the request ID from the error page:
7a3f2b1c8d4e5f6a9b0c1d2e3f4a5b6c. Can someone look into this?
You paste the ID into your AI assistant, connected to SigNoz via the MCP server, and ask what happened.
Prerequisites
- Connect your AI assistant to SigNoz using the MCP Server guide.
- Make sure your services are instrumented with distributed tracing. See Instrument Your Application if you haven't set this up.
Step 1: Reconstruct the Request Path
Look up trace ID 7a3f2b1c8d4e5f6a9b0c1d2e3f4a5b6c. The customer says checkout failed with a 500 error. What happened?
The full span tree comes back:
Trace 7a3f2b1c...: POST /api/checkout for customer 4291
Request path:
api-gateway /api/checkout (823ms)
  -> auth-service /verify-token (12ms) OK
  -> checkout-service /process (798ms)
     -> inventory-service /reserve (45ms) OK
     -> payment-service /charge (680ms) FAILED
        -> HTTP POST stripe.com/v1/charges (672ms)
           Status: 402, "card_declined"
     -> order-service /create (never called, skipped after payment failure)
Root cause: The Stripe charge returned a 402 (card_declined). The
payment-service propagated this as a 500 to the checkout-service,
which bubbled it up to the customer as a generic server error.
This is a customer-side issue (declined card), but the error handling
is masking it. The customer should see "Payment declined," not "Server error."
You can already reply to the support ticket: the customer's card was declined, but the app is showing the wrong error message.
Step 2: Correlate with Logs
The span tree tells you what happened. Logs add the details that spans do not capture.
Show me the logs for this trace, especially what payment-service logged when the charge failed.
The correlated logs surface the exact behavior:
payment-service logs:
14:46:31.442 INFO Processing charge for order ord_8821, amount $149.99
14:46:32.114 WARN Stripe returned decline_code=insufficient_funds
14:46:32.115 ERROR PaymentError: charge failed, raising InternalServerError
The payment service catches the Stripe decline but re-raises it as InternalServerError instead of mapping it to a client-facing 402.
Step 3: Determine Blast Radius
What is the error rate for payment-service /charge over the last 24 hours? Break down the failures by type.
The aggregated view reveals the scope:
payment-service /charge over the last 24 hours:
Total requests: 3,847
Failed: 312 (8.1% error rate)
Breakdown of failures:
- 287 (92%) are card declines (Stripe 402) incorrectly raised as 500
- 18 (6%) are Stripe timeouts (legitimate 5xx)
- 7 (2%) are invalid amount errors
287 customers in the last 24 hours got a generic "Server Error" when their card was simply declined. This is a bug in payment-service error handling, not a one-off.
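The percentages above are easy to verify. A small sketch of the same aggregation, using illustrative failure counts standing in for what the trace aggregation returns:

```python
from collections import Counter

total_requests = 3847
# Illustrative failure categories matching the aggregate above.
failures = Counter({
    "card_decline_as_500": 287,
    "stripe_timeout": 18,
    "invalid_amount": 7,
})

failed = sum(failures.values())  # 312
print(f"Failed: {failed} ({failed / total_requests:.1%} error rate)")
for kind, count in failures.most_common():
    print(f"  {kind}: {count} ({count / failed:.0%})")
```

This reproduces the 8.1% overall error rate and the 92% / 6% / 2% breakdown, confirming that declines dominate the failure count.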
Refine Your Investigation
- Dig into a specific span: "full attributes on the failed Stripe span"
- Find similar failures: "5 more traces where payment-service returned 500 in the last hour"
- Check the timeline: "when did this error pattern start? correlate with deployments"
- Get customer impact: "how many unique customers hit the 500-masking-402 bug today?"
If your logs include trace_id as a structured field, the assistant can correlate them directly. If trace IDs appear only in the log body text, the assistant falls back to full-text search, which works but can be slow and resource-intensive in high-volume log environments. For faster correlation, ensure your instrumentation propagates trace_id as a structured log attribute. See Correlate Traces and Logs for setup instructions.
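If your stack does not already inject the trace ID, one common pattern is a logging filter that stamps every record with the active trace ID so it lands as a structured attribute rather than free text. A stdlib-only sketch (the contextvar here is a stand-in for whatever your tracing library exposes, such as OpenTelemetry's current span context):

```python
import logging
from contextvars import ContextVar

# Stand-in for the tracing library's notion of the current trace.
current_trace_id: ContextVar[str] = ContextVar("current_trace_id", default="")

class TraceIdFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Attach the trace ID as a structured field on every record.
        record.trace_id = current_trace_id.get()
        return True

handler = logging.StreamHandler()
handler.addFilter(TraceIdFilter())
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
logger = logging.getLogger("payment-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

current_trace_id.set("7a3f2b1c8d4e5f6a9b0c1d2e3f4a5b6c")
logger.info("Processing charge for order ord_8821")
# Emits: INFO trace_id=7a3f2b1c... Processing charge for order ord_8821
```

With trace_id emitted as its own field, log correlation becomes an indexed attribute lookup instead of a full-text scan.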
Under the Hood
Reconstructing a bug from a trace ID typically uses these MCP tools:
| Step | MCP Tool | What It Does |
|---|---|---|
| 1 | signoz_search_traces | Finds the trace by ID |
| 2 | signoz_get_trace_details | Returns the full span tree with all span attributes |
| 3 | signoz_search_logs | Searches for logs correlated by trace ID |
| 4 | signoz_aggregate_traces | Checks error rates to determine if the failure is isolated |
| 5 | signoz_get_service_top_operations | Gets operation-level error rates for the affected service |
Next Steps
- Natural Language Log Exploration - Search and analyze logs without writing queries.
- Latency Spike Explainer - Ask "why is this slow?" and trace the bottleneck.
If you need help with the steps in this topic, please reach out to us on SigNoz Community Slack.
If you are a SigNoz Cloud user, please use the in-product chat support located at the bottom right corner of your SigNoz instance, or contact us at cloud-support@signoz.io.