Before using this dashboard, instrument your Hermes agent with OpenTelemetry and configure export to SigNoz. See the Hermes monitoring guide for complete setup instructions.
This dashboard offers a clear view into Hermes coding agent behavior and performance. It highlights key metrics such as agent turn volume, LLM API call patterns, token consumption, tool-call activity, and error trends. Teams can track end-to-end turn latency, per-model token costs, and individual failing spans to keep their agents fast and reliable.
To import this dashboard into SigNoz, go to Dashboards → + New dashboard → Import JSON.
What This Dashboard Monitors
This dashboard tracks critical performance metrics for your Hermes coding agent using OpenTelemetry traces (service: hermes-agent) to help you:
- Monitor Agent Activity: Track agent turn and LLM turn counts, total tool calls, and overall API call volume to understand how actively the agent is working across sessions.
- Analyze Token Consumption: Observe input, output, and cache-read token usage over time and per model to understand costs, spot consumption spikes, and optimize prompting strategies.
- Track Model Usage: See which LLM models are being called, how tokens are distributed across them, and how finish reasons break down to measure model health and behavior.
- Ensure Responsiveness: Monitor end-to-end agent turn latency and LLM API call latency at p50, p95, and p99 to surface slowdowns and maintain a consistent coding experience.
- Understand Tool Behavior: Measure which tools are called most often, how long each tool takes, and whether tool calls succeed or error — including a summary table with call counts and p95 latency per tool.
- Investigate Errors: Track error spans over time by operation, view a ranked table of the most-failing operations, and drill into individual failing spans with status messages for root-cause analysis.
Metrics Included
Overview Scorecards
- Agent Turns: Count of root `agent` spans in the selected time range, representing the total number of agent turns or sessions processed.
- LLM Turns: Count of `llm.*` wrapper spans, showing how many LLM interaction cycles the agent performed.
- LLM API Calls: Count of spans where `llm.model_name` exists, representing individual chat completion calls made to the model provider.
- Tool Calls: Count of `tool.*` spans, showing the total number of tool invocations across all agent turns.
- Total Tokens: Sum of `gen_ai.usage.total_tokens` across all spans, giving the aggregate token consumption for the selected range.
- Error Spans: Count of spans where `hasError = true`, with a red threshold triggered by any non-zero value for immediate attention.
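To make the scorecard definitions concrete, here is a minimal Python sketch that computes the same six counts from a list of exported spans. The record layout (`name`, `parent_id`, `has_error`, `attrs`) and the span names `llm.turn`, `chat`, and `tool.read_file` are assumptions made for illustration, not the dashboard's actual query or Hermes's exact span names.

```python
# Hypothetical flattened span records; the dict layout is an assumption for this sketch.
SPANS = [
    {"name": "agent", "parent_id": None, "has_error": False, "attrs": {}},
    {"name": "llm.turn", "parent_id": "s1", "has_error": False, "attrs": {}},
    {"name": "chat", "parent_id": "s2", "has_error": False,
     "attrs": {"llm.model_name": "model-a", "gen_ai.usage.total_tokens": 1200}},
    {"name": "chat", "parent_id": "s2", "has_error": True,
     "attrs": {"llm.model_name": "model-a", "gen_ai.usage.total_tokens": 300}},
    {"name": "tool.read_file", "parent_id": "s1", "has_error": False, "attrs": {}},
]


def scorecards(spans):
    """Compute the six overview counts from a list of span records."""
    return {
        # Root spans named "agent" (no parent) stand for agent turns.
        "agent_turns": sum(
            1 for s in spans if s["name"] == "agent" and s["parent_id"] is None
        ),
        # Wrapper spans whose name matches llm.*
        "llm_turns": sum(1 for s in spans if s["name"].startswith("llm.")),
        # Any span carrying llm.model_name counts as one chat completion call.
        "llm_api_calls": sum(1 for s in spans if "llm.model_name" in s["attrs"]),
        # Spans whose name matches tool.*
        "tool_calls": sum(1 for s in spans if s["name"].startswith("tool.")),
        # Spans without the token attribute contribute zero.
        "total_tokens": sum(
            s["attrs"].get("gen_ai.usage.total_tokens", 0) for s in spans
        ),
        "error_spans": sum(1 for s in spans if s["has_error"]),
    }


print(scorecards(SPANS))
```

A non-zero `error_spans` value is what would turn the Error Spans scorecard red.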
LLM & Model Metrics
- LLM API Calls by Model: Pie chart breaking down chat completion call counts by `llm.model_name`, helping you understand which models are called most frequently and track adoption across model versions.
- Token Usage Over Time: Time series showing input tokens, output tokens, and cache-read tokens stacked over time, revealing consumption trends and the benefit of prompt caching.
- Total Tokens by Model: Pie chart showing total token consumption split by model, useful for understanding which model drives the most cost.
- LLM API Call Latency (p50 / p95 / p99): Duration percentiles for chat completion spans over time, surfacing model response time trends and latency regressions.
- Cost Proxy: Input vs Output Tokens by Model: Line chart plotting input and output token volume per model over time. It serves as a cost proxy because no native cost attribute is available; multiply the token counts by your per-model pricing to estimate spend.
- Responses by Finish Reason: Pie chart of `llm.response.finish_reason` values (e.g. `stop`, `tool_calls`, `length`) to reveal how often the model terminates normally versus hitting limits or requesting tool use.
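The cost proxy panel reports token volume only, so turning it into a spend estimate is left to you. The sketch below shows one way to do that scaling in Python. The prices, model names, and record keys (`model`, `input_tokens`, `output_tokens`) are hypothetical placeholders; substitute your provider's real rates and the token counts you read off the panel.

```python
from collections import defaultdict

# Hypothetical prices in USD per one million tokens; replace with your provider's rates.
PRICES_PER_MILLION = {
    "model-a": {"input": 2.0, "output": 8.0},
}


def estimate_spend(calls, prices):
    """Sum input/output tokens per model and scale them by per-model pricing."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for call in calls:
        totals[call["model"]]["input"] += call["input_tokens"]
        totals[call["model"]]["output"] += call["output_tokens"]

    spend = {}
    for model, tokens in totals.items():
        if model not in prices:
            continue  # no price known, so no estimate for this model
        rate = prices[model]
        spend[model] = (
            tokens["input"] * rate["input"] + tokens["output"] * rate["output"]
        ) / 1_000_000
    return spend


# Hypothetical per-call token counts, as might be read from the dashboard.
calls = [
    {"model": "model-a", "input_tokens": 600_000, "output_tokens": 200_000},
    {"model": "model-a", "input_tokens": 400_000, "output_tokens": 300_000},
    {"model": "model-b", "input_tokens": 50_000, "output_tokens": 10_000},
]
print(estimate_spend(calls, PRICES_PER_MILLION))  # {'model-a': 6.0}
```

Models without a price entry are skipped rather than guessed. Cache-read tokens are left out here because providers commonly bill them at a different rate; add a third rate if that applies to you.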
Agent & Turn Metrics
- Agent Turns Over Time: Time series of root `agent` span counts, showing turn volume trends and helping identify peak activity windows or unexpected drops.
- Agent Turn Duration (p50 / p95): End-to-end duration percentiles for `agent` spans, measuring how long complete agent turns take from start to finish.
- Avg API Calls per Turn: Average of `hermes.turn.api_call_count` per agent span over time, showing how many model round-trips a typical turn requires.
- Avg Tools per Turn: Average of `hermes.turn.tool_count` per agent span, indicating how tool-heavy the agent's reasoning is on a typical turn.
- Turn Final Status: Pie chart of `hermes.turn.final_status` values, showing the distribution of how agent turns complete (e.g. success, error, timeout).
- Sessions by Kind: Pie chart of `hermes.session.kind` values, breaking down sessions by their interaction type or mode.
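To show what the duration percentiles and per-turn averages mean, here is a small Python sketch that computes them from a handful of agent spans. The `duration_ms` key and the sample values are assumptions for illustration. The percentile uses the simple nearest-rank method, which may differ slightly from the estimate your SigNoz backend computes.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("cannot take a percentile of an empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical agent spans: one record per turn.
turns = [
    {"duration_ms": 1000, "hermes.turn.api_call_count": 2},
    {"duration_ms": 2000, "hermes.turn.api_call_count": 3},
    {"duration_ms": 3000, "hermes.turn.api_call_count": 3},
    {"duration_ms": 10000, "hermes.turn.api_call_count": 8},
]

durations = [t["duration_ms"] for t in turns]
p50 = percentile(durations, 50)
p95 = percentile(durations, 95)
avg_api_calls = mean(t["hermes.turn.api_call_count"] for t in turns)

print(p50, p95, avg_api_calls)
```

In this sample the single 10-second turn sets p95 while p50 stays at 2 seconds, which is why the panel plots both: the median shows the typical turn and the high percentile exposes the slow tail.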
Tool Call Metrics
- Tool Calls by Type: Pie chart of `tool.*` span counts grouped by operation name, showing which tool types the agent invokes most.
- Tool Call Latency (p95) by Type: Line chart of p95 duration per tool over time, identifying which tools are the slowest and most likely to bottleneck agent turns.
- Tool Outcomes (completed vs error): Pie chart of `hermes.tool.outcome` values, showing the ratio of successful versus failed tool executions.
- GenAI Tool Invocations by Name: Pie chart of tool call counts grouped by `tool.name` (model-requested tools), revealing which tools the model chooses most during its reasoning loop.
- Tool Usage Summary: Table showing each tool type with its total call count (sorted descending) and p95 latency, giving a quick reference for the most-used and slowest tools.
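The Tool Usage Summary table can be pictured as a group-by over `tool.*` spans. The following Python sketch builds an equivalent table from sample data. The tool names, the `duration_ms` key, and the nearest-rank p95 are assumptions for illustration and not the dashboard's actual query.

```python
import math
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]


def tool_usage_summary(spans):
    """Group tool.* spans by name; return rows sorted by call count, descending."""
    durations = defaultdict(list)
    for span in spans:
        if span["name"].startswith("tool."):
            durations[span["name"]].append(span["duration_ms"])
    rows = [
        {"tool": name, "calls": len(values), "p95_ms": p95(values)}
        for name, values in durations.items()
    ]
    rows.sort(key=lambda row: row["calls"], reverse=True)
    return rows


# Hypothetical spans; non-tool spans are ignored by the summary.
spans = [
    {"name": "tool.terminal", "duration_ms": 500},
    {"name": "tool.read_file", "duration_ms": 10},
    {"name": "tool.read_file", "duration_ms": 30},
    {"name": "tool.terminal", "duration_ms": 900},
    {"name": "tool.read_file", "duration_ms": 20},
    {"name": "llm.turn", "duration_ms": 4000},
]

for row in tool_usage_summary(spans):
    print(row)
```

Here `tool.read_file` is the most-called tool while `tool.terminal` is the slowest, the kind of split the table is meant to surface.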
Error Monitoring
- Errors Over Time: Time series of `hasError = true` spans grouped by span name, letting you see which operations are failing and when failure spikes occur.
- Error Count by Operation: Table of error counts per operation name sorted descending, identifying the most-failing span types at a glance.
- Recent Error Spans: List of the 25 most recent errored spans sorted by timestamp, showing the span name, status message, `hermes.tool.outcome`, and duration. Use this to drill into individual failures and find root causes.
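The two error tables boil down to a ranked count and a sorted, truncated list. A minimal Python sketch of both follows. The record keys (`has_error`, `timestamp`, `status_message`) and the sample spans are hypothetical and only illustrate the logic.

```python
from collections import Counter


def error_count_by_operation(spans):
    """Count errored spans per span name, most-failing first."""
    return Counter(s["name"] for s in spans if s["has_error"]).most_common()


def recent_error_spans(spans, limit=25):
    """Return up to `limit` errored spans, newest first."""
    errored = [s for s in spans if s["has_error"]]
    return sorted(errored, key=lambda s: s["timestamp"], reverse=True)[:limit]


# Hypothetical spans; timestamps are plain integers for readability.
spans = [
    {"name": "tool.terminal", "timestamp": 1, "has_error": True,
     "status_message": "exit code 1"},
    {"name": "chat", "timestamp": 2, "has_error": True,
     "status_message": "rate limited"},
    {"name": "tool.terminal", "timestamp": 3, "has_error": True,
     "status_message": "timeout"},
    {"name": "agent", "timestamp": 4, "has_error": False, "status_message": ""},
]

print(error_count_by_operation(spans))
for span in recent_error_spans(spans):
    print(span["timestamp"], span["name"], span["status_message"])
```

The ranked counts point to the operation to look at first, and the newest-first list supplies the status messages needed for root-cause analysis.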