Overview
Tail sampling evaluates a complete trace before deciding whether to keep or drop it. Unlike head sampling (which decides at the start of a request), tail sampling waits for all spans to arrive at the collector, then applies policies against the full trace.
Use tail sampling when your sampling decision depends on trace-level data (error status, total duration, a specific span name) that isn't available until the trace completes.
SigNoz computes APM metrics (signoz_calls_total, signoz_latency_sum, and related RED metrics) from ingested trace data. When you enable tail-based sampling, those metrics cover only the sampled traces, so absolute values like total request counts undercount real traffic. Latency trends and error spikes stay reliable. SigNoz is aware of this gap and plans to address it in a future release.
Prerequisites
- OpenTelemetry Collector Contrib installed and running
- Familiarity with OTel Collector configuration
The tail sampling processor ships only in otelcol-contrib, not the core OpenTelemetry Collector distribution.
How it works
The processor buffers incoming spans in memory, grouped by trace ID. When decision_wait expires, it evaluates the buffered trace against your policies. A trace is sampled if any policy returns a sample decision. drop policies override sample decisions.
Setup
Add tail_sampling to the processors section of your collector config and wire it into the traces pipeline. See the official processor docs for the full config reference.
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp]
Place tail_sampling after any context-dependent processors such as k8sattributes, and just before batch. The processor reassembles spans into new batches, discarding the original context. Putting batch after tail_sampling ensures only sampled spans are batched for export.
When running multiple collector instances, all spans for a trace ID must arrive at the same instance. Use the load balancing exporter in a first collector layer to route by trace ID, then apply tail sampling in a second layer.
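The first layer of that two-layer layout could look roughly like the config below. This is a minimal sketch, not a definitive setup: the hostname otelcol-sampling.example.internal, the port, and the insecure TLS setting are placeholder assumptions to replace with your own values. It uses the contrib loadbalancing exporter with routing_key: traceID so that every span of a trace reaches the same second-layer collector.

```yaml
# First-layer collector: receives spans and routes them by trace ID.
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  loadbalancing:
    # Route by trace ID so all spans of a trace reach the same backend.
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true   # assumption: plaintext traffic inside the cluster
    resolver:
      dns:
        # Hypothetical DNS name that resolves to the second-layer collectors.
        hostname: otelcol-sampling.example.internal
        port: 4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

The second-layer collectors then run the tail_sampling pipeline shown earlier in this section.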
Configuration reference
| Option | Default | Description |
|---|---|---|
| decision_wait | 30s | Wait time before evaluating a trace |
| decision_wait_after_root_received | 0s | Additional wait after root span arrives; 0s disables root-span acceleration |
| num_traces | 50000 | Maximum traces buffered in memory |
| expected_new_traces_per_sec | 0 | Hint for pre-allocating memory |
| sample_on_first_match | false | Short-circuit policy evaluation as soon as any policy matches, without evaluating remaining policies |
| maximum_trace_size_bytes | 0 | Drop traces exceeding this size (bytes); 0 disables the limit |
Policies
Each policy defines a condition for keeping a trace. Policies are OR-ed: a trace is sampled if any policy matches. A drop policy overrides any sample decision.
Place drop policies first. With sample_on_first_match: true, the processor short-circuits evaluation as soon as a policy matches. Putting drop policies first means the processor drops noisy traces (health checks, probes) before running the remaining policies:
- Drop — health checks, probes, known-noise spans
- Keep — errors and slow traces
- Probabilistic fallback — sample remaining traffic at a fixed rate
- Service-specific overrides — different rates per service if needed
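As a sketch of that ordering, the config below places a drop policy first, then the keep policies, then a probabilistic fallback, with sample_on_first_match enabled. The probe paths, the 1000 ms threshold, and the 5% rate are illustrative values reused from the examples on this page, not recommendations.

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    sample_on_first_match: true
    policies:
      # 1. Drop: noisy probe traces are handled before anything else.
      - name: drop-health-probes
        type: drop
        drop:
          drop_sub_policy:
            - name: match-probe-paths
              type: string_attribute
              string_attribute:
                key: url.path
                values: [/health, /ready, /live, /metrics]
      # 2. Keep: traces with at least one error span.
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # 2. Keep: traces slower than the (illustrative) 1000 ms threshold.
      - name: keep-slow-traces
        type: latency
        latency:
          threshold_ms: 1000
      # 3. Probabilistic fallback for everything that is left.
      - name: sample-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
```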
Drop health-check and probe traces
The drop policy prevents sampling when all of its sub-policies match; with a single sub-policy, as in this example, that one match is enough. Use it to exclude noisy, low-value traces by URL path.
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    policies:
      - name: drop-health-probes
        type: drop
        drop:
          drop_sub_policy:
            - name: match-probe-paths
              type: string_attribute
              string_attribute:
                key: url.path
                values: [/health, /ready, /live, /metrics]
To match with a single regex instead of exact values (uses RE2 syntax):
- name: match-probe-paths-regex
  type: string_attribute
  string_attribute:
    key: url.path
    values: ['^/(health|ready|live|metrics)$']
    enabled_regex_matching: true
Drop traces by span name
The string_attribute policy matches span and resource attributes, not the span name itself. To match on span name, use the ottl_condition policy with span.name.
Drop all traces that contain a span named health-check:
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    policies:
      - name: drop-by-span-name
        type: drop
        drop:
          drop_sub_policy:
            - name: match-span-name
              type: ottl_condition
              ottl_condition:
                error_mode: ignore
                span:
                  - 'span.name == "health-check"'
Use regex to match a pattern across multiple span names (see OTTL docs for available functions):
- name: drop-probe-spans-by-name
  type: drop
  drop:
    drop_sub_policy:
      - name: match-span-name-regex
        type: ottl_condition
        ottl_condition:
          error_mode: ignore
          span:
            - 'IsMatch(span.name, "^(health|ready|live).*")'
To drop individual spans (not the entire trace) by span name, use the filter processor instead. Tail sampling always operates on the complete trace.
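A minimal sketch of that alternative, assuming the contrib filter processor is available in your collector build. The instance name filter/drop-health-spans is a hypothetical label, and the pipeline placement before tail_sampling is one reasonable choice rather than a requirement.

```yaml
processors:
  # Removes only the matching spans; the rest of each trace is kept.
  filter/drop-health-spans:
    error_mode: ignore
    traces:
      span:
        # In the filter processor's span context the span name is `name`.
        - 'name == "health-check"'

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter/drop-health-spans, tail_sampling, batch]
      exporters: [otlp]
```

Filtering first keeps the dropped spans out of the tail sampling buffer, which lowers memory use, but the sampling policies then never see those spans.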
Keep only error traces
Sample traces containing at least one error span. Traces without errors are implicitly not sampled.
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
Keep slow traces
Sample traces where total duration exceeds a threshold.
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    policies:
      - name: keep-slow-traces
        type: latency
        latency:
          threshold_ms: 1000
Set upper_threshold_ms to keep only traces within a duration band:
- name: keep-medium-traces
  type: latency
  latency:
    threshold_ms: 500
    upper_threshold_ms: 5000
Keep errors and slow traces, sample the rest
Always keep errors and slow traces, then sample 5% of everything else as a baseline.
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-slow-traces
        type: latency
        latency:
          threshold_ms: 1000
      - name: sample-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
Sample specific services differently
Use and to combine policies: all sub-policies must match for the trace to be sampled. This example samples 1% of health-probe traces from api-gateway and 100% of error traces from any service.
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    policies:
      # Policy 1: keep 100% of error traces from any service
      - name: keep-all-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Policy 2: AND, all three sub-policies must match
      - name: low-sample-probes
        type: and
        and:
          and_sub_policy:
            # Sub-policy A: only traces from api-gateway
            - name: match-service
              type: string_attribute
              string_attribute:
                key: service.name
                values: [api-gateway]
            # Sub-policy B: only health/ready probe routes
            - name: match-probe-route
              type: string_attribute
              string_attribute:
                key: http.route
                values: [/health, /ready]
            # Sub-policy C: keep 1% of matching traces
            - name: probabilistic-1pct
              type: probabilistic
              probabilistic:
                sampling_percentage: 1
Validate
After restarting the collector, confirm sampling is working:
- Trigger traces from your services, including both success and error scenarios.
- Open Traces > Explorer in SigNoz and verify only expected traces appear.
- Check the collector's otelcol_processor_tail_sampling_count_traces_sampled metric to confirm decisions are being made. Query by label:
  - Sampled traces: otelcol_processor_tail_sampling_count_traces_sampled{sampled="true"}
  - Not sampled: otelcol_processor_tail_sampling_count_traces_sampled{sampled="false"}
Limitations
Memory pressure
The processor holds all spans for each trace in memory until a decision is made. If trace volume exceeds num_traces, the oldest traces are evicted before evaluation — they are dropped without sampling.
Monitor otelcol_processor_tail_sampling_sampling_trace_dropped_too_early for early drops. If this metric rises, increase num_traces (which raises memory usage) or reduce decision_wait (which lowers memory usage but gives late spans less time to arrive). Set maximum_trace_size_bytes to drop oversized traces before they exhaust the buffer.
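One rough way to size the buffer, assuming traces arrive at a steady rate, is to estimate the traces held in memory as new traces per second multiplied by decision_wait in seconds. The 2,000 traces/sec load and the 1 MiB size cap below are illustrative assumptions, not recommendations.

```yaml
processors:
  tail_sampling:
    # Assumed load: about 2,000 new traces/sec (illustrative).
    # In-flight traces: roughly 2,000 traces/sec x 10 s = 20,000,
    # so num_traces: 50000 leaves about 2.5x headroom for bursts.
    decision_wait: 10s
    num_traces: 50000
    expected_new_traces_per_sec: 2000
    # Drop any single trace larger than 1 MiB (1,048,576 bytes).
    maximum_trace_size_bytes: 1048576
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
```

If the early-drop metric still rises with this headroom, the real arrival rate or burst size is probably higher than the estimate.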
Late-arriving spans
A span arriving after its trace's sampling decision is made inherits the existing decision. If the decision has already been evicted from memory and no decision cache is configured, the late span triggers a new evaluation cycle — which can produce a different decision for that span.
Configure decision_cache (sampled_cache_size and non_sampled_cache_size) to persist decisions beyond the in-memory buffer. Set cache sizes well above num_traces to reduce the chance of a late span missing its cached decision.
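A minimal sketch of a decision cache, using the two option names mentioned above. The cache sizes (4x num_traces) are an illustrative assumption; tune them to how late your spans tend to arrive.

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    decision_cache:
      # Illustrative sizes: 4x num_traces, so decisions outlive the buffer.
      sampled_cache_size: 200000
      non_sampled_cache_size: 200000
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
```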
Shutdown behavior
Pending traces are evaluated with partial data on shutdown, which can produce incomplete sampling decisions. Set drop_pending_traces_on_shutdown: true to discard incomplete traces instead.
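A short sketch of that setting. Placing it at the top level of the tail_sampling block, next to decision_wait, is an assumption here; confirm the placement against the processor docs for your collector version.

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    # Discard traces still waiting for a decision when the collector stops.
    drop_pending_traces_on_shutdown: true
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
```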
Collector scaling
All spans for a single trace must arrive at the same collector instance. Without trace-ID-aware routing across instances, sampling decisions will be based on incomplete data. See the load balancing exporter note in the Setup section.
Troubleshooting
Traces missing after enabling tail sampling
- Verify the processor is in the traces pipeline, not logs or metrics.
- Check decision_wait: if it is too short, spans may not have arrived before evaluation. Start with 10s and increase it if traces are long-running.
- Confirm otelcol-contrib is running, not the core distribution.
High memory usage on the collector
- Reduce num_traces or decision_wait to lower the in-memory buffer.
- Enable sample_on_first_match: true to decide early when a policy matches.
- Set maximum_trace_size_bytes to drop oversized traces before they exhaust memory.
Sampling decisions seem inconsistent
- Check for late-arriving spans: compare span arrival times against decision_wait.
- If running multiple collectors, verify trace-ID routing is working (load balancing exporter).
- Review your policies: drop overrides any sample decision regardless of policy order. If a drop policy matches, the trace is dropped even if another policy would have sampled it.
Next Steps
- Control Traces Volume: drop individual spans, attributes, or use the filter processor alongside tail sampling
- PII Scrubbing in Traces: remove sensitive attribute values before traces reach SigNoz
Get Help
If you need help with the steps in this topic, please reach out to us on SigNoz Community Slack.
If you are a SigNoz Cloud user, please use the in-product chat support at the bottom right corner of your SigNoz instance, or contact us at cloud-support@signoz.io.