Out-of-the-box OpenTelemetry-powered Kafka & Celery monitoring

Messaging queues power modern distributed systems, handling background tasks, event-driven architectures, and real-time data streaming. However, Kafka and Celery queues have traditionally been a black box to debug, with little correlation between message producers, consumers, and broker metrics.

*OpenTelemetry-powered messaging queue monitoring in SigNoz*

With Kafka & Celery monitoring, SigNoz introduces the industry's first fully integrated observability solution for messaging queues powered by OpenTelemetry.

Now, teams can correlate Kafka broker metrics with OpenTelemetry spans, enabling deep insights into consumer lag, throughput, drop rates, and performance bottlenecks.

Watch Demo

Check out this video to see our messaging queue feature in action.

Why Observability for Messaging Queues Matters

In modern microservices architectures, message queues like Kafka and Celery play a crucial role in ensuring event-driven processing, background task execution, and scalable system communication. However, debugging performance issues in these queues has been notoriously difficult. Many teams struggle to diagnose consumer lag spikes, processing delays, and ineffective autoscaling strategies due to a lack of correlation between Kafka broker metrics and application traces.

SigNoz solves this challenge by providing end-to-end correlation between Kafka broker metrics, producer spans, and consumer spans. With real-time consumer lag monitoring, message drop rate analysis, and detailed Celery task observability, teams can gain actionable insights into their messaging systems without manually setting up dashboards. Best of all, it works out of the box with OpenTelemetry instrumentation.

How SigNoz Enables Kafka & Celery Observability

Messaging Queue Overview

The new Messaging Queues tab in SigNoz provides a comprehensive overview of all Kafka and Celery queues instrumented with OpenTelemetry. Users can:

See all producers, consumers, and their performance metrics in one place.
Filter by service name, producer, consumer, or messaging system (Kafka, Celery).
View key error rates, request rates, and latency metrics for each queue.

Consumer Group Lag Insights

Consumer lag is one of the biggest challenges when working with Kafka-based systems. With SigNoz’s Consumer Group Lag View, teams can analyze real-time data to pinpoint which producer and consumer services contribute to lag spikes.

By diving deeper into throughput, error rates, and P99 latencies, users can detect whether consumer lag results from increased producer load or inefficient consumer scaling, enabling faster troubleshooting and better resource allocation.
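The distinction between producer-driven and consumer-driven lag can be sketched in a few lines of Python. This is a minimal illustration, not SigNoz's implementation: the `RateWindow` type, the `classify_lag_cause` function, and the 20% tolerance are all hypothetical, and the sketch assumes you already have average produce and consume rates for a baseline window and for the current window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateWindow:
    """Average message rates (messages per second) over one time window."""
    produce_rate: float
    consume_rate: float


def classify_lag_cause(baseline: RateWindow, current: RateWindow,
                       tolerance: float = 0.2) -> str:
    """Guess why lag is growing by comparing the current window to a baseline.

    `tolerance` is an illustrative threshold: a rate counts as changed only
    if it moved by more than this fraction of its baseline value.
    """
    # Lag can only grow while messages arrive faster than they are consumed.
    if current.produce_rate <= current.consume_rate:
        return "no-lag-growth"

    produce_up = current.produce_rate > baseline.produce_rate * (1 + tolerance)
    consume_down = current.consume_rate < baseline.consume_rate * (1 - tolerance)

    if produce_up and not consume_down:
        return "producer-load"        # more traffic, consumers unchanged
    if consume_down and not produce_up:
        return "consumer-slowdown"    # same traffic, consumers fell behind
    return "mixed-or-unclear"         # both moved, or neither moved enough
```

For example, with a baseline of 100 messages per second in both directions, a current window of 200 produced and 100 consumed is classified as `producer-load`, which points toward scaling consumers for the higher traffic rather than hunting for a consumer regression.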

Partition-Level Metrics & Producer Latency Insights

For each Kafka topic and partition, SigNoz provides:

Partition-level throughput and latency metrics to optimize performance.
Per-service visibility into Kafka message rates.
Real-time monitoring of consumer lag per partition to detect bottlenecks.
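Per-partition consumer lag is conventionally the gap between a partition's log end offset and the consumer group's committed offset. The sketch below shows that arithmetic in plain Python. The function names and the sample offsets are illustrative, and treating a missing committed offset as 0 is an assumption made here for simplicity (in a real cluster the starting position depends on the consumer's offset reset policy).

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]  # (topic name, partition number)


def partition_lag(end_offsets: Dict[TopicPartition, int],
                  committed_offsets: Dict[TopicPartition, int]
                  ) -> Dict[TopicPartition, int]:
    """Return lag per partition: log end offset minus committed offset."""
    lag = {}
    for tp, end in end_offsets.items():
        # Assumption: a partition with no committed offset starts from 0.
        committed = committed_offsets.get(tp, 0)
        # Floor at 0 so a stale end-offset reading never yields negative lag.
        lag[tp] = max(end - committed, 0)
    return lag


def worst_partition(lag: Dict[TopicPartition, int]) -> Tuple[TopicPartition, int]:
    """Return the (partition, lag) pair with the highest lag.

    Raises ValueError if `lag` is empty.
    """
    return max(lag.items(), key=lambda item: item[1])


if __name__ == "__main__":
    end = {("orders", 0): 1500, ("orders", 1): 1200}
    committed = {("orders", 0): 1480, ("orders", 1): 700}
    lag = partition_lag(end, committed)
    print(lag)
    print(worst_partition(lag))
```

A single partition carrying most of the lag, as partition 1 does in the sample data, usually suggests a hot key or a stuck consumer rather than a group that is too small overall.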

Drop Rate Analysis: Detecting Slow Messages

SigNoz introduces a Drop Rate View to help teams identify messages that take longer than expected to process. Users can set an evaluation interval, for example to flag messages that exceed a 10 ms processing time, and navigate directly to the matching traces to investigate the root cause. This makes it easier to detect inefficiencies in message processing and fine-tune system performance.
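One way to think about a drop rate is as the fraction of messages whose processing time exceeds a chosen limit. The sketch below uses that definition. It is an assumption for illustration, since this post does not spell out the exact formula SigNoz uses, and the `drop_rate` function and sample durations are hypothetical.

```python
from typing import Iterable


def drop_rate(durations_ms: Iterable[float], threshold_ms: float = 10.0) -> float:
    """Fraction of messages that took strictly longer than `threshold_ms`.

    Returns 0.0 for an empty window so callers do not divide by zero.
    """
    durations = list(durations_ms)
    if not durations:
        return 0.0
    slow = sum(1 for d in durations if d > threshold_ms)
    return slow / len(durations)


if __name__ == "__main__":
    # Processing times (in milliseconds) for four messages in one window.
    window = [2.0, 4.0, 12.0, 30.0]
    print(drop_rate(window, threshold_ms=10.0))  # two of four exceed 10 ms
```

In practice the durations would come from consumer span durations, so each slow message already has a trace to open.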

Celery Monitoring: Visibility into Task Execution

For Celery-based background jobs, SigNoz offers detailed task execution tracking. Teams can monitor active worker performance, task success and failure rates, and P99/P95 task latencies.

Engineers can drill down into individual task execution times and identify performance bottlenecks with a few clicks, ensuring smooth and efficient queue management.
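To make the P95 and P99 figures concrete, the following sketch computes percentiles over a list of task durations using the nearest-rank method. The choice of method and the `percentile` helper are assumptions for illustration. Monitoring backends often use interpolating or approximate estimators, so their numbers can differ slightly.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least
    `pct` percent of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    # Rank is 1-based; multiply before dividing to limit rounding error.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Durations (ms) of ten runs of one task; one run is an outlier.
    task_ms = [12, 15, 11, 240, 14, 13, 16, 18, 17, 19]
    print(percentile(task_ms, 50))
    print(percentile(task_ms, 95))
```

With the sample data the median stays low while P95 lands on the 240 ms outlier, which is why tail percentiles surface slow tasks that an average would hide.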

OpenTelemetry-Powered Correlation

Unlike traditional monitoring tools that provide only Kafka broker metrics, SigNoz leverages OpenTelemetry to correlate Kafka broker metrics with producer and consumer spans.

This enables users to understand the relationship between different system components, pinpoint which services are responsible for performance issues, and gain insights that would otherwise be difficult to derive.
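Producer and consumer spans can be linked because trace context travels with each message. OpenTelemetry's default propagator uses the W3C Trace Context `traceparent` header, and Kafka instrumentation typically carries it in message headers. The sketch below hand-rolls that mechanism with the standard library purely as an illustration. Real instrumentation libraries do this automatically, and the function names here are hypothetical.

```python
import re
import secrets
from typing import List, Optional, Tuple

Headers = List[Tuple[str, bytes]]  # Kafka-style headers: (key, value bytes)

# traceparent format: version "00", 32-hex trace id, 16-hex span id, 2-hex flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject_traceparent(headers: Headers, trace_id: str, span_id: str,
                       sampled: bool = True) -> Headers:
    """Producer side: return a copy of `headers` with a traceparent entry."""
    flags = "01" if sampled else "00"
    value = f"00-{trace_id}-{span_id}-{flags}"
    return headers + [("traceparent", value.encode("ascii"))]


def extract_traceparent(headers: Headers) -> Optional[Tuple[str, str]]:
    """Consumer side: return (trace_id, parent_span_id), or None if the
    header is missing or malformed."""
    for key, raw in headers:
        if key != "traceparent":
            continue
        match = _TRACEPARENT.match(raw.decode("ascii", errors="replace"))
        # All-zero ids are invalid per the W3C Trace Context specification.
        if match and match.group(1) != "0" * 32 and match.group(2) != "0" * 16:
            return match.group(1), match.group(2)
    return None


if __name__ == "__main__":
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    sent = inject_traceparent([("content-type", b"application/json")],
                              trace_id, span_id)
    # The consumer recovers the same ids and parents its span on them.
    print(extract_traceparent(sent) == (trace_id, span_id))
```

Because the consumer span shares the producer's trace id, a backend can join the two spans into one trace and line them up against broker metrics for the same topic and partition.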

Real-World Use Cases

1. Event-Driven Systems: Scaling Consumer Services

📌 Use case: A ride-hailing platform (e.g., Uber) processes thousands of booking requests per second. These events are sent to Kafka, and consumer services process them to assign drivers. ✅ SigNoz helps by pinpointing consumer lag spikes and recommending auto-scaling adjustments for consumer services.

2. Background Job Processing

📌 Use case: A machine learning pipeline processes large batches of data using Celery task queues. Some tasks take significantly longer than others. ✅ With SigNoz, users can detect slow-running tasks and optimize worker distribution.

3. Financial Transaction Processing

📌 Use case: A trading platform updates stock prices in real time using Kafka topics for price updates. ✅ SigNoz correlates Kafka broker latency with application spans, helping teams find and remove the bottlenecks that delay trade execution.

What’s Next? Upcoming Enhancements

This is just the beginning! We are expanding support for more messaging queues and features:

Support for AWS SQS and additional Kafka integrations.
Enhanced .NET support for Kafka-based applications.
More granular analytics for Kafka partition-level performance.

Unlock Messaging Queue Observability with SigNoz

SigNoz is redefining messaging queue monitoring with OpenTelemetry. Whether you are using Kafka for real-time event streaming or Celery for background job processing, this feature provides deep visibility, real-time correlation, and powerful debugging capabilities.

For setup instructions and more details, check out our documentation.

Have feedback? Join our Slack Community and let us know what you think!

Launch Week 3.0

Check out all the updates from Launch Week 3.0.