Skip to main content

Jaeger vs Zipkin - Which tool to choose for tracing?

· 8 min read
Ankit Anand

Distributed tracing is becoming a critical component of any application's performance monitoring stack. However, setting it up in-house is an arduous task, and that's why many companies prefer outside tools. Jaeger and Zipkin are two popular open-source projects used for end-to-end distributed tracing. Let us explore their key differences in this article.

Cover Image

Both Zipkin and Jaeger are popular open-source distributed tracing tools. Zipkin was originally inspired by Google's Dapper and was developed by Twitter. Zipkin is a much older project than Jaeger and was first released as an open-source project in 2012. Jaeger was originally built by teams at Uber and then open-sourced in 2015. It got accepted as a Cloud Native incubation project in 2017 and graduated in 2019.

Before we dive into the differences between Jaeger and Zipkin, let's take a short detour to understand distributed tracing.

What is distributed tracing?

In the world of microservices, a user request travels through hundreds of services before serving a user what they need. To make a business scalable, engineering teams are responsible for particular services with no insight into how the system performs as a whole. And that's where distributed tracing comes into the picture.

Microservices architecture
Microservice architecture of a fictional e-commerce application

Distributed tracing gives you insight into how a particular service is performing as part of the whole in a distributed software system. There are two important concepts involved in distributed tracing: Spans and trace context.

User requests are broken down into spans.

What are spans?
Spans represent a single operation within a trace. Thus, it represents work done by a single service which can be broken down further depending on the use case.

A trace context is passed along when requests travel between services, which tracks a user request across services. You can see how a user request performs across services and identify what exactly needs your attention without manually shifting through multiple dashboards.

Trace context is passed to track user requests across services
A trace context is passed when user requests pass from one service to another

Jaeger and Zipkin: Key components

Jaeger's source code is primarily written in Go, while Zipkin's source code is primarily written in Java. The architecture of Jaeger and Zipkin is somewhat similar. Major components in both architectures include:
  • Instrumentation Libraries
  • Collectors
  • Query Service and web UI
  • Database Storage
Jaeger architecture
Illustration of Jaeger architecture (Source: Jaeger website)

Zipkin architecture
Illustration of Zipkin architecture (Source: Zipkin website)

Instrumentation Libraries

Instrumentation is the process of generating telemetry data(logs, metrics, and traces) from an application code. Both Jaeger and Zipkin provide language-specific instrumentation libraries. Instrumentation enables a service to create spans on incoming requests and to attach context information on outgoing requests.

Key points to note about instrumentation libraries of Jaeger and Zipkin:

  • Jaeger recommends using OpenTelemetry APIs and SDKs for generating traces. Using OpenTelemetry users have the advantage of using a single open source standard for all types of telemetry signals as OpenTelemetry aslo supports logs and metrics.

  • Zipkin provides client libraries in popular languages like C#, Go, Java, Javascript, Ruby, Scala, PHP, etc. There are also a lot of community supported libraries that you can use for instrumenting specific frameworks.

Collectors

Telemetry data collected by the instrumentation libraries are sent to a collector in both Jaeger and Zipkin. Jaeger's collectors validate traces, index them, perform any transformations, and finally stores them. Zipkin collector too validates and indexes the collected trace data for lookups.

Query Service and Web UI

Zipkin provides a JSON API for finding and retrieving traces. Jaeger provides stateless service API endpoints which are typically run behind a load balancer, such as NGINX.

The consumer of the query service is a Web UI in both Jaeger and Zipkin, which is used to visualize trace data by a user.

Jaeger's web UI showing Gantt charts
Jaeger's Web UI showing spans with Gantt charts

Zipkin trace UI
Zipkin's trace UI

Database storage

Both Jaeger and Zipkin provide pluggable storage backends for trace data. Cassandra and Elasticsearch are the primarily supported storage backends by Jaeger.

Zipkin was originally built to store data in Cassandra, but it later started supporting Elasticsearch and MySQL too.

Comparing Jaeger and Zipkin

Jaeger and Zipkin have a lot of similarities in their architecture. Though Zipkin is an older project, Jaeger has a more modern and scalable design.

Summarizing the key differences between Jaeger and Zipkin:

  • Instrumentation libraries
    Jaeger has shifted from its client libraries based on Opentracing APIs to recommended OpenTelemetry APIs and SDKs for application instrumentation. OpenTelemetry is a broader framework for observability that can help you generate different kinds of telemetry signals and correlate them for better contextual insights. Zipkin cal also receive traces from OpenTelemetry instrumented applications. Zipkin's instrumentation libraries are focused on simplicity and ease of integration.

  • Deployment
    Jaeger can be deployed as a single binary where all Jaeger backend components run as a single process or as a scalable distributed system. Zipkin, on the other hand, can only be run as a single binary that includes the collector, storage, query service, and web UI.

  • Integrations
    As Jaeger comes under CNCF along with other projects such as Kubernetes, there are official orchestration templates for running Jaeger with Kubernetes and OpenShift. Zipkin provides three options to build and start an instance of Zipkin: using Java, Docker, or running from the source.

  • Sampling strategis
    Jaeger offers adaptive sampling, which adjusts the sampling rate based on the traffic and error rates, enabling more efficient trace data collection without overwhelming storage with too much data. Zipkin relies on simpler, less flexible sampling strategies, such as per-service sampling, which may not be as efficient in environments with highly variable traffic patterns.

  • Community Support
    Despite being newer, Jaeger has caught up to Zipkin in terms of community support. Zipkin is a standalone project which came into existence before containerization went mainstream. Jaeger, as part of CNCF, is a recognized project in cloud-native architectures.

Both Jaeger and Zipkin are strong contenders when it comes to a distributed tracing tool. But are traces enough to solve all performance issues of a modern distributed application? The answer is no. You also need metrics and a way to correlate metrics with traces with a single dashboard. Most SaaS vendors provide both metrics and traces under a single pane of glass. But the beauty of Jaeger and Zipkin is that they are open-source. What if an open-source solution does both and comes with a great web UI with actionable insights for your engineering teams?

That's where SigNoz comes into the picture.

A better to alternative to Jaeger and Zipkin - SigNoz

SigNoz is a full-stack open-source application performance monitoring and observability tool which can be used in place of Jaeger and Zipkin. It provides advanced distributed tracing capabilities along with metrics under a single dashboard.

SigNoz is built to support OpenTelemetry natively. It also provides a fast OLAP datastore, ClickHouse as the storage backend.

Architecture of SigNoz with OpenTelemetry and ClickHouse
Architecture of SigNoz with ClickHouse as storage backend and OpenTelemetry for code instrumentatiion

SigNoz comes with out of box visualization of things like RED metrics.

SigNoz UI showing the popular RED metrics
SigNoz UI showing application overview metrics like RPS, 50th/90th/99th Percentile latencies, and Error Rate

Some of the things SigNoz can help you track:

  • Application overview metrics like RPS, 50th/90th/99th Percentile latencies, and Error Rate
  • Slowest endpoints in your application
  • See exact request trace to figure out issues in downstream services, slow DB queries, call to 3rd party services like payment gateways, etc
  • Filter traces by service name, operation, latency, error, tags/annotations.
  • Run aggregates on trace data
  • Unified UI for both metrics and traces

You can check out SigNoz's GitHub repo here 👇

SigNoz GitHub repo

We also have an active slack community. Feel free to join in and say hi! 👋

SigNoz Slack community


Jaeger vs ELastic APM
Jaeger vs SigNoz
Jaeger vs Prometheus
Jaeger vs New Relic