Debugging errors in distributed systems can be a challenging task, as it involves tracing the flow of operations across numerous microservices. This complexity often leads to difficulties in pinpointing the root cause of performance issues or errors.
OpenTelemetry provides instrumentation libraries in most programming languages for tracing.
Using tracing, you can break operations down into smaller parts by identifying what happened, where, when, and how, along with any other relevant information. This structured approach makes debugging considerably more effective and efficient.
This article discusses OpenTelemetry, OpenTelemetry spans, and the processes involved in creating and utilizing them.
What is OpenTelemetry?
OpenTelemetry, often abbreviated as OTel, is an open-source observability framework designed to generate, collect, and export telemetry data (logs, metrics, and traces) that can be used to troubleshoot performance issues in applications.
It is incubated under the Cloud Native Computing Foundation (CNCF), the same foundation that incubated Kubernetes.
OpenTelemetry follows a specification-driven development process and provides client libraries to instrument applications in most programming languages. Once you have instrumented your application with OpenTelemetry, you can collect telemetry signals such as logs, metrics, and traces from it.
OpenTelemetry is also vendor-agnostic and contributes to standardization by allowing data to be exported to a wide range of backend systems and observability platforms, for example, SigNoz.
SigNoz is an OpenTelemetry-native APM that you can use to visualize OpenTelemetry data.
What is an OpenTelemetry span?
As requests flow through distributed systems, it's important to keep track of how they travel, as this is useful for monitoring and troubleshooting.
Tracing allows you to track the journey of a request as it moves through different services in a distributed environment. It provides a way to understand the flow of operations across these services, making it easier to pinpoint performance issues or errors.
Tracing is a fundamental aspect of observability. A trace is a collection of spans, providing a high-level view of how a specific request or transaction moves through various services within a distributed environment. Imagine a trace as a comprehensive map that outlines the path a request takes through the system.
An OpenTelemetry span represents a single unit of work within a system. It encapsulates information about a specific operation, including its start time, duration, associated attributes, and any events or errors during its execution.
To illustrate, consider an e-commerce application where customers place orders for products. A trace would represent the entire process of a customer's order, from the moment they click "checkout" to the point of order confirmation.
Within this trace, we have multiple spans, each signifying a crucial step in the order processing. For instance, one span might mark the moment the order was placed, recording when it began, how long it took, and any essential details about the order itself. Another span could denote the payment processing, while another might represent the inventory check. Any noteworthy events or errors, such as a payment failure or a product being out of stock, would be recorded as part of these spans within the trace.
What are Span attributes?
A span attribute is a key-value pair that provides additional context or metadata about a span. These attributes provide more information about the operation being performed within the span. They can be extremely useful for understanding and diagnosing issues in complex distributed systems.
The earlier illustration of traces and spans extends naturally to span attributes. In our e-commerce application, when a customer places an order, various details can be associated with each processing step. These details are captured as span attributes.
For example:
- Order ID:
  - Key: order_id
  - Value: 12345
  - This attribute helps uniquely identify the specific order being processed.
- Payment Method:
  - Key: payment_method
  - Value: Credit Card
  - This attribute indicates the payment method chosen by the customer.
- Inventory Status:
  - Key: inventory_status
  - Value: In Stock
  - This attribute indicates whether the product is currently available in the inventory.
The above can also be written as:
- Key: "Order ID"
- Value: "12345"
- Key: "Payment Method"
- Value: "Credit Card"
- Key: "Inventory Status"
- Value: "In Stock"
Knowing the order ID, payment method, and inventory status associated with each step of an order helps in precise identification and tracking. Later, when you're examining traces, these attributes become invaluable. You can use them to filter and search for specific orders based on their unique IDs and payment methods or even check the availability of items in the inventory.
Span creation
In this section, we will look at how spans are created and how to get the current span as well as nested spans.
To create a span, you first need a tracer. A span always belongs to a trace: a span started without a parent becomes the root of a new trace, and spans started from it join that same trace.
Traces are created through a process known as instrumentation. In software development and observability, instrumentation involves adding code or hooks to an application. This allows for the collection of data about its performance, usage, and other runtime characteristics.
There are two approaches to incorporating tracing with OpenTelemetry: manual and automatic.
Manual Instrumentation:
In this approach, developers explicitly control creating and managing spans in their code. They determine when spans begin and end, as well as what information is added to them.
Automatic Instrumentation:
Here, a library or agent is used to trace specific frameworks, libraries, or services automatically. This process occurs without the need for manual intervention. The agents or libraries integrate seamlessly into your application's code, capturing spans effortlessly.
We will look at how to create a trace with manual instrumentation in Go. More information, including implementations for other languages, can be found in the OpenTelemetry documentation.
Creation of Traces
Step 1. Firstly, the OpenTelemetry packages need to be installed:
go get go.opentelemetry.io/otel \
  go.opentelemetry.io/otel/trace \
  go.opentelemetry.io/otel/sdk
What each line means:
- go get go.opentelemetry.io/otel: This line fetches and installs the main OpenTelemetry package. It is the entry point for working with OpenTelemetry in your Go application.
- go.opentelemetry.io/otel/trace: This line fetches and installs the OpenTelemetry tracing API. It defines the types used to create and manage traces and spans.
- go.opentelemetry.io/otel/sdk: This line fetches and installs the OpenTelemetry software development kit (SDK). The SDK provides the core implementation for OpenTelemetry; it processes traces, spans, and other telemetry data and hands them to exporters.
Step 2. To start tracing your application, you'll need to initialize an exporter of your choice, resources, a tracer provider, and finally, a tracer. This process involves setting up the necessary components for your application to collect and transmit trace data to your preferred backend.
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
	"go.opentelemetry.io/otel/trace"
)

var tracer trace.Tracer

// newExporter returns the span exporter to use. The console (stdout) exporter
// is only a stand-in so that the example compiles; swap in your preferred
// exporter: Jaeger, Zipkin, OTLP, etc. The stdout exporter is a separate
// module: go get go.opentelemetry.io/otel/exporters/stdout/stdouttrace
func newExporter(ctx context.Context) (sdktrace.SpanExporter, error) {
	return stdouttrace.New(stdouttrace.WithPrettyPrint())
}

func newTraceProvider(exp sdktrace.SpanExporter) *sdktrace.TracerProvider {
	// Ensure default SDK resources and the required service name are set.
	// Merge reports an error if the two resources use conflicting schema
	// URLs, so keep the semconv version in step with your SDK version.
	r, err := resource.Merge(
		resource.Default(),
		resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName("ExampleService"),
		),
	)
	if err != nil {
		panic(err)
	}
	return sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		sdktrace.WithResource(r),
	)
}

func main() {
	ctx := context.Background()
	exp, err := newExporter(ctx)
	if err != nil {
		log.Fatalf("failed to initialize exporter: %v", err)
	}
	// Create a new tracer provider with a batch span processor and the given exporter.
	tp := newTraceProvider(exp)
	// Shut down properly so buffered spans are flushed and nothing leaks.
	defer func() { _ = tp.Shutdown(ctx) }()
	otel.SetTracerProvider(tp)
	// Finally, set the tracer that can be used for this package.
	tracer = tp.Tracer("ExampleService")
}
Once the tracer has been set up, you can access it and manually instrument your code.
Creation of Spans
With the tracer in place, spans can be created to track specific operations within the code.
The creation of spans with tracers requires access to a context.Context instance. A context.Context instance is a way to carry request-scoped values across API boundaries and between processes. Usually, these instances are obtained from objects such as a request, and they may already have a parent span from an instrumentation library.
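func httpHandler(w http.ResponseWriter, r *http.Request) {
	ctx, span := tracer.Start(r.Context(), "hello-span")
	defer span.End()

	// Do some work to track with hello-span. Pass ctx to downstream calls so
	// that any spans they start become children of hello-span.
	_ = ctx
}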
The above function sets up an OpenTelemetry span named "hello-span" and associates it with an incoming HTTP request. The defer statement ensures that the span is ended correctly when the function completes. This is useful for measuring the duration of the operations within the span and visualizing the flow of requests in a distributed system.
In Go, the context package is used to manage the active span. When starting a span, the context that contains it is modified, and a handle for both the span and the modified context is obtained.
After a span has ended, it is effectively immutable: later attempts to modify it, such as setting attributes, are ignored rather than applied.
Getting current span
To obtain the current span, you must extract it from a context.Context instance that you have a handle on.
// This context needs to contain the active span you plan to extract.
ctx := context.TODO()
span := trace.SpanFromContext(ctx)
// Do something with the current span, optionally calling `span.End()` if you want it to end.
Nested span
Nested spans can be used to track work in a nested operation. If the context.Context instance you have a handle on already contains a span, creating a new span will result in a nested span.
This can be useful when you want to track the performance of a specific operation within the context of a larger operation. To create a nested span, initiate a new span within the existing span's context.
For example:
func parentFunction(ctx context.Context) {
	ctx, parentSpan := tracer.Start(ctx, "parent")
	defer parentSpan.End()

	// Call the child function and start a nested span in there.
	childFunction(ctx)

	// Do more work. When this function returns, parentSpan will complete.
}

func childFunction(ctx context.Context) {
	// Create a span to track `childFunction()`. This is a nested span whose parent is `parentSpan`.
	ctx, childSpan := tracer.Start(ctx, "child")
	defer childSpan.End()

	// Do work here. When this function returns, childSpan will complete.
}
Remember that once a span has ended, it is effectively immutable: further modifications are ignored, so the captured data and attributes of the span remain intact and unaltered.
Getting started with OpenTelemetry tracing
If you’re looking for the right distributed tracing tool that supports OpenTelemetry, then SigNoz is the right choice. SigNoz is an open-source distributed tracing tool that supports OpenTelemetry natively. It also provides metrics monitoring and logs management under a single pane of glass.
One of the key strengths of SigNoz is its native support for OpenTelemetry, which is rapidly emerging as the global standard for application instrumentation. By adopting OpenTelemetry, users can avoid vendor lock-in and gain access to a set of convenient client libraries that streamline the implementation of distributed tracing.
With SigNoz's support for OpenTelemetry, users can easily integrate their applications with SigNoz's observability platform, enabling them to gain deeper insights into their applications and improve their overall performance.
One of the standout features of SigNoz is its intuitive visualization capabilities. It enables users to generate insightful visual representations like flamegraphs and Gantt charts based on the tracing data collected through OpenTelemetry.
These visualizations provide valuable insights into the performance and behavior of applications, making troubleshooting and performance optimization significantly more efficient.
Getting started with SigNoz
SigNoz cloud is the easiest way to run SigNoz. Sign up for a free account and get 30 days of unlimited access to all features.
You can also install and self-host SigNoz yourself since it is open-source. With 19,000+ GitHub stars, open-source SigNoz is loved by developers. Find the instructions to self-host SigNoz.