OTel-Native by Design - Building Backends That Export to Any Observability Stack

Updated Mar 13, 2026 · 16 min read

If you’re building a backend or SaaS product, your users will eventually ask to send logs, traces, and metrics to their own observability stack, whether for compliance, cost, or to keep everything in one place.

Locking them into your built-in dashboards or limiting exports to certain vendors creates unnecessary friction. Instead, supporting export to any OpenTelemetry (OTel)–compatible backend is a vendor-neutral, future-proof practice that gives users the flexibility to choose.

This post outlines how to design your backend so users can export full telemetry (logs, traces, and metrics) to an OTel backend when they want to.

Enabling telemetry export via the OTLP standard allows users to own and analyze their data on the platforms of their choosing.

The Three Observability Signals

OpenTelemetry defines three main signal types, all carried over the same OTLP protocol:

  • Logs: Event records, request/access logs, and application logs with timestamps and metadata.
  • Traces: Distributed traces and spans so users can see request flows across services and correlate them with logs.
  • Metrics: Counters, gauges, and histograms (e.g. request rates, latency, error rates).

The same export story applies to all three: let users configure an OTLP endpoint and push telemetry to it. You can support one, two, or all three signals depending on what your product generates.

Many platforms that support OTel export support at least traces and logs; an increasing number support metrics as well. Designing for all three from the start avoids having to retrofit later.

What a “Good” Telemetry System Looks Like

A solid export story has a few clear properties for every signal you support:

  • Vendor-neutral and OpenTelemetry-compliant: Users can point at any OTel-compatible endpoint (SigNoz, Grafana, Honeycomb, Dynatrace, OpenTelemetry Collector instance, etc.) without you building custom integrations for each.
  • No deep custom development: External platforms (or your users’ tooling) can integrate using standard OTel SDKs and the OTLP protocol instead of proprietary APIs.
  • Rich context preserved: Exported data should include metadata, timestamps, and trace/span correlation where available (e.g. log records linked to trace IDs), so users can debug and analyze data in their own backend without losing context.

If your design follows these principles for the signals you emit, you’re in line with how modern platforms approach observability export.

Two Contexts: Where Does Your Product Run?

The way you add OTel export depends on who owns the system that produces the telemetry. Getting this straight helps you choose the right approach.

Self-Hosted Software

Your product is an application or system (e.g., an identity server, a service mesh, a database) that customers install and run in their environment (their data center, their cloud, their Kubernetes cluster).

Here, you instrument your product with OpenTelemetry. When the customer configures an endpoint (e.g. via a startup flag or config file), your application exports telemetry from the process they’re running.

The export happens in the customer’s environment; they control the binary and the destination. Examples: Keycloak, Kuma.

Cloud Platforms

Your product is a platform where customers deploy their own apps or use your managed services (e.g. PaaS, serverless, API gateway). The workload runs on your infrastructure.

Here, you add a platform feature, such as “Telemetry Drains” or “Observability Destinations”, that lets customers configure where to send telemetry.
Your platform collects telemetry from their workload (and from your own services, like routers) and forwards it to the customer’s OTLP endpoint.

The export is done by your infrastructure, not by an application binary the customer runs. Examples: Heroku, Cloudflare.

In short: for user-deployed software, focus on built-in instrumentation and an endpoint config; for a platform you operate, focus on configurable export destinations that your infrastructure uses to forward data.

How Others Do It

The OpenTelemetry Ecosystem Registry is a good place to see which projects and organizations support OTel, and in what manner.
Below is how several companies handle all three signals (or a subset) and what you can learn from them.

The Summary: Signals Support Matrix

| Platform | Logs | Traces | Metrics | Deployment Mode | Notes |
|---|---|---|---|---|---|
| Kuma | Yes | Yes | Yes | Software users deploy | Separate policies per signal, all OTel |
| Keycloak | Yes† | Yes | Yes | Software users deploy | † Logs in preview; same endpoint for all |
| Cloudflare Workers | Yes | Yes | No* | Platform | * Metrics export not yet supported |
| Heroku | Yes | Yes | Yes | Platform | User chooses signals via --signals |

The Self-Hosted Approach: Kuma & Keycloak

If your users deploy your software into their own environments, the best practice is to ship the application pre-instrumented with OpenTelemetry and expose configuration flags for their OTLP endpoints.

Kuma

Whether customers run Kuma’s control and data planes on their own Kubernetes clusters or VMs, Kuma ships ready to emit traces, metrics, and logs to an OTel backend.
Users configure the export, which runs from their Kuma deployments, through mesh policies:

  • MeshAccessLog: Routes access logs to an OTel collector (endpoint + attributes such as mesh name, start time).
  • MeshTrace: Handles distributed traces with configurable sampling and tagging.
  • MeshMetric: Exposes control and data plane metrics. Integrates with OpenTelemetry and Prometheus.

For example, sending traces to an OTel backend looks like this:

# MeshTrace policy
backends:
  - type: OpenTelemetry
    openTelemetry:
      endpoint: otel-collector:4317

Sending access logs follows the exact same pattern with a different policy:

# MeshAccessLog policy
backends:
  - type: OpenTelemetry
    openTelemetry:
      endpoint: otel-collector:4317
      body:
        kvlistValue:
          values:
            - key: "mesh"
              value:
                stringValue: "%KUMA_MESH%"
      attributes:
        - key: "start_time"
          value: "%START_TIME%"


Keycloak

Keycloak is another example of self-hosted software providing great telemetry export functionality.

Instead of requiring a separate sidecar or platform feature, users just pass a startup flag pointing to their Collector endpoint, and the Keycloak process itself handles the export.

It uses a single telemetry endpoint but provides granular flags to toggle specific signals:

  • Traces: --tracing-enabled=true (covers HTTP requests, DB, LDAP, outbound HTTP/IdP).
  • Metrics: Detailed metrics exposed via the same OTel integration.
  • Logs: Currently in preview and disabled by default (--features=opentelemetry-logs --telemetry-logs-enabled=true, with --telemetry-logs-level for level filtering).

Defining the endpoint, optional headers, and preferred protocol (gRPC or HTTP) looks like:

bin/kc.sh start --telemetry-endpoint=http://my-otel-endpoint:4317 --telemetry-protocol=grpc


The recurring theme across both deployment modes is clear: push-based export using OTel/OTLP is the gold standard. Some products expose one endpoint for all three signals (Keycloak’s typical OTLP setup), whereas others let users pick which signals to send (Heroku’s --signals flag).

Natively supporting all telemetry signals gives your users the ultimate flexibility to build a complete picture in their backend of choice.

The Platform Approach: Cloudflare and Heroku

When you control the infrastructure, the cleanest user experience is to handle the export at the platform level, pulling data from the user’s workload and pushing it to their destination.

Cloudflare Workers

Because users run their code directly on Cloudflare’s infrastructure, Cloudflare does the heavy lifting through its Observability Destinations platform feature. The design is plug-and-play: users configure an OTLP endpoint in their dashboard,
and Cloudflare automatically pushes traces and logs from Workers straight to that destination.

While metrics aren’t supported yet, the trace data provides deep, end-to-end visibility, recording handler calls, bindings, outbound fetches, and more. Users can also configure the sampling rate in their wrangler.toml.
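A sketch of what that sampling configuration might look like in wrangler.toml — the key names here are assumptions, so check Cloudflare’s docs for the exact schema:

```toml
# Illustrative wrangler.toml fragment: enable observability and sample
# a fraction of requests at the head of each trace.
[observability]
enabled = true
head_sampling_rate = 0.1   # keep telemetry for ~10% of requests
```

Head sampling like this is the cheapest place to cut volume, since unsampled requests never generate telemetry at all.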


Heroku

Heroku takes a slightly different, highly configurable approach with Telemetry Drains. Users add a destination by specifying the endpoint, transport protocol and headers, and then explicitly choose which signals to export.

Heroku’s platform then gathers data from both the user’s application (via the OTel SDK) and first-party services (like their Router) and pushes it to the destination.

heroku telemetry:add <endpoint> --app <app-name> --signals traces,metrics,logs --transport http --headers '{"Authorization": "ingestion key"}'

Giving users granular control over which signals to export is a fantastic design pattern, especially for teams looking to manage data volume and ingestion costs.


Choosing Between Pull and Push Export Models

Across all three telemetry signals, the fundamental architectural question remains the same: will your platform let users pull the data via an API they poll at an interval, or push it to a user-configured endpoint?

The Pull Model (Custom Receivers)

In a pull-based setup, you expose an API for your telemetry. The user’s observability stack, or a custom receiver that they build, has to continuously poll that API, handle the pagination, and ingest the results into their own backend.

Where it works

If you already have mature, well-tested APIs for logs or metrics, this design is relatively easy to set up. It also means you don't have to manage continuous outbound traffic from your infrastructure.

Where it hurts

The problem with this model is that you are shifting a massive operational burden onto your users. They are now entirely responsible for building scalable polling systems that can manage polling intervals, paginate API responses, retry on failures, and backfill missing data.

Worse, achieving near real-time delivery becomes incredibly difficult—which is almost always a critical requirement for latency-sensitive signals like traces and metrics.

For logs, a pull-based receiver often looks like this (e.g. CloudWatch Logs–style):

# Sketch using boto3's filter_log_events; `ingest` is a placeholder
# for forwarding events into the user's backend.
import time
import boto3

POLL_INTERVAL = 30  # seconds
logs = boto3.client("logs")
last_end = int(time.time() * 1000)  # CloudWatch timestamps are in ms

while True:
    start, end = last_end, int(time.time() * 1000)
    kwargs = {"logGroupName": "/my/app", "startTime": start, "endTime": end}
    while True:  # drain every page in this time window
        resp = logs.filter_log_events(**kwargs)
        for event in resp["events"]:
            ingest(event)
        if "nextToken" not in resp:
            break
        kwargs["nextToken"] = resp["nextToken"]
    last_end = end
    time.sleep(POLL_INTERVAL)

You’d need similar state-tracking logic for metrics or trace APIs.

Verdict

Because it forces users to write custom code just to convert your API responses into standard formats, the pull model is acceptable for legacy systems, but it shouldn’t be the default for a new design.

The Push Model (OTLP)

This is where the industry has landed.

Instead of waiting to be asked, your backend (or an OpenTelemetry Collector you run) actively exports logs, traces, and metrics directly to the user’s configured endpoint using OTLP (over HTTP or gRPC).

Where it works

It uses one vendor-neutral, industry-standard protocol for all three signals. OTLP provides first-class support for structured data and metadata, meaning crucial context—like linking specific logs to their parent trace IDs—is preserved automatically.

From the user's perspective, it is practically plug-and-play. Any OTel-compatible backend can ingest the data in near real-time without them writing a single line of custom polling logic.

Where it hurts

Here, the ball is in your court. Your internal engineering teams need to learn and adopt OpenTelemetry, and you need to have clear documentation on how users can configure their endpoints.

The Verdict

Adopting OpenTelemetry’s push model should be the way ahead for building modern, developer-friendly telemetry systems that preserve rich context and deliver data in near real-time.
Plus, it has been proven to work well at scale across cloud and self-hosted deployments by Cloudflare, Heroku, Kuma, and Keycloak.

OpenTelemetry’s independence from a particular vendor means users have the freedom to switch between observability vendors based on their business needs, without requiring a complete overhaul of their telemetry pipelines (or frequent configuration changes on your platform).

Use the pull model only when you already have a dominant API for a given signal and cannot add a push path.

Building the Export Experience

When actually implementing the export model into your product, seek to maximize flexibility with minimal configuration.

Let users configure an OTLP endpoint

Start by letting users provide their own OTLP endpoint and any necessary authentication headers (like an ingestion key).

But don’t stop there!

To allow users to control export volume, simplify data management, and reduce costs, take a page out of Heroku’s book and let users explicitly toggle which signals they want to export.
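For instance, a destination record in your product’s configuration could capture the endpoint, auth, and signal toggles together. All field names below are illustrative, not a real product’s schema:

```yaml
# Illustrative per-destination config — field names are made up for this sketch
telemetry_destination:
  endpoint: https://otel.customer.example:4318
  protocol: http/protobuf
  headers:
    Authorization: "Bearer <ingestion key>"
  signals:
    traces: true
    metrics: true
    logs: false   # user opted out to control ingestion costs
```

Keeping the toggles alongside the endpoint means one record fully describes a destination, which simplifies both your UI and your export pipeline.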

Standardize the architecture

Under the hood, you have two main options: you can either use the OpenTelemetry SDK directly within your services to emit data, or run an internal OTel Collector that gathers your system’s telemetry and re-exports it to the user’s endpoint.

Sticking to standard OpenTelemetry environment variables (like OTEL_EXPORTER_OTLP_ENDPOINT) makes the underlying plumbing reliable and easy to document and reason about.

Keep semantics consistent

Beyond just OTel-based environment variables, make sure you maintain consistent semantics across all your signals. Document your attribute names, schemas, and exactly how logs and metrics relate back to trace and span IDs.
When a user ingests your telemetry into their observability backend, everything should knit together to tell the complete story.

The best part of this approach is that you avoid the need to build different vendor integrations. By exporting telemetry via OTLP, you enable users to transform and ingest it in their desired formats.
For example, a user might wish to forward logs to their backend and to an object store like S3 for meeting compliance requirements.

Routing Telemetry to the User

As you design the configuration UI for your users, you’ll need to decide how granular to get with routing telemetry data. You generally have two paths, and they cater to different types of users.

Single Endpoint

For the vast majority of users, a single endpoint configuration is ideal. Here, the user inputs one base URL, and your exporter appends the standard OTLP paths (/v1/traces, /v1/metrics, and /v1/logs) internally.
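The path-appending logic is a one-liner; here is a minimal Python sketch (the helper name is ours):

```python
# Standard OTLP/HTTP paths per signal.
OTLP_PATHS = {"traces": "v1/traces", "metrics": "v1/metrics", "logs": "v1/logs"}

def signal_url(base_url: str, signal: str) -> str:
    """Append the standard OTLP/HTTP path for a signal to the user's base URL."""
    return base_url.rstrip("/") + "/" + OTLP_PATHS[signal]

# signal_url("https://otel.example.com", "traces")
# → "https://otel.example.com/v1/traces"
```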

Per-Signal Endpoints

Large-scale customers, or those managing complex observability setups, may wish to send telemetry signals to different platforms.

OTLP natively supports this with signal-specific variables following the OTEL_EXPORTER_OTLP_<SIGNAL>_ENDPOINT pattern, along with corresponding header configurations. For example, to configure a specific endpoint for logs, you would use OTEL_EXPORTER_OTLP_LOGS_ENDPOINT.

Exposing this per-signal routing in your application is technically optional, but it is a massive value-add for advanced users.

Running Collectors to Manage Multi-Tenancy

Once you’re pushing OTLP (for any combination of logs, traces, metrics), you still need to decide how to run Collectors to manage multiple tenants.

Your Collector architecture needs to handle two dimensions of growth: onboarding more users, and the volume of new telemetry generated as you ship new features.

One Collector per Tenant

If your architecture already isolates tenants at the infrastructure level, you can deploy a dedicated Collector instance for each customer. In this case, every instance has a dedicated configuration pointing directly to that specific customer’s export endpoint.
This provides strong logical isolation guarantees. Slowdowns or mis-configurations in one customer’s pipeline do not affect other customers.

Although you can optimize the Collector binary by only shipping vital components, deploying hundreds or thousands of Collector instances will become resource-intensive.

Given this limitation, this architecture is usually the best fit for enterprise SaaS products where strong multi-tenant isolation is a strict requirement and customers might have complex endpoint configurations.

The Collector-per-tenant architecture provides strong isolation guarantees at the cost of increased resource usage.

Shared Collector with Static Pipelines per Tenant

In this architecture, all your platform’s telemetry funnels into a single, centralized Collector. Inside, you define separate pipelines per tenant, ensuring that the data gets routed to the correct external endpoint.

This pattern is desirable for smaller-scale teams because it is much simpler to operate than the Collector-per-tenant architecture. You only have to monitor and scale one deployment, and all routing is configured in one place.

However, you must be comfortable managing an ever-growing configuration file as your customer base expands.
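As an illustrative sketch of this pattern, a shared Collector config could pair a per-tenant filter with a per-tenant exporter. The tenant names and the tenant_id resource attribute below are assumptions:

```yaml
# Shared Collector, static per-tenant pipelines (illustrative config).
receivers:
  otlp:
    protocols:
      grpc:

processors:
  # Keep only tenant-a's spans in its pipeline (OTTL conditions drop matches).
  filter/tenant-a:
    traces:
      span:
        - 'resource.attributes["tenant_id"] != "tenant-a"'

exporters:
  otlphttp/tenant-a:
    endpoint: https://otel.tenant-a.example

service:
  pipelines:
    traces/tenant-a:
      receivers: [otlp]
      processors: [filter/tenant-a]
      exporters: [otlphttp/tenant-a]
```

Each new customer adds one processor, one exporter, and one pipeline block, which is exactly the configuration growth mentioned above.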

The shared Collector pattern is easier to monitor and maintain as all exports route through one Collector instance.

Putting It All Together

To conclude, you don’t need to build bespoke integrations to let your users export telemetry to their own backends.

If you’re ready to implement OTel-native export in your application, here’s a summary of the key architectural and design steps to follow:

  • Decide which signals to support — logs, traces, and metrics — ideally all three, but start with what your product generates.
  • Use OTLP to export those signals via the OTel SDK or a Collector. The same push-based architecture works for all three.
  • Let users configure a destination endpoint and optional auth headers. Consider per-signal endpoints for teams with more complex setups.
  • Allow users to enable or disable individual signals to control data volume and costs.
  • Document your endpoint format (gRPC/HTTP), required headers, and attribute/schema semantics per signal so users can confidently rely on the data in their backends.
  • Choose a Collector topology: one Collector per tenant for strong isolation, or a shared Collector with per-tenant pipelines for simpler operation.
  • Provide an example config or env var snippet so users can get started quickly.
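For that last point, a quick-start snippet built on the standard OpenTelemetry environment variables (values are placeholders) might look like:

```shell
# Point the OTel SDK / Collector exporter at the user's backend.
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otel.example.com:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <ingestion-key>"
```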

The points above also encapsulate the golden rules that Cloudflare (except for metrics), Heroku, Kuma, and Keycloak follow: default to push, stay vendor-neutral, and document the contract.

Designing for all three signals using open standards from the start removes friction, reduces your engineering overhead, and empowers customers to best utilize their data on their own terms.


Tags: OpenTelemetry, Engineering