Health check monitoring is a critical practice for maintaining reliable and high-performing systems. It allows you to proactively identify and address issues before they impact your users.

This guide explores the fundamentals of health check monitoring, its importance in modern systems, and practical strategies for implementation.

You will also learn how HTTP endpoints can be monitored with OpenTelemetry. You will use the OpenTelemetry Collector to collect metrics from the target endpoint and send them to SigNoz for monitoring and visualization.

If you want to jump straight into implementation, start with this prerequisites.

What is Health Check Monitoring?

Health check monitoring is a systematic approach to assessing the status and performance of your systems and applications. It involves regular checks on various components to ensure they're functioning correctly. These checks can range from simple ping tests to complex, custom-designed probes that evaluate specific functionalities.

Key components of health check monitoring include:

  1. Endpoints: Dedicated URLs or interfaces that respond to monitoring requests.
  2. Metrics: Specific data points that indicate system health (e.g., response time, error rate).
  3. Thresholds: Predefined limits that, when exceeded, trigger alerts or actions.

Health checks come in various forms:

  • Basic ping checks: Verify if a service is responsive.
  • HTTP requests: Assess the availability and response of web services.
  • Custom checks: Evaluate specific business logic or complex system states.

What is an HTTP endpoint?

An HTTP endpoint is a specific URL or path on a web server that is designed to receive and respond to HTTP requests. It acts as an entry point for clients to interact with the server and access its resources. When a client sends an HTTP request to a specific endpoint, the server processes the request and sends back an HTTP response. The endpoint, therefore, acts as a gateway or entry point for interacting with a specific part of a web application or service.

Why Monitor HTTP endpoints?

Monitoring HTTP endpoints is crucial for ensuring the continuous availability, optimal performance, and reliability of web services. It helps detect issues such as downtime, slow response times, or errors, allowing for proactive identification and resolution of potential problems.

Why is Health Check Monitoring Critical for Modern Systems?

In today's fast-paced digital landscape, system reliability is paramount. Health check monitoring plays a crucial role in maintaining this reliability by:

  1. Ensuring high availability: Continuous monitoring helps minimize downtime by quickly identifying and addressing issues.
  2. Enabling early detection: Problems are often caught before they escalate and impact end-users.
  3. Facilitating rapid response: Automated alerts allow teams to react swiftly to anomalies.
  4. Supporting DevOps practices: Regular health checks align with continuous delivery and integration principles.

Benefits of Implementing Health Check Monitoring

Effective health check monitoring offers numerous advantages:

  • Improved system reliability: Consistent monitoring leads to more stable systems and higher user satisfaction.
  • Reduced MTTD and MTTR: Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) decrease with proactive monitoring.
  • Enhanced capacity planning: Monitoring data informs decisions about resource allocation and scaling.
  • Better SLA compliance: Regular checks help ensure adherence to Service Level Agreements.

How to Implement Effective Health Check Monitoring

To set up robust health check monitoring:

  1. Identify critical components: Determine which services and systems are essential to your operations.
  2. Design meaningful endpoints: Create specific URLs (e.g., /health, /status) that accurately represent component health.
  3. Define appropriate metrics: Choose indicators that truly reflect the health of each component.
  4. Set realistic thresholds: Establish limits that balance sensitivity with the risk of false alarms.
  5. Implement automated alerts: Configure notifications to inform relevant team members when issues arise.

Best Practices for Health Check Design

Follow these guidelines to create effective health checks:

  • Keep checks lightweight: Ensure monitoring doesn't significantly impact system performance.
  • Use appropriate timeouts: Prevent cascading failures by setting reasonable time limits for checks.
  • Implement authentication: Secure sensitive health check endpoints to prevent unauthorized access.
  • Ensure representativeness: Design checks that accurately reflect real-world usage and potential issues.

What is OpenTelemetry

OpenTelemetry is a set of APIs, SDKs, libraries, and integrations aiming to standardize the generation, collection, and management of telemetry data(logs, metrics, and traces). It is backed by the Cloud Native Computing Foundation and is the leading open-source project in the observability domain.

The data you collect with OpenTelemetry is vendor-agnostic and can be exported in many formats. Telemetry data has become critical in observing the state of distributed systems. With microservices and polyglot architectures, there was a need to have a global standard. OpenTelemetry aims to fill that space and is doing a great job at it thus far.

What is OpenTelemetry Collector?

OpenTelemetry Collector is a stand-alone service provided by OpenTelemetry. It can be used as a telemetry-processing system with a lot of flexible configurations that gather and process observability data, such as traces, metrics, and logs, from different parts of a software system. It then sends this data to chosen destinations, allowing for centralized analysis and monitoring. The collector simplifies the task of collecting and exporting telemetry data in cloud-native environments.

How does OpenTelemetry Collector collect data?

Data collection in OpenTelemetry Collector is facilitated through receivers. Receivers are configured via YAML under the top-level receivers tag. To ensure a valid configuration, at least one receiver must be enabled.

Below is an example of an otlp receiver:

receivers:
  otlp:
    protocols:
      grpc:
      http:

The OTLP receiver accepts data through gRPC or HTTP in the OTLP format. There are advanced configurations that you can enable via the YAML file.

Here’s a sample configuration for an otlp receiver:

receivers:
  otlp:
    protocols:
      http:
        endpoint: "localhost:4318"
        cors:
          allowed_origins:
            - <http://test.com>
            # Origins can have wildcards with *, use * by itself to match any origin.
            - https://*.example.com
          allowed_headers:
            - Example-Header
          max_age: 7200

You can find more details on advanced configurations here.

Once a receiver is configured, it needs to be enabled to start the data flow. This involves setting up pipelines within a service. A pipeline acts as a streamlined pathway for data, outlining how it should be processed and where it should go. A pipeline comprises of the following:

  1. Receivers: These are entry points for data into the OpenTelemetry Collector, responsible for collecting data from various sources and feeding it into the pipeline.
  2. Processors: After data is received, processors manipulate, filter, or enhance the data as needed before it proceeds further in the pipeline. They provide a way to customize the data according to specific requirements.
  3. Exporters: After processing, the data is ready for export. Exporters define the destination for the data, whether it's an external monitoring system, storage, or another service. They format the data appropriately for the chosen output.

Below is an example pipeline configuration:

service:
  pipelines:
    metrics:
      receivers: [otlp, httpcheck]
      processors: [batch]
      exporters: [otlp]

Here’s a breakdown of the above metrics pipeline:

  • Receivers: This pipeline is configured to receive metrics data from two sources: OTLP and HTTP Check. The otlp receiver collects metrics using both gRPC and HTTP protocols, while the httpcheck receiver gathers metrics from the HTTP endpoint.
  • Processors: Metrics data is processed using the batch processor. This processor likely batches metrics before exporting them, optimizing the data flow.
  • Exporters: Metrics processed through this pipeline are exported to the OTLP destination. The otlp exporter sends data to an endpoint specified in the configuration.

Monitoring HTTP endpoint with OpenTelemetry Collector

In this section, you will learn how HTTP endpoints can be monitored and how metrics can be collected from those endpoints using the OpenTelemetry Collector. You will also learn how the collected metrics can be sent to SigNoz for visualization.

Prerequisites

Setting up SigNoz

You need a backend to send the collected data for monitoring and visualization. SigNoz is an OpenTelemetry-native APM that is well-suited for visualizing OpenTelemetry data.

SigNoz cloud is the easiest way to run SigNoz. Sign up for a free account and get 30 days of unlimited access to all features.

Get Started - Free CTA

You can also install and self-host SigNoz yourself since it is open-source. With 19,000+ GitHub stars, open-source SigNoz is loved by developers. Find the instructions to self-host SigNoz.

Setting up OpenTelemetry Collector

The OpenTelemetry Collector offers various deployment options to suit different environments and preferences. It can be deployed using Docker, Kubernetes, Nomad, or directly on Linux systems. You can find all the installation options here. For the purpose of this article, the OpenTelemetry Collector will be installed manually.

Download the OpenTelemetry Collector

Download the appropriate binary package for your Linux or macOS distribution from the OpenTelemetry Collector releases page. We are using the latest version available at the time of writing this tutorial.

curl --proto '=https' --tlsv1.2 -fOL <https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.89.0/otelcol-contrib_0.89.0_darwin_arm64.tar.gz>

📝 Note

For macOS users, download the binary package specific to your system.

BuildArchitecture
M1 Chiparm64
Intelamd64 (x86-64)

Extract the package

Create a new directory named otelcol-contrib and then extract the contents of the otelcol-contrib_0.89.0_darwin_arm64.tar.gz archive into this newly created directory with the following command:

mkdir otelcol-contrib

Extract the contents of the binary package in that directory:

tar xvzf otelcol-contrib_0.89.0_darwin_arm64.tar.gz -C otelcol-contrib

Go into the directory:

cd otelcol-contrib

Set up the Configuration file

In the same otelcol-contrib directory, create a config.yaml file:

touch config.yaml

Paste the below config into the file:

receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}
  httpcheck:
    targets:
      - endpoint: <http://localhost:5000/>
        method: GET
      - endpoint: <http://localhost:5000/one>
        method: GET
      - endpoint: <http://localhost:5000/two>
        method: GET
      - endpoint: <http://localhost:5000/three>
        method: GET
      - endpoint: <http://localhost:5000/four>
        method: GET
      - endpoint: <http://localhost:5000/error>
        method: GET
    collection_interval: 10s

processors:
  batch:
    send_batch_size: 1000
    timeout: 10s

exporters:
  otlp:
    endpoint: "ingest.{region}.signoz.cloud:443"
    tls:
      insecure: false
    timeout: 20s
    headers:
      "signoz-ingestion-key": "<SIGNOZ_INGESTION_KEY>"
  logging:
    verbosity: detailed

service:
  telemetry:
    metrics:
      address: localhost:8888
  pipelines:
    metrics:
      receivers: [otlp, httpcheck]
      processors: [batch]
      exporters: [otlp]

The HTTP Check Receiver is a component of the OpenTelemetry Collector that enables monitoring of HTTP endpoints. It periodically sends HTTP requests to specified endpoints and collects metrics about the responses, such as response time, status code, and error messages. You can find more information on the OpenTelemetry HTTP Check receiver here.

In the above config file, you can see that we are setting the endpoints as targets to be monitored.

Replace http://localhost:5000 with the correct endpoint where you have your web app running. If you have only one target endpoint, take out the rest endpoints. Also, replace {region} with the region for your SigNoz cloud account and <SIGNOZ_INGESTION_KEY> with the ingestion key for your account. You can find these settings in the SigNoz dashboard under Settings > Ingestion Settings.

You can find ingestion details in the SigNoz dashboard
You can find ingestion details in the SigNoz dashboard

Run the collector service

In the same otelcol-contrib directory, run the below command to start the collector service:

./otelcol-contrib --config ./config.yaml

You should receive a similar output to show it has started successfully:

2023-11-29T00:16:14.193+0100    info    service@v0.89.0/telemetry.go:85    Setting up own telemetry...
2023-11-29T00:16:14.195+0100    info    service@v0.89.0/telemetry.go:202   Serving Prometheus metrics      {"address": "localhost:8888", "level": "Basic"}
2023-11-29T00:16:14.197+0100    info    receiver@v0.89.0/receiver.go:296   Development component. May change in the future.        {"kind": "receiver", "name": "httpcheck", "data_type": "metrics"}
2023-11-29T00:16:14.198+0100    info    service@v0.89.0/service.go:143     Starting otelcol-contrib...     {"Version": "0.89.0", "NumCPU": 12}
2023-11-29T00:16:14.198+0100    info    extensions/extensions.go:34        Starting extensions...
2023-11-29T00:16:14.200+0100    warn    internal@v0.89.0/warning.go:40     Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks     {"kind": "receiver", "name": "otlp", "data_type": "metrics", "documentation": "<https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks>"}
2023-11-29T00:16:14.201+0100    info    otlpreceiver@v0.89.0/otlp.go:83    Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "metrics", "endpoint": "0.0.0.0:4317"}
2023-11-29T00:16:14.202+0100    warn    internal@v0.89.0/warning.go:40     Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks     {"kind": "receiver", "name": "otlp", "data_type": "metrics", "documentation": "<https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks>"}
2023-11-29T00:16:14.203+0100    info    otlpreceiver@v0.89.0/otlp.go:101   Starting HTTP server    {"kind": "receiver", "name": "otlp", "data_type": "metrics", "endpoint": "0.0.0.0:4318"}
2023-11-29T00:16:14.203+0100    info    service@v0.89.0/service.go:169     Everything is ready. Begin running and processing data.

Monitoring HTTP Endpoints with SigNoz dashboard

Once the collector service has started successfully, it will collect the data and send it to SigNoz.

Navigate to your SigNoz Cloud account and access the "Dashboard" tab. Click on the “New Dashboard” button to create a new dashboard.

For HTTP endpoint monitoring, you can use the time-series chart.

Create a new dashboard and click on the time-series panel type
Create a new dashboard and click on the time-series panel type

In the "Query Builder" tab, enter "http", and you should see various metrics collected by the httpcheck receiver. This confirms that the OpenTelemetry Collector is successfully collecting the http metrics and forwarding them to SigNoz for monitoring and visualization.

HTTP metrics sent to SigNoz

HTTP Receiver Metrics monitored by SigNoz
HTTP Receiver Metrics monitored by SigNoz

In SigNoz, you can monitor the health of HTTP endpoints by running queries against the httpcheck_status metrics. This metric is generated by the HTTP Check Receiver, which makes a request to the specified endpoint and generates a metric labeled for each HTTP response status class with a value of 1 if the status code matches the class.

So if you need the endpoints monitored, you can create a query with the where clause set to http_status_class = 2xx .

It will give you a chart where you can see all the URLs are monitored their value as 1 when they are up and 0 when they are down.

Chart for monitoring HTTP endpoints in SigNoz
Chart for monitoring HTTP endpoints in SigNoz

You can also create an alert for an endpoint going down and get it in your preferred notification channel. Click on drop-down at the top of the panel and then click on Create Alerts.

It will take you to the alerts page; from there, you can create the alerts.

Create alerts to be notified in case an endpoint is down
Create alerts to be notified in case an endpoint is down

Visit the SigNoz documentation to learn more about creating dashboards and running queries.

Metrics and Resource Attributes for HTTP receiver supported by OpenTelemetry

The following metrics and resource attributes for the HTTP receiver can be collected by the OpenTelemetry Collector.

Metrics

These metrics are enabled by default. Each of them can be disabled by applying the following configuration:

metrics:
  <metric_name>:
    enabled: false

MetricsDescriptionMetrics NameMetric TypeValue TypeUnitAggregation TemporalityMonotonic
HTTP Check DurationMeasures the duration of the HTTP checkhttpcheck.durationGaugeIntms--
HTTP Check ErrorRecords errors occurring during HTTP checkhttpcheck.errorSumInt{error}CumulativeFalse
HTTP Check StatusRecords the status of the HTTP check (1 for success, 0 for failure)httpcheck.statusSumInt1CumulativeFalse

You can visit the HTTP Check receiver GitHub repo to learn more about these metrics.

Attributes

Resource attributes are a set of key-value pairs that provide additional context about the source of a metric. They are used to identify and classify metrics and to associate them with specific resources or entities within a system.

The below attributes are enabled by default for the HTTP receiver.

NameDescriptionValues
http.urlFull HTTP request URL.Any Str
http.status_codeHTTP response status codeAny Int
http.methodHTTP request methodAny Str
http.status_classHTTP response status classAny Str

You can see these resource attributes in the OpenTelemetry Collector Contrib repo for the HTTP receiver.

Advanced Health Check Monitoring Techniques

As systems grow more complex, advanced monitoring techniques become necessary:

  1. Synthetic transactions: Simulate user interactions to test end-to-end functionality.
  2. Watchdogs: Implement separate processes that monitor and restart failed components.
  3. Orchestrator integration: Leverage platforms like Kubernetes for automated health management.
  4. Circuit breakers: Implement failure detection mechanisms to prevent system-wide cascading failures.

Tools and Technologies for Health Check Monitoring

A variety of tools are available for implementing health check monitoring:

  • Open-source platforms: SigNoz, Prometheus, Grafana, and Nagios offer powerful monitoring capabilities.
  • Cloud-native solutions: AWS CloudWatch, Google Cloud Monitoring, and Azure Monitor provide integrated monitoring for cloud environments.
  • Commercial tools: DataDog, New Relic, and Dynatrace offer comprehensive monitoring suites.

When choosing a monitoring solution, consider factors such as:

  • Integration with your existing infrastructure
  • Scalability to match your system's growth
  • Customization options for your specific needs
  • Cost-effectiveness for your budget and requirements.

Conclusion

In this tutorial, you monitored an HTTP endpoint, collected metrics using the OpenTelemetry collector, and sent them to SigNoz for monitoring and visualization.

Visit our complete guide on OpenTelemetry Collector to learn more about it.

OpenTelemetry is becoming a global standard for open-source observability, offering advantages such as a unified standard for all telemetry signals and avoiding vendor lock-in. With OpenTelemetry, instrumenting your applications to collect logs, metrics, and traces becomes seamless, and you can monitor and visualize your telemetry data with SigNoz.

SigNoz is an open-source OpenTelemetry-native APM that can be used as a single backend for all your observability needs.

FAQs

What's the difference between health checks and other types of monitoring?

Health checks focus on the operational status of system components, while other monitoring types may track performance metrics, log data, or user behavior. Health checks are typically more frequent and lightweight, designed to quickly identify if a service is functioning correctly.

How often should health checks be performed?

The frequency of health checks depends on your system's criticality and resources. High-priority services may require checks every few seconds, while less critical components might be checked every few minutes. Balance the need for timely detection with the overhead of frequent checks.

Can health check monitoring impact system performance?

While health checks are designed to be lightweight, frequent or poorly implemented checks can impact performance. Optimize your health checks by keeping them simple, using appropriate intervals, and ensuring they don't consume excessive resources.

How do I determine the right metrics and thresholds for my health checks?

Start by identifying key indicators of your system's health, such as response time, error rates, or resource utilization. Set initial thresholds based on historical data and expected performance. Refine these over time based on real-world observations and feedback from your team and users.


Further Reading


Was this page helpful?