AWS ELB Monitoring - Key Metrics for Availability, Performance, and Capacity

Updated Jan 28, 2026 | 8 min read

AWS Elastic Load Balancing (ELB) distributes traffic across backend targets, like EC2 instances, containers, and IP addresses. As the entry point for all application traffic, load balancer failures directly impact user experience. Effective monitoring addresses this by identifying which metrics signal real issues, separating load balancer failures from backend problems, and understanding CloudWatch's blind spots.

This guide covers the ELB metrics that signal real problems, how to diagnose failures at each layer, and the limitations of default CloudWatch monitoring.

What is AWS Elastic Load Balancing (ELB)?

AWS ELB acts as the single point of contact for clients. It sits between users and backend servers (targets) to decouple the client from the specific servers running your code. There are four types of load balancers designed for different traffic patterns and protocol requirements.

  • Application Load Balancer (ALB)

    It is a Layer 7 load balancer designed specifically for HTTP and HTTPS traffic. It understands application-level information, such as request paths, host headers, and query strings, allowing it to route traffic intelligently to different target groups based on URL paths or domains.

  • Network Load Balancer (NLB)

    It operates at Layer 4 and is optimized for extreme performance, low latency, and handling millions of requests per second. It forwards traffic based solely on IP address and port, without inspecting application data, making it ideal for TCP, UDP, and TLS workloads.

  • Gateway Load Balancer (GWLB)

    It is designed for integrating network and security appliances into traffic flows. It uses the GENEVE protocol to transparently route traffic to virtual appliances such as firewalls, intrusion detection systems, or deep packet inspection tools, while maintaining source and destination IP visibility.

  • Classic Load Balancer (CLB)

    It is the original ELB offering that supports basic Layer 4 and Layer 7 load balancing. It provides simple traffic distribution and health checks but lacks many advanced features such as modern routing rules, native container awareness, and deep observability integrations.

How to Monitor AWS ELB?

AWS ELB is monitored primarily by observing its metrics, logs, and health signals to understand how traffic is flowing and whether the load balancer or its targets are experiencing issues.

By default, AWS Elastic Load Balancing publishes operational metrics such as request counts, latency, error rates, and target health status to Amazon CloudWatch without any additional configuration. These metrics provide continuous visibility into how the load balancer is performing and how backends are responding under real traffic. In addition to these metrics, Elastic Load Balancers can emit access logs that capture request-level details for deeper analysis.
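
As a minimal sketch of reading one of these default metrics, the snippet below builds the parameters for CloudWatch's `GetMetricStatistics` call and, optionally, sends them with boto3. The load balancer value `app/my-alb/50dc6c495c0c9188` and the helper names are hypothetical; the actual fetch assumes boto3 is installed and AWS credentials are configured.

```python
from datetime import datetime, timedelta, timezone

# CloudWatch namespace used by each load balancer type.
ELB_NAMESPACES = {
    "alb": "AWS/ApplicationELB",
    "nlb": "AWS/NetworkELB",
    "gwlb": "AWS/GatewayELB",
    "clb": "AWS/ELB",
}


def request_count_query(load_balancer, end=None, minutes=15):
    """Build GetMetricStatistics parameters for an ALB's RequestCount."""
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": ELB_NAMESPACES["alb"],
        "MetricName": "RequestCount",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,            # standard one-minute resolution
        "Statistics": ["Sum"],   # total requests per period
    }


def fetch_request_counts(load_balancer):
    """Query CloudWatch. Requires boto3 and AWS credentials (assumption)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(**request_count_query(load_balancer))
    return sorted(response["Datapoints"], key=lambda p: p["Timestamp"])


# Hypothetical load balancer dimension value, for illustration only.
params = request_count_query("app/my-alb/50dc6c495c0c9188")
print(params["Namespace"], params["MetricName"], params["Period"])
```

Separating the parameter builder from the network call keeps the query easy to inspect and reuse across load balancers.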

Key Metrics for AWS ELB Monitoring

Instead of enabling every metric, focus on the signals that indicate user impact. Signals can be categorized by Availability, Performance, and Capacity.

1. Availability Metrics (Is it working?)

These are "page-the-on-call-engineer" metrics.

| Metric Name | Load Balancer Type | Criticality | What it tells you |
| --- | --- | --- | --- |
| UnHealthyHostCount | ALB / NLB / CLB | High | Number of registered targets failing health checks. If this equals the total target count, the application is effectively unavailable. |
| HTTPCode_ELB_5XX_Count | ALB | High | 5xx errors generated by the load balancer itself. Common causes include 502 (connection reset, unexpected or malformed target response, or SSL handshake issues), 503 (no registered targets, or targets in the unused state), and 504 (connection timeout to the target, or idle timeout while waiting for the target). |
| HTTPCode_Target_5XX_Count | ALB | High | 5xx responses returned by backend targets. The request reached the application, but it failed while processing. |
| ActiveFlowCount | NLB | Medium | Number of concurrent flows (connections) from clients to targets. A sudden drop to zero usually indicates upstream network, DNS, or connectivity issues rather than application errors. |
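
One way to turn UnHealthyHostCount into a page is a CloudWatch alarm. The sketch below builds `PutMetricAlarm` parameters for an ALB target group and adds a small helper for the "every target is unhealthy" case. The alarm name, dimension values, threshold, and evaluation settings are illustrative assumptions, not recommendations.

```python
def unhealthy_host_alarm(alarm_name, load_balancer, target_group, threshold=1):
    """Build PutMetricAlarm parameters for UnHealthyHostCount on an ALB."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "UnHealthyHostCount",
        "Dimensions": [
            {"Name": "TargetGroup", "Value": target_group},
            {"Name": "LoadBalancer", "Value": load_balancer},
        ],
        "Statistic": "Maximum",          # worst case within each period
        "Period": 60,
        "EvaluationPeriods": 3,          # assumption: 3 consecutive minutes
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "missing",   # CloudWatch's default behavior
    }


def is_total_outage(healthy_count, unhealthy_count):
    """True when targets are registered but none of them is healthy."""
    return healthy_count == 0 and unhealthy_count > 0


# Hypothetical names and dimension values, for illustration only.
alarm = unhealthy_host_alarm(
    "alb-unhealthy-hosts",
    "app/my-alb/50dc6c495c0c9188",
    "targetgroup/my-targets/73e2d6bc24d8a067",
)
# To create the alarm (requires boto3 and credentials):
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
print(alarm["MetricName"], is_total_outage(healthy_count=0, unhealthy_count=4))
```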

2. Performance Metrics (Is it fast?)

These metrics track latency and the traffic context needed to interpret it. Rising latency degrades the user experience even when no errors are returned.

| Metric Name | Load Balancer Type | Criticality | What it tells you |
| --- | --- | --- | --- |
| TargetResponseTime | ALB | High | The time elapsed from when the load balancer sends the request to a target until it receives the response headers. This is your application latency. |
| RequestCount | ALB / CLB | Medium | Total number of requests processed. Sudden spikes are useful for correlating latency or error-rate increases with traffic surges. |
| ClientTLSNegotiationErrorCount | ALB / NLB (TLS listeners) | Low | Number of failed TLS handshakes initiated by clients. Spikes usually indicate certificate issues, protocol mismatches, or unsupported cipher suites. |
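
To correlate latency with traffic, both series can be requested in a single `GetMetricData` call. The sketch below builds the two query entries (p99 of TargetResponseTime and the sum of RequestCount) and pairs slow minutes with their request volume. The helper names, the one-second threshold, and the sample values are assumptions for illustration.

```python
def alb_metric_query(query_id, metric_name, stat, load_balancer, period=60):
    """Build one MetricDataQueries entry for an ALB metric."""
    return {
        "Id": query_id,  # must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/ApplicationELB",
                "MetricName": metric_name,
                "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
            },
            "Period": period,
            "Stat": stat,
        },
        "ReturnData": True,
    }


def slow_minutes(latency_p99, request_sum, threshold_seconds=1.0):
    """Return (index, latency, requests) for each period above the threshold."""
    return [
        (i, latency, requests)
        for i, (latency, requests) in enumerate(zip(latency_p99, request_sum))
        if latency > threshold_seconds
    ]


lb = "app/my-alb/50dc6c495c0c9188"  # hypothetical load balancer
queries = [
    alb_metric_query("latency_p99", "TargetResponseTime", "p99", lb),
    alb_metric_query("requests", "RequestCount", "Sum", lb),
]
# With boto3: client.get_metric_data(MetricDataQueries=queries,
#                                    StartTime=..., EndTime=...)

# Made-up sample values: latency in seconds, requests per minute.
print(slow_minutes([0.2, 0.3, 1.8, 2.1], [900, 950, 4200, 4400]))
```

If the slow minutes line up with a jump in request volume, a traffic surge is the likelier explanation; if volume is flat, the backend itself is the first place to look.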

3. Capacity & Saturation (Is it full?)

AWS ELBs scale automatically, but scaling is not instantaneous. Monitoring these capacity signals helps detect saturation attacks or misconfigurations before they cause outages.

| Metric | Load Balancer Type | Criticality | Description |
| --- | --- | --- | --- |
| SurgeQueueLength | CLB | Critical | The number of requests or connections pending routing to a healthy instance (max 1024). A non-zero value often indicates backend slowness or insufficient capacity; if it hits the max, requests are rejected and SpilloverCount increases. |
| ActiveConnectionCount | ALB | High | The total number of concurrent TCP connections. A sudden spike can indicate a DDoS attack or a retry storm. |
| ConsumedLCUs | ALB / NLB | Warning | Load Balancer Capacity Units used. Monitor this for cost control. A sudden jump may indicate a DDoS attack or an inefficient traffic pattern (e.g., excessive new connections). |
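
A "sudden spike" or "sudden jump" needs a concrete definition before it can be alerted on. The sketch below uses one simple option, comparing the latest value with a trailing average, and adds a status helper for the Classic Load Balancer surge queue. The baseline length and the 3x factor are arbitrary assumptions that would need tuning against real traffic.

```python
from statistics import mean

SURGE_QUEUE_MAX = 1024  # maximum SurgeQueueLength on a Classic Load Balancer


def is_spike(series, baseline_points=10, factor=3.0):
    """True when the latest value exceeds `factor` times the trailing average."""
    if len(series) < baseline_points + 1:
        return False  # not enough history to judge
    baseline = mean(series[-(baseline_points + 1):-1])
    latest = series[-1]
    if baseline == 0:
        return latest > 0
    return latest > factor * baseline


def surge_queue_status(length):
    """Classify a SurgeQueueLength reading."""
    if length <= 0:
        return "ok"
    if length < SURGE_QUEUE_MAX:
        return "queuing"  # backend is slow or short on capacity
    return "full"         # requests are rejected; expect SpilloverCount to rise


# Made-up ActiveConnectionCount samples, one per minute.
connections = [100] * 10 + [450]
print(is_spike(connections), surge_queue_status(300))
```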

CloudWatch Limitations for ELB Monitoring

Monitoring AWS ELB using AWS CloudWatch presents several practical challenges, particularly at scale. Limitations around cost, metric granularity, query flexibility, and cross-signal correlation often make it difficult to achieve deep, real-time visibility into load balancer behavior and its impact on downstream services. The following are a few common limitations users face.

Metrics Availability Tied to Activity

CloudWatch only publishes ELB metrics when traffic is flowing through the load balancer. Periods without requests result in unreported metrics, potentially misleading users into assuming stability when data is simply absent.
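
One way to avoid misreading absent data is to make the gaps explicit before charting or alerting. The sketch below expands a list of datapoints, shaped like the `Datapoints` returned by `GetMetricStatistics` with the `Sum` statistic, into one entry per period and marks unreported periods as `None`. The sample timestamps and values are made up.

```python
from datetime import datetime, timedelta, timezone


def fill_gaps(datapoints, start, end, period_seconds=60):
    """Return one (timestamp, value) pair per period; None means unreported."""
    by_time = {p["Timestamp"]: p["Sum"] for p in datapoints}
    step = timedelta(seconds=period_seconds)
    series = []
    current = start
    while current < end:
        series.append((current, by_time.get(current)))
        current += step
    return series


start = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
datapoints = [  # made-up values; the 12:01 period was never reported
    {"Timestamp": start, "Sum": 120.0},
    {"Timestamp": start + timedelta(minutes=2), "Sum": 95.0},
]
series = fill_gaps(datapoints, start, start + timedelta(minutes=3))
missing = [ts for ts, value in series if value is None]
print(len(series), len(missing))
```

Keeping `None` distinct from `0` lets a dashboard or alert rule decide deliberately how to treat silence, rather than assuming it means healthy.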

Low Granularity and Data Delays

Standard metrics report at 60-second intervals, with potential delays in metric availability, hindering real-time incident response. This stems from CloudWatch's aggregation and buffering mechanisms.
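
Because recent periods may still be incomplete, a poller can query a window that ends slightly in the past and is aligned to period boundaries. The sketch below shows one way to compute such a window. The two-period lag is an assumption for illustration, not a documented CloudWatch delay.

```python
from datetime import datetime, timedelta, timezone


def settled_window(now, period_seconds=60, lag_periods=2, span_periods=5):
    """Return (start, end) aligned to period boundaries and lagging behind now."""
    epoch = int(now.timestamp())
    aligned = epoch - epoch % period_seconds  # floor to the period boundary
    end = datetime.fromtimestamp(aligned, tz=timezone.utc) - timedelta(
        seconds=period_seconds * lag_periods
    )
    start = end - timedelta(seconds=period_seconds * span_periods)
    return start, end


now = datetime(2026, 1, 1, 12, 7, 42, tzinfo=timezone.utc)
start, end = settled_window(now)
print(start.isoformat(), end.isoformat())
```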

Limited Query and Correlation Capabilities

CloudWatch Logs Insights uses a proprietary query language that lacks robust correlation across multiple log streams or services, making deep ELB event analysis more challenging than with specialized log analytics platforms.

Inflexible UI and Dashboards

Default dashboards are restrictive for ELB monitoring, with limited visualization, filtering, and customization for traffic patterns or error spikes. The console's design prioritizes simplicity over advanced interactivity.

High Costs at Scale

Detailed or custom metrics, high-resolution data, and large-scale storage quickly drive up costs, with users perceiving limited value relative to alternatives. This is amplified by overlapping charges for ingestion, storage, and retrieval.

Troubleshooting Common AWS ELB Alerts

Use this troubleshooting reference to separate infrastructure failures from application bugs during incident response.

| Alert / Signal | What it usually means | Common causes |
| --- | --- | --- |
| High HTTPCode_Target_5XX_Count | Requests are reaching targets, and the application (or an upstream dependency) is returning 5xx | App bug/crash, dependency outage (DB/cache), resource exhaustion (CPU/memory), slow code path causing timeouts |
| High HTTPCode_ELB_5XX_Count | The error comes from the load balancer, connectivity, or target registration, not app logic (the LB can't route or complete the request) | No healthy targets, target registration issues, network path blocked, target not accepting connections, timeouts between LB↔target |
| HTTPCode_ELB_5XX_Count + 503 spike | Often no healthy targets, or the LB can't route to targets | Bad deploy causing failed health checks, health check path broken, targets deregistered/autoscaling issue |
| HTTPCode_ELB_5XX_Count + 504 spike | LB timed out waiting for the target response | Downstream dependency slow, thread pool saturation, large responses, backend overload |
| HTTPCode_ELB_5XX_Count + 502 spike | LB received a bad response from the target, or the connection to the target failed | SG/NACL blocks, port not listening, backend process crash, protocol mismatch |
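
The triage logic in this table can be sketched as a small function that attributes a 5xx response to the target or to the load balancer. It takes the status code the load balancer returned and the status code the target returned, if any. Representing "no target response" as `None` is an assumption of this sketch, and the hint strings simply summarize the rows above.

```python
LB_HINTS = {
    502: "load balancer: bad response from target or failed connection",
    503: "load balancer: no healthy or registered targets",
    504: "load balancer: timed out waiting for target",
}


def classify_5xx(elb_status_code, target_status_code=None):
    """Attribute a 5xx response to the target or to the load balancer."""
    if not 500 <= elb_status_code <= 599:
        return "not a 5xx"
    if target_status_code is not None and 500 <= target_status_code <= 599:
        # The request reached the application, which produced the error.
        return "target: application returned the error"
    return LB_HINTS.get(elb_status_code, "load balancer: other 5xx")


print(classify_5xx(500, 500))
print(classify_5xx(503))
print(classify_5xx(504))
```

Target-side results point to application logs and dependencies; load balancer results point to health checks, security groups, and timeouts first.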

Next Steps

At this point, you know what AWS ELB is, which metrics actually reflect user impact, and why default monitoring alone is not sufficient when you need deeper visibility. The next part of this series shifts from concepts to execution, where we will set up metrics monitoring for an Application Load Balancer and see how to monitor the same critical ELB signals using SigNoz in a more actionable way.
