AWS ELB Monitoring - Key Metrics for Availability, Performance, and Capacity

Updated Jan 28, 2026 · 8 min read

AWS Elastic Load Balancing (ELB) Monitoring Series

Part 1 of 3

AWS Elastic Load Balancing (ELB) distributes traffic across backend targets such as EC2 instances, containers, and IP addresses. Because the load balancer is the entry point for all application traffic, its failures directly impact user experience. Effective monitoring means identifying which metrics signal real issues, separating load balancer failures from backend problems, and understanding CloudWatch's blind spots.

This guide covers the ELB metrics that signal real problems, how to diagnose failures at each layer, and the limitations of default CloudWatch monitoring.

What is AWS Elastic Load Balancing (ELB)?

AWS ELB acts as the single point of contact for clients. It sits between users and backend servers (targets) to decouple the client from the specific servers running your code. There are four types of load balancers designed for different traffic patterns and protocol requirements.

Application Load Balancer (ALB)
It is a Layer 7 load balancer designed specifically for HTTP and HTTPS traffic. It understands application-level information, such as request paths, host headers, and query strings, allowing it to route traffic intelligently to different target groups based on URL paths or domains.
Network Load Balancer (NLB)
It operates at Layer 4 and is optimized for extreme performance, low latency, and handling millions of requests per second. It forwards traffic based solely on IP address and port, without inspecting application data, making it ideal for TCP, UDP, and TLS workloads.
Gateway Load Balancer (GWLB)
It is designed for integrating network and security appliances into traffic flows. It uses the GENEVE protocol to transparently route traffic to virtual appliances such as firewalls, intrusion detection systems, or deep packet inspection tools, while maintaining source and destination IP visibility.
Classic Load Balancer (CLB)
It is the original ELB offering that supports basic Layer 4 and Layer 7 load balancing. It provides simple traffic distribution and health checks but lacks many advanced features such as modern routing rules, native container awareness, and deep observability integrations.

How to Monitor AWS ELB?

AWS ELB is monitored primarily by observing its metrics, logs, and health signals to understand how traffic is flowing and whether the load balancer or its targets are experiencing issues.

By default, AWS Elastic Load Balancing publishes operational metrics such as request counts, latency, error rates, and target health status to Amazon CloudWatch without any additional configuration. These metrics provide continuous visibility into how the load balancer is performing and how backends are responding under real traffic. In addition to these metrics, Elastic Load Balancers can emit access logs that capture request-level details for deeper analysis.
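
As a rough illustration of reading these metrics programmatically, the sketch below builds a request for CloudWatch's GetMetricData API for one Application Load Balancer. The load balancer identifier is a hypothetical placeholder, the choice of metrics and statistics is an assumption for illustration, and the helper names (`build_metric_queries`, `build_request`) are invented here. The block only constructs the request, so it runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical identifier: the part of the load balancer ARN after "loadbalancer/".
LOAD_BALANCER = "app/my-load-balancer/50dc6c495c0c9188"

# (metric name, statistic) pairs chosen for illustration.
METRICS = [
    ("RequestCount", "Sum"),
    ("HTTPCode_ELB_5XX_Count", "Sum"),
    ("TargetResponseTime", "Average"),
]


def build_metric_queries(load_balancer, metrics, period_seconds=60):
    """Build the MetricDataQueries list for a GetMetricData call."""
    queries = []
    for index, (name, stat) in enumerate(metrics):
        queries.append({
            "Id": f"m{index}",  # query ids must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": name,
                    "Dimensions": [
                        {"Name": "LoadBalancer", "Value": load_balancer},
                    ],
                },
                "Period": period_seconds,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


def build_request(load_balancer, metrics, minutes=15, now=None):
    """Assemble the keyword arguments for the last `minutes` of data."""
    end = now or datetime.now(timezone.utc)
    return {
        "MetricDataQueries": build_metric_queries(load_balancer, metrics),
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
    }


request = build_request(LOAD_BALANCER, METRICS)
print(len(request["MetricDataQueries"]), "queries built")

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   response = boto3.client("cloudwatch").get_metric_data(**request)
```

Keeping request construction separate from the network call makes the query definitions easy to review and reuse for other load balancers.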

Key Metrics for AWS ELB Monitoring

Instead of enabling every metric, focus on the signals that indicate user impact. Signals can be categorized by Availability, Performance, and Capacity.

1. Availability Metrics (Is it working?)

These are "page-the-on-call-engineer" metrics.

Metric Name	Load Balancer Type	Criticality	What it tells you
`UnHealthyHostCount`	ALB / NLB / CLB	High	Number of registered targets failing health checks. If this equals total target count, the application is effectively unavailable.
`HTTPCode_ELB_5XX_Count`	ALB	High	5xx errors generated by the load balancer. For ALB, common causes include 502 (RST/unexpected/malformed target response or SSL handshake issues), 503 (no registered targets / targets in `unused`), and 504 (connection timeout to target or idle timeout waiting for target).
`HTTPCode_Target_5XX_Count`	ALB	High	5xx responses returned by backend targets. The request reached the application, but it failed while processing.
`ActiveFlowCount`	NLB	Medium	Number of concurrent flows (connections) from clients to targets. A sudden drop to zero usually indicates upstream network, DNS, or connectivity issues rather than application errors.
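
The availability signals in the table can be combined into a simple paging decision. The following is a minimal sketch, not a recommended alerting policy: the function name, the message strings, and the `max_5xx` threshold are assumptions for illustration, and the inputs are assumed to be per-period values of `HealthyHostCount`, `UnHealthyHostCount`, `HTTPCode_ELB_5XX_Count`, and `HTTPCode_Target_5XX_Count`.

```python
def availability_alerts(healthy_hosts, unhealthy_hosts, elb_5xx, target_5xx,
                        max_5xx=50):
    """Return alert messages for one evaluation period.

    max_5xx is a hypothetical threshold; tune it to your own traffic volume.
    """
    alerts = []
    total = healthy_hosts + unhealthy_hosts
    if total > 0 and unhealthy_hosts == total:
        # Every registered target fails health checks: effectively an outage.
        alerts.append("PAGE: all registered targets are unhealthy")
    elif unhealthy_hosts > 0:
        alerts.append(f"WARN: {unhealthy_hosts}/{total} targets unhealthy")
    if elb_5xx > max_5xx:
        alerts.append("PAGE: load balancer is generating 5xx responses")
    if target_5xx > max_5xx:
        alerts.append("PAGE: targets are returning 5xx responses")
    return alerts


print(availability_alerts(healthy_hosts=0, unhealthy_hosts=3,
                          elb_5xx=120, target_5xx=0))
```

Reporting load balancer 5xx and target 5xx as separate alerts preserves the distinction the table draws between infrastructure failures and application failures.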

2. Performance Metrics (Is it fast?)

These metrics measure latency and provide the traffic context needed to interpret it. High latency degrades user experience even when every request eventually succeeds.

Metric Name	Load Balancer Type	Criticality	What it tells you
`TargetResponseTime`	ALB	High	The time elapsed from when the load balancer sends the request to a target until it receives the response headers. This is your application latency.
`RequestCount`	ALB / CLB	Medium	Total number of requests processed. Sudden spikes are useful for correlating latency or error-rate increases with traffic surges.
`ClientTLSNegotiationErrorCount`	ALB / NLB (TLS listeners)	Low	Number of failed TLS handshakes initiated by clients. Spikes usually indicate certificate issues, protocol mismatches, or unsupported cipher suites.
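
One way to use `RequestCount` alongside `TargetResponseTime` is to check whether a slow minute coincides with a traffic surge. The sketch below is only an illustration under stated assumptions: the sample data is made up, the function name is hypothetical, and both the 0.5-second latency threshold and the "twice the median" surge rule are arbitrary choices rather than AWS guidance.

```python
from statistics import median


def classify_slow_minutes(request_counts, response_times,
                          latency_threshold=0.5, surge_factor=2.0):
    """Label each slow minute by whether traffic was surging at the time.

    request_counts: per-minute RequestCount sums.
    response_times: per-minute TargetResponseTime averages, in seconds.
    """
    baseline = median(request_counts)
    findings = []
    for minute, (count, latency) in enumerate(zip(request_counts, response_times)):
        if latency <= latency_threshold:
            continue  # this minute was fast enough; nothing to explain
        surging = baseline > 0 and count >= surge_factor * baseline
        findings.append((minute, "traffic surge" if surging else "no surge"))
    return findings


# Hypothetical five-minute sample.
counts = [100, 110, 95, 400, 105]
latencies = [0.12, 0.11, 0.13, 0.90, 0.80]
print(classify_slow_minutes(counts, latencies))
# → [(3, 'traffic surge'), (4, 'no surge')]
```

A slow minute without a surge points toward the backend or its dependencies rather than load, which is the distinction the correlation is meant to surface.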

3. Capacity & Saturation (Is it full?)

AWS ELBs scale automatically, but scaling is not instantaneous. Monitoring these capacity signals helps detect saturation attacks or misconfigurations before they cause outages.

Metric Name	Load Balancer Type	Criticality	What it tells you
`SurgeQueueLength`	CLB	Critical	The number of requests/connections pending routing to a healthy instance (max 1024). Non-zero often indicates backend slowness/insufficient capacity; if it hits the max, requests are rejected and `SpilloverCount` increases.
`ActiveConnectionCount`	ALB	High	The total number of concurrent TCP connections. A sudden spike can indicate a DDoS attack or a retry storm.
`ConsumedLCUs`	ALB / NLB	Warning	Load Balancer Capacity Units used. Monitor this for cost control. A sudden jump can indicate a DDoS attack or an inefficient traffic pattern (e.g., excessive new connections).
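
The `SurgeQueueLength` behaviour described in the table can be expressed as a small state check for a Classic Load Balancer. This is a minimal sketch: the function name and state labels are hypothetical, and the 1024 limit is taken from the table above.

```python
SURGE_QUEUE_MAX = 1024  # maximum queue length, per the table above


def surge_queue_state(surge_queue_length, spillover_count):
    """Classify Classic Load Balancer saturation for one period."""
    if spillover_count > 0 or surge_queue_length >= SURGE_QUEUE_MAX:
        # The queue overflowed, so some requests were rejected.
        return "rejecting"
    if surge_queue_length > 0:
        # Requests are waiting for a healthy instance: backends are slow
        # or there are too few of them.
        return "queuing"
    return "ok"


for length, spillover in [(0, 0), (37, 0), (1024, 12)]:
    print(length, spillover, surge_queue_state(length, spillover))
```

Treating any non-zero `SpilloverCount` as "rejecting" reflects that spillover means requests were already dropped, regardless of the current queue length.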

CloudWatch Limitations for ELB Monitoring

Monitoring AWS ELB using AWS CloudWatch presents several practical challenges, particularly at scale. Limitations around cost, metric granularity, query flexibility, and cross-signal correlation often make it difficult to achieve deep, real-time visibility into load balancer behaviour and its impact on downstream services. The following are a few common limitations users face.

Metrics Availability Tied to Activity

CloudWatch only publishes ELB metrics when traffic is flowing through the load balancer. Periods without requests result in unreported metrics, potentially misleading users into assuming stability when data is simply absent.
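
To keep absent data from being mistaken for healthy data, gaps can be made explicit before charting or alerting. The sketch below assumes you already have the timestamps of the datapoints CloudWatch returned for a window; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def find_missing_periods(timestamps, start, end, period=timedelta(seconds=60)):
    """Return every period start in [start, end) that has no datapoint."""
    seen = set(timestamps)
    missing = []
    current = start
    while current < end:
        if current not in seen:
            missing.append(current)
        current += period
    return missing


# Hypothetical sample: datapoints exist for minutes 0, 1, and 4 only.
window_start = datetime(2026, 1, 28, 12, 0, tzinfo=timezone.utc)
window_end = window_start + timedelta(minutes=5)
reported = [window_start + timedelta(minutes=m) for m in (0, 1, 4)]

gaps = find_missing_periods(reported, window_start, window_end)
for gap in gaps:
    print("no datapoint for", gap.isoformat())
```

Whether a gap means "no traffic" or "something is wrong" depends on the workload. CloudWatch alarms expose a related choice through their `TreatMissingData` setting, which controls how missing datapoints are evaluated.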

Low Granularity and Data Delays

Standard metrics report at 60-second intervals, with potential delays in metric availability, hindering real-time incident response. This stems from CloudWatch's aggregation and buffering mechanisms.

Limited Query and Correlation Capabilities

CloudWatch Logs Insights uses a proprietary query language that lacks robust correlation across multiple log streams or services, making deep ELB event analysis more challenging than with specialised log analytics platforms.

Inflexible UI and Dashboards

Default dashboards are restrictive for ELB monitoring, with limited visualization, filtering, and customization for traffic patterns or error spikes. The console's design prioritizes simplicity over advanced interactivity.

High Costs at Scale

Detailed or custom metrics, high-resolution data, and large-scale storage quickly drive up costs, which can make the value hard to justify relative to alternatives. Separate charges for ingestion, storage, and retrieval amplify this.

Troubleshooting Common AWS ELB Alerts

Use this troubleshooting reference to separate infrastructure failures from application bugs during incident response.

Alert / Signal	What it usually means	Common causes
High `HTTPCode_Target_5XX_Count`	Requests are reaching targets and the application (or upstream dependency) is returning 5xx	App bug/crash, dependency outage (DB/cache), resource exhaustion (CPU/mem), slow code path causing timeouts
High `HTTPCode_ELB_5XX_Count`	Error is from load balancer / connectivity / target registration, not app logic (LB can’t route/complete request)	No healthy targets, target registration issues, network path blocked, target not accepting connections, timeouts between LB↔target
`HTTPCode_ELB_5XX_Count` + 503 spike	Often no healthy targets or LB can’t route to targets	Bad deploy causing failed health checks, health check path broken, targets deregistered/autoscaling issue
`HTTPCode_ELB_5XX_Count` + 504 spike	LB timed out waiting for target response	Downstream dependency slow, thread pool saturation, large responses, backend overload
`HTTPCode_ELB_5XX_Count` + 502 spike	LB could not complete the connection to the target, or received a bad response from it	SG/NACL blocks, port not listening, backend process crash, protocol mismatch
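
The table above can be condensed into a small triage helper for use during incidents. This is a sketch under assumptions: the function name and hint wording are hypothetical, and the per-status counts are assumed to come from ALB metrics such as `HTTPCode_ELB_502_Count`, `HTTPCode_ELB_503_Count`, and `HTTPCode_ELB_504_Count`.

```python
# Hints paraphrased from the troubleshooting table above.
ELB_STATUS_HINTS = {
    502: "bad or failed connection/response from target; check SG/NACL, "
         "listening port, process crashes, protocol mismatch",
    503: "no healthy targets; check recent deploys, health check path, "
         "target registration and autoscaling",
    504: "timed out waiting for target; check slow dependencies, "
         "thread pool saturation, backend overload",
}


def triage(elb_5xx, target_5xx, elb_status_counts=None):
    """Return ordered hints separating application from infrastructure errors."""
    hints = []
    if target_5xx > 0:
        hints.append("application: requests reached targets and failed there")
    if elb_5xx > 0:
        hints.append("load balancer: error from LB, connectivity, "
                     "or target registration")
        for status, count in sorted((elb_status_counts or {}).items()):
            if count > 0 and status in ELB_STATUS_HINTS:
                hints.append(f"{status}: {ELB_STATUS_HINTS[status]}")
    return hints


for hint in triage(elb_5xx=40, target_5xx=0, elb_status_counts={503: 40}):
    print(hint)
```

Checking target errors and load balancer errors independently means a mixed incident produces both hints instead of hiding one behind the other.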

Next Steps

At this point, you know what AWS ELB is, which metrics actually reflect user impact, and why default monitoring alone is not sufficient when you need deeper visibility. The next part of this series shifts from concepts to execution, where we will set up metrics monitoring for an Application Load Balancer and see how to monitor the same critical ELB signals using SigNoz in a more actionable way.

Next in "AWS Elastic Load Balancing (ELB) Monitoring Series" (Part 2 of 3)

How to Monitor AWS ELB Metrics with SigNoz - A Step-by-Step Guide
