Load Balancing: L4 vs L7
Layer 4 vs Layer 7 load balancing, algorithms (round-robin, least connections, consistent hashing), health checks, and sticky sessions.
Why Load Balancing?
A single server has a finite capacity. As traffic grows, you need to distribute requests across multiple backend instances. A load balancer sits in front of your server fleet and routes incoming requests, ensuring no single server is overwhelmed, providing redundancy when servers fail, and enabling horizontal scaling.
L4 vs L7 Load Balancing
Load balancers operate at different layers of the OSI model. The two most relevant are Layer 4 (Transport) and Layer 7 (Application).
| Aspect | L4 Load Balancer | L7 Load Balancer |
|---|---|---|
| OSI Layer | Transport (TCP/UDP) | Application (HTTP/HTTPS/gRPC) |
| What it sees | IP addresses, ports, TCP connection state | Full HTTP headers, URL paths, cookies, request body |
| Routing basis | IP + port only | URL path, hostname, header values, HTTP method |
| TLS termination | Usually no (TLS passthrough) | Yes — terminates TLS and re-encrypts or sends plaintext to backend |
| Performance | Very fast, lower CPU overhead | More CPU (parses HTTP), but smarter routing |
| Sticky sessions | IP-hash based | Cookie-based (more reliable) |
| Protocol awareness | None — treats all TCP traffic equally | Can route `/api` to API servers and `/static` to file servers |
| Examples | AWS NLB, HAProxy TCP mode, IPVS | AWS ALB, NGINX, Envoy, Traefik |
When to choose L4 vs L7
Use L4 when you need raw throughput (millions of short TCP connections, or non-HTTP protocols such as SMTP and game-server traffic). Use L7 when you need content-aware routing, TLS termination, rate limiting, or observability into HTTP traffic. Most modern web applications use L7 load balancers.
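The content-aware routing that distinguishes L7 can be sketched in a few lines. This is a minimal, hypothetical model: the pool names, IP addresses, and path prefixes are illustrative, not any real load balancer's configuration.

```python
# Minimal sketch of L7 content-aware routing: inspect the URL path
# and pick a backend pool. All names and addresses are made up.

BACKEND_POOLS = {
    "api": ["10.0.1.10:8080", "10.0.1.11:8080"],
    "static": ["10.0.2.10:8080"],
    "default": ["10.0.3.10:8080"],
}

def route(path: str) -> list[str]:
    """Return the backend pool for a request path (prefix match)."""
    if path.startswith("/api"):
        return BACKEND_POOLS["api"]
    if path.startswith("/static"):
        return BACKEND_POOLS["static"]
    return BACKEND_POOLS["default"]
```

An L4 balancer cannot make this decision at all: by the time the URL path is visible, you are already parsing HTTP, which is exactly the extra CPU cost the table above notes.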
Load Balancing Algorithms
The balancing algorithm determines which backend server receives each incoming request. Different algorithms suit different workloads.
| Algorithm | How It Works | Best For | Weakness |
|---|---|---|---|
| Round-robin | Rotate through servers in order | Homogeneous servers, uniform request cost | Ignores server load or request heaviness |
| Weighted round-robin | Round-robin but servers get more/fewer requests proportional to their weight | Mixed server capacities (some servers are larger) | Still ignores real-time load |
| Least connections | Route to the server with the fewest active connections | Long-lived connections (WebSockets, file uploads) | Connection count != actual load |
| Least response time | Route to the server with lowest avg response time | Latency-sensitive APIs | Requires active latency tracking |
| IP hash | Hash the client IP to pick a consistent server | Stateful sessions without session sharing | Load can be uneven; poor for large NAT pools |
| Consistent hashing | Hash a key (IP, user ID, URL) onto a hash ring; adding or removing a server remaps only ~1/N of keys | Cache-friendly routing, stateful services | More complex; needs virtual nodes for even load |
| Random | Pick a backend at random | Simple, works surprisingly well at scale | No load awareness |
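Consistent hashing is the least obvious algorithm in the table, so here is a compact sketch of a hash ring with virtual nodes. This is an illustrative toy (MD5 as the hash, 100 replicas per node are arbitrary choices), not a production implementation.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes for smoother balance."""

    def __init__(self, nodes: list[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self._ring: list[tuple[int, str]] = []  # sorted (hash point, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each node owns `replicas` points on the ring, which evens out load.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        """Map a key (client IP, user ID, URL) to the next node clockwise."""
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, ""))  # first point >= h
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]
```

The property that matters: removing one server only remaps the keys that were on it; every other key keeps its assignment, so caches on the surviving servers stay warm.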
Health Checks
Load balancers continuously probe backend servers to detect failures. Unhealthy servers are removed from the rotation until they recover.
- Passive health checks: The load balancer watches for errors on real traffic. If a server returns 5xx errors or TCP resets beyond a threshold, it's marked unhealthy. Zero overhead, but requires real traffic to fail first.
- Active health checks: The load balancer periodically sends synthetic requests (HTTP GET `/health`, TCP probe, etc.) and checks for expected responses. Catches failures before real traffic is affected.
- Health check parameters: Interval (how often to probe), timeout (how long to wait), healthy threshold (successful probes to become healthy), unhealthy threshold (failed probes to be removed).
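The threshold logic those parameters drive can be sketched as a small state machine. This is a simplified, hypothetical model (one probe result at a time, no jitter or timeouts), not any load balancer's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class HealthState:
    healthy: bool = True
    streak: int = 0  # consecutive probes disagreeing with current status

def observe(state: HealthState, probe_ok: bool,
            healthy_threshold: int = 2, unhealthy_threshold: int = 3) -> HealthState:
    """Update server health from one probe result, applying the thresholds."""
    if probe_ok == state.healthy:
        state.streak = 0          # probe agrees with current status; reset
        return state
    state.streak += 1
    needed = healthy_threshold if probe_ok else unhealthy_threshold
    if state.streak >= needed:    # enough consecutive flips: change status
        state.healthy = probe_ok
        state.streak = 0
    return state
```

Requiring consecutive results in both directions prevents a single flaky probe from flapping a server in and out of rotation.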
```yaml
# Example: AWS ALB target group health check config
HealthCheckProtocol: HTTP
HealthCheckPath: /health
HealthCheckIntervalSeconds: 15
HealthCheckTimeoutSeconds: 5
HealthyThresholdCount: 2     # 2 consecutive successes → healthy
UnhealthyThresholdCount: 3   # 3 consecutive failures → unhealthy
Matcher:
  HttpCode: "200"
```
Sticky Sessions (Session Affinity)
Some applications store session state in server memory rather than a shared store. Sticky sessions ensure a given client always routes to the same backend server.
- IP-hash stickiness: Routes all traffic from the same client IP to the same server. Breaks down with large NAT pools (many users sharing one IP) or mobile clients (changing IPs).
- Cookie-based stickiness: The load balancer sets a cookie (`AWSALB`, `JSESSIONID`) on the first response, then uses it to route subsequent requests to the same server. More reliable than IP-hash.
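Cookie-based stickiness reduces to a simple rule: honor the cookie if present, otherwise pick a server and set it. A minimal sketch, assuming an illustrative cookie name (`lb-affinity` is made up, not a real load balancer's cookie):

```python
import random

SERVERS = ["srv-a", "srv-b", "srv-c"]
STICKY_COOKIE = "lb-affinity"  # hypothetical cookie name

def pick_server(request_cookies: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Return (chosen server, cookies to set on the response)."""
    pinned = request_cookies.get(STICKY_COOKIE)
    if pinned in SERVERS:               # honor existing affinity
        return pinned, {}
    chosen = random.choice(SERVERS)     # first request: pick and pin
    return chosen, {STICKY_COOKIE: chosen}
```

Note the failure mode this creates: if `srv-a` restarts, every client pinned to it loses its in-memory session, which is exactly the argument against stickiness below.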
Avoid sticky sessions in new designs
Sticky sessions reduce your ability to redistribute load (a 'hot' server can't shed requests) and cause failures when a server restarts. The better architecture is to externalize session state to a shared store (Redis) so any server can handle any request. This makes your fleet truly stateless.
Connection Draining (Deregistration Delay)
When a backend server is removed (during a deployment or scale-in event), the load balancer enters connection draining mode for that server: it stops sending new requests to it but allows existing in-flight requests to complete within a grace period (typically 30–300 seconds). After the grace period, remaining connections are forcibly closed.
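The draining sequence can be sketched as: flip the backend to draining, stop admitting new requests, then wait out in-flight requests until a deadline. This is a schematic model (the `poll` hook stands in for yielding to a real event loop), not production code.

```python
import time

class Backend:
    def __init__(self, name: str) -> None:
        self.name = name
        self.draining = False
        self.in_flight = 0

    def accepts_new(self) -> bool:
        return not self.draining  # draining backends get no new requests

def drain(backend: Backend, grace_seconds: float,
          now=time.monotonic, poll=lambda: None) -> bool:
    """Stop new traffic, then wait up to grace_seconds for in-flight
    requests to finish. Returns True if the backend drained cleanly."""
    backend.draining = True
    deadline = now() + grace_seconds
    while backend.in_flight > 0 and now() < deadline:
        poll()  # in a real LB this yields to the event loop
    return backend.in_flight == 0  # False => remaining connections are closed
```

The grace period is the same trade-off the 30–300 second range above reflects: long enough for slow uploads to finish, short enough that a deploy doesn't stall.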
Interview Tip
A classic interview question: 'How do you deploy a new version of your service without dropping requests?' The answer involves connection draining combined with a rolling deployment strategy. The load balancer removes old servers gracefully (draining connections), then adds new servers. Bonus points for mentioning blue/green deployments where you flip traffic in the load balancer atomically after the new fleet is fully healthy.