
Load Balancing: L4 vs L7

Layer 4 vs Layer 7 load balancing, algorithms (round-robin, least connections, consistent hashing), health checks, and sticky sessions.

15 min read · High interview weight

Why Load Balancing?

A single server has a finite capacity. As traffic grows, you need to distribute requests across multiple backend instances. A load balancer sits in front of your server fleet and routes incoming requests, ensuring no single server is overwhelmed, providing redundancy when servers fail, and enabling horizontal scaling.

[Diagram: Load balancer distributes requests across a backend server pool]

L4 vs L7 Load Balancing

Load balancers operate at different layers of the OSI model. The two most relevant are Layer 4 (Transport) and Layer 7 (Application).

| Aspect | L4 Load Balancer | L7 Load Balancer |
|---|---|---|
| OSI layer | Transport (TCP/UDP) | Application (HTTP/HTTPS/gRPC) |
| What it sees | IP addresses, ports, TCP connection state | Full HTTP headers, URL paths, cookies, request body |
| Routing basis | IP + port only | URL path, hostname, header values, HTTP method |
| TLS termination | Usually no (TLS passthrough) | Yes; terminates TLS and re-encrypts or sends plaintext to the backend |
| Performance | Very fast, lower CPU overhead | More CPU (parses HTTP), but smarter routing |
| Sticky sessions | IP-hash based | Cookie-based (more reliable) |
| Protocol awareness | None; treats all TCP traffic equally | Can route `/api` to API servers and `/static` to file servers |
| Examples | AWS NLB, HAProxy TCP mode, IPVS | AWS ALB, NGINX, Envoy, Traefik |
💡 When to choose L4 vs L7

Use L4 when you need raw throughput (millions of short TCP connections, non-HTTP protocols like SMTP or game servers). Use L7 when you need content-aware routing, TLS termination, rate limiting, or observability into HTTP traffic. Most modern web applications use L7 load balancers.
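The difference in what each layer can "see" can be sketched in a few lines of Python. This is purely illustrative: the function names (`l4_pick`, `l7_route`) and the backend pools are invented for this example, not any real load balancer's API.

```python
# An L4 balancer only sees the connection tuple (IP, port), so the best
# it can do is hash it to pick a backend deterministically.
def l4_pick(client_ip: str, client_port: int, backends: list) -> str:
    return backends[hash((client_ip, client_port)) % len(backends)]

# An L7 balancer parses the HTTP request, so it can route by URL path.
def l7_route(path: str, pools: dict) -> str:
    for prefix, pool in pools.items():
        if path.startswith(prefix):
            return pool[0]  # a balancing algorithm would then pick within the pool
    return pools["/"][0]

pools = {"/api": ["api-1", "api-2"], "/static": ["files-1"], "/": ["web-1"]}
print(l7_route("/api/users", pools))       # lands in the API pool
print(l7_route("/static/logo.png", pools)) # lands in the static-file pool
```

Note that `l4_pick` cannot distinguish an API call from a static-asset fetch: both are just TCP bytes at that layer.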

Load Balancing Algorithms

The balancing algorithm determines which backend server receives each request. Different algorithms suit different workloads.

| Algorithm | How it works | Best for | Weakness |
|---|---|---|---|
| Round-robin | Rotate through servers in order | Homogeneous servers, uniform request cost | Ignores server load and request heaviness |
| Weighted round-robin | Round-robin, but each server receives requests in proportion to its weight | Mixed server capacities (some servers are larger) | Still ignores real-time load |
| Least connections | Route to the server with the fewest active connections | Long-lived connections (WebSockets, file uploads) | Connection count != actual load |
| Least response time | Route to the server with the lowest average response time | Latency-sensitive APIs | Requires active latency tracking |
| IP hash | Hash the client IP to pick a consistent server | Stateful sessions without session sharing | Load can be uneven; poor for large NAT pools |
| Consistent hashing | Hash a key (IP, user ID, URL) onto a hash ring; adding or removing a server remaps only the keys adjacent to it | Cache-friendly routing, stateful services | More complex; load can be uneven without virtual nodes |
| Random | Pick a backend at random | Simple, works surprisingly well at scale | No load awareness |
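Consistent hashing is the algorithm interviewers most often ask candidates to sketch. Below is a minimal Python hash ring with virtual nodes ("vnodes") to smooth out uneven load; it is an illustration under simplified assumptions, not a production implementation.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring. Each server is placed at many points
    ("virtual nodes") on the ring so keys spread more evenly."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash, server)
        for s in servers:
            for i in range(vnodes):
                self.ring.append((self._h(f"{s}#{i}"), s))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _h(key: str) -> int:
        # A stable hash (Python's built-in hash() is randomized per process).
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, key: str) -> str:
        """Walk clockwise to the first vnode at or after the key's hash."""
        i = bisect.bisect(self.keys, self._h(key)) % len(self.keys)
        return self.ring[i][1]

ring = HashRing(["s1", "s2", "s3"])
before = {k: ring.get(k) for k in ("user:1", "user:2", "user:3")}

# Removing a server only remaps keys whose nearest vnode belonged to it;
# keys that mapped to the surviving servers keep their assignment.
smaller = HashRing(["s1", "s2"])
```

This "only neighbors move" property is why consistent hashing is cache-friendly: a fleet change invalidates roughly 1/N of cached keys instead of nearly all of them, as a plain `hash(key) % N` scheme would.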

Health Checks

Load balancers continuously probe backend servers to detect failures. Unhealthy servers are removed from the rotation until they recover.

  • Passive health checks: The load balancer watches for errors on real traffic. If a server returns 5xx errors or TCP resets beyond a threshold, it's marked unhealthy. Zero overhead, but requires real traffic to fail first.
  • Active health checks: The load balancer periodically sends synthetic requests (HTTP GET `/health`, TCP probe, etc.) and checks for expected responses. Catches failures before real traffic is affected.
  • Health check parameters: Interval (how often to probe), timeout (how long to wait), healthy threshold (successful probes to become healthy), unhealthy threshold (failed probes to be removed).
```yaml
# Example: AWS ALB target group health check config
HealthCheckProtocol: HTTP
HealthCheckPath: /health
HealthCheckIntervalSeconds: 15
HealthCheckTimeoutSeconds: 5
HealthyThresholdCount: 2      # 2 consecutive successes → healthy
UnhealthyThresholdCount: 3    # 3 consecutive failures → unhealthy
Matcher:
  HttpCode: "200"
```
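The threshold logic behind such a config can be sketched in Python. The `probe` function and `HealthTracker` class below are illustrative names, not any real load balancer's API; the `/health` endpoint is the same assumption as in the config above.

```python
from urllib.request import urlopen
from urllib.error import URLError

def probe(url: str, timeout: float = 5.0) -> bool:
    """One synthetic probe: healthy iff we get an HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

class HealthTracker:
    """Applies healthy/unhealthy thresholds to consecutive probe results,
    so a single blip doesn't flap the server in and out of rotation."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy = True
        self.successes = 0
        self.failures = 0
        self.ht = healthy_threshold
        self.ut = unhealthy_threshold

    def record(self, ok: bool) -> bool:
        if ok:
            self.successes += 1
            self.failures = 0
            if not self.healthy and self.successes >= self.ht:
                self.healthy = True   # recovered: back into rotation
        else:
            self.failures += 1
            self.successes = 0
            if self.healthy and self.failures >= self.ut:
                self.healthy = False  # removed from rotation
        return self.healthy

# Simulated probe results: 3 failures remove the server, 2 successes restore it.
t = HealthTracker()
for ok in (False, False, False, True, True):
    t.record(ok)
print(t.healthy)  # True: the server has recovered
```

The asymmetric thresholds (fast to remove, slower to restore) are deliberate: re-admitting a flaky server too eagerly causes error spikes.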

Sticky Sessions (Session Affinity)

Some applications store session state in server memory rather than a shared store. Sticky sessions ensure a given client always routes to the same backend server.

  • IP-hash stickiness: Routes all traffic from the same client IP to the same server. Breaks down with large NAT pools (many users sharing one IP) or mobile clients (changing IPs).
  • Cookie-based stickiness: The load balancer sets a cookie (`AWSALB`, `JSESSIONID`) on the first response, then uses it to route subsequent requests to the same server. More reliable than IP-hash.
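A toy model of cookie-based stickiness, with everything (class name, cookie scheme, round-robin for first-time clients) invented for illustration rather than taken from any specific load balancer:

```python
import secrets

class StickyBalancer:
    """Pins each session cookie to one backend; new clients get the next
    backend in round-robin order plus a fresh cookie to return later."""

    def __init__(self, backends):
        self.backends = backends
        self.sessions = {}  # cookie value -> backend
        self.next_idx = 0

    def route(self, cookie=None):
        """Return (backend, cookie). A known cookie reuses its pinned backend."""
        if cookie in self.sessions:
            return self.sessions[cookie], cookie
        backend = self.backends[self.next_idx % len(self.backends)]
        self.next_idx += 1
        new_cookie = secrets.token_hex(8)  # would be set on the first response
        self.sessions[new_cookie] = backend
        return backend, new_cookie

lb = StickyBalancer(["s1", "s2"])
backend, cookie = lb.route(None)        # first request: assign + set cookie
assert lb.route(cookie)[0] == backend   # later requests with the cookie stick
```

Note the failure mode this model makes visible: if `backend` restarts, every session pinned to it in `self.sessions` is stranded, which is exactly the argument for externalizing session state instead.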
⚠️ Avoid sticky sessions in new designs

Sticky sessions reduce your ability to redistribute load (a 'hot' server can't shed requests) and cause failures when a server restarts. The better architecture is to externalize session state to a shared store (Redis) so any server can handle any request. This makes your fleet truly stateless.

Connection Draining (Deregistration Delay)

When a backend server is removed (during a deployment or scale-in event), the load balancer enters connection draining mode for that server: it stops sending new requests to it but allows existing in-flight requests to complete within a grace period (typically 30–300 seconds). After the grace period, remaining connections are forcibly closed.
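The draining state machine can be modeled in a few lines of Python. This is a toy illustration of the behavior described above, not any particular load balancer's implementation.

```python
class Backend:
    """Tracks whether a backend accepts new requests and, while draining,
    how long in-flight requests may still run."""

    def __init__(self, name, drain_grace=30.0):
        self.name = name
        self.draining_until = None  # None = in rotation
        self.drain_grace = drain_grace

    def start_draining(self, now):
        """Deregister: stop new requests, start the grace-period clock."""
        self.draining_until = now + self.drain_grace

    def accepts_new(self, now):
        # A draining backend never receives new requests.
        return self.draining_until is None

    def can_finish_in_flight(self, now):
        # In-flight requests may complete until the grace period ends,
        # after which remaining connections are forcibly closed.
        return self.draining_until is None or now < self.draining_until

b = Backend("s1")
b.start_draining(now=0.0)
assert not b.accepts_new(now=1.0)            # no new traffic
assert b.can_finish_in_flight(now=10.0)      # in-flight still allowed
assert not b.can_finish_in_flight(now=31.0)  # past the 30 s grace period
```

A rolling deploy is then just this sequence per old server: `start_draining`, wait out the grace period, terminate, and register its replacement.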

💡 Interview Tip

A classic interview question: 'How do you deploy a new version of your service without dropping requests?' The answer involves connection draining combined with a rolling deployment strategy. The load balancer removes old servers gracefully (draining connections), then adds new servers. Bonus points for mentioning blue/green deployments where you flip traffic in the load balancer atomically after the new fleet is fully healthy.
