Load Balancing: L4 vs L7
Layer 4 vs Layer 7 load balancing, algorithms (round-robin, least connections, consistent hashing), health checks, and sticky sessions.
Why Load Balancing?
A single server has a finite capacity. As traffic grows, you need to distribute requests across multiple backend instances. A load balancer sits in front of your server fleet and routes incoming requests, ensuring no single server is overwhelmed, providing redundancy when servers fail, and enabling horizontal scaling.
L4 vs L7 Load Balancing
Load balancers operate at different layers of the OSI model. The two most relevant are Layer 4 (Transport) and Layer 7 (Application).
| Aspect | L4 Load Balancer | L7 Load Balancer |
|---|---|---|
| OSI Layer | Transport (TCP/UDP) | Application (HTTP/HTTPS/gRPC) |
| What it sees | IP addresses, ports, TCP connection state | Full HTTP headers, URL paths, cookies, request body |
| Routing basis | IP + port only | URL path, hostname, header values, HTTP method |
| TLS termination | Usually no (TLS passthrough) | Yes — terminates TLS and re-encrypts or sends plaintext to backend |
| Performance | Very fast, lower CPU overhead | More CPU (parses HTTP), but smarter routing |
| Sticky sessions | IP-hash based | Cookie-based (more reliable) |
| Protocol awareness | None — treats all TCP traffic equally | Can route `/api` to API servers and `/static` to file servers |
| Examples | AWS NLB, HAProxy TCP mode, IPVS | AWS ALB, NGINX, Envoy, Traefik |
When to choose L4 vs L7
Use L4 when you need raw throughput (millions of short TCP connections, or non-HTTP protocols such as SMTP and game-server traffic). Use L7 when you need content-aware routing, TLS termination, rate limiting, or observability into HTTP traffic. Most modern web applications use L7 load balancers.
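The content-aware routing that distinguishes L7 can be sketched in a few lines. This is a minimal, hypothetical model: the pool names, IP addresses, and path prefixes are illustrative, not any real load balancer's configuration.

```python
# Minimal sketch of L7 content-aware routing: inspect the URL path
# and pick a backend pool. All names and addresses are made up.

BACKEND_POOLS = {
    "api": ["10.0.1.10:8080", "10.0.1.11:8080"],
    "static": ["10.0.2.10:8080"],
    "default": ["10.0.3.10:8080"],
}

def route(path: str) -> list[str]:
    """Return the backend pool for a request path (prefix match)."""
    if path.startswith("/api"):
        return BACKEND_POOLS["api"]
    if path.startswith("/static"):
        return BACKEND_POOLS["static"]
    return BACKEND_POOLS["default"]
```

An L4 balancer cannot make this decision at all: by the time the URL path is visible, you are already parsing HTTP, which is exactly the extra CPU cost the table above notes.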
Load Balancing Algorithms
The balancing algorithm determines which backend server receives each incoming request. Different algorithms suit different workloads.
| Algorithm | How It Works | Best For | Weakness |
|---|---|---|---|
| Round-robin | Rotate through servers in order | Homogeneous servers, uniform request cost | Ignores server load or request heaviness |
| Weighted round-robin | Round-robin but servers get more/fewer requests proportional to their weight | Mixed server capacities (some servers are larger) | Still ignores real-time load |
| Least connections | Route to the server with the fewest active connections | Long-lived connections (WebSockets, file uploads) | Connection count != actual load |
| Least response time | Route to the server with lowest avg response time | Latency-sensitive APIs | Requires active latency tracking |
| IP hash | Hash the client IP to pick a consistent server | Stateful sessions without session sharing | Load can be uneven; poor for large NAT pools |
| Consistent hashing | Hash a key (IP, user ID, URL) onto a hash ring; adding or removing a server remaps only ~1/N of keys | Cache-friendly routing, stateful services | More complex; needs virtual nodes for even load |
| Random | Pick a backend at random | Simple, works surprisingly well at scale | No load awareness |
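Consistent hashing is the least obvious algorithm in the table, so here is a compact sketch of a hash ring with virtual nodes. This is an illustrative toy (MD5 as the hash, 100 replicas per node are arbitrary choices), not a production implementation.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes for smoother balance."""

    def __init__(self, nodes: list[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self._ring: list[tuple[int, str]] = []  # sorted (hash point, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each node owns `replicas` points on the ring, which evens out load.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        """Map a key (client IP, user ID, URL) to the next node clockwise."""
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, ""))  # first point >= h
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]
```

The property that matters: removing one server only remaps the keys that were on it; every other key keeps its assignment, so caches on the surviving servers stay warm.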
Health Checks
Load balancers continuously probe backend servers to detect failures. Unhealthy servers are removed from the rotation until they recover.
- Passive health checks: The load balancer watches for errors on real traffic. If a server returns 5xx errors or TCP resets beyond a threshold, it's marked unhealthy. Zero overhead, but requires real traffic to fail first.
- Active health checks: The load balancer periodically sends synthetic requests (HTTP GET `/health`, TCP probe, etc.) and checks for expected responses. Catches failures before real traffic is affected.
- Health check parameters: Interval (how often to probe), timeout (how long to wait), healthy threshold (successful probes to become healthy), unhealthy threshold (failed probes to be removed).
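The threshold logic those parameters drive can be sketched as a small state machine. This is a simplified, hypothetical model (one probe result at a time, no jitter or timeouts), not any load balancer's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class HealthState:
    healthy: bool = True
    streak: int = 0  # consecutive probes disagreeing with current status

def observe(state: HealthState, probe_ok: bool,
            healthy_threshold: int = 2, unhealthy_threshold: int = 3) -> HealthState:
    """Update server health from one probe result, applying the thresholds."""
    if probe_ok == state.healthy:
        state.streak = 0          # probe agrees with current status; reset
        return state
    state.streak += 1
    needed = healthy_threshold if probe_ok else unhealthy_threshold
    if state.streak >= needed:    # enough consecutive flips: change status
        state.healthy = probe_ok
        state.streak = 0
    return state
```

Requiring consecutive results in both directions prevents a single flaky probe from flapping a server in and out of rotation.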
```yaml
# Example: AWS ALB target group health check config
HealthCheckProtocol: HTTP
HealthCheckPath: /health
HealthCheckIntervalSeconds: 15
HealthCheckTimeoutSeconds: 5
HealthyThresholdCount: 2     # 2 consecutive successes → healthy
UnhealthyThresholdCount: 3   # 3 consecutive failures → unhealthy
Matcher:
  HttpCode: "200"
```
Sticky Sessions (Session Affinity)
Some applications store session state in server memory rather than a shared store. Sticky sessions ensure a given client always routes to the same backend server.
- IP-hash stickiness: Routes all traffic from the same client IP to the same server. Breaks down with large NAT pools (many users sharing one IP) or mobile clients (changing IPs).
- Cookie-based stickiness: The load balancer sets a cookie (`AWSALB`, `JSESSIONID`) on the first response, then uses it to route subsequent requests to the same server. More reliable than IP-hash.
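Cookie-based stickiness reduces to a simple rule: honor the cookie if present, otherwise pick a server and set it. A minimal sketch, assuming an illustrative cookie name (`lb-affinity` is made up, not a real load balancer's cookie):

```python
import random

SERVERS = ["srv-a", "srv-b", "srv-c"]
STICKY_COOKIE = "lb-affinity"  # hypothetical cookie name

def pick_server(request_cookies: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Return (chosen server, cookies to set on the response)."""
    pinned = request_cookies.get(STICKY_COOKIE)
    if pinned in SERVERS:               # honor existing affinity
        return pinned, {}
    chosen = random.choice(SERVERS)     # first request: pick and pin
    return chosen, {STICKY_COOKIE: chosen}
```

Note the failure mode this creates: if `srv-a` restarts, every client pinned to it loses its in-memory session, which is exactly the argument against stickiness below.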
Avoid sticky sessions in new designs
Sticky sessions reduce your ability to redistribute load (a 'hot' server can't shed requests) and cause failures when a server restarts. The better architecture is to externalize session state to a shared store (Redis) so any server can handle any request. This makes your fleet truly stateless.
Connection Draining (Deregistration Delay)
When a backend server is removed (during a deployment or scale-in event), the load balancer enters connection draining mode for that server: it stops sending new requests to it but allows existing in-flight requests to complete within a grace period (typically 30–300 seconds). After the grace period, remaining connections are forcibly closed.
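The draining sequence can be sketched as: flip the backend to draining, stop admitting new requests, then wait out in-flight requests until a deadline. This is a schematic model (the `poll` hook stands in for yielding to a real event loop), not production code.

```python
import time

class Backend:
    def __init__(self, name: str) -> None:
        self.name = name
        self.draining = False
        self.in_flight = 0

    def accepts_new(self) -> bool:
        return not self.draining  # draining backends get no new requests

def drain(backend: Backend, grace_seconds: float,
          now=time.monotonic, poll=lambda: None) -> bool:
    """Stop new traffic, then wait up to grace_seconds for in-flight
    requests to finish. Returns True if the backend drained cleanly."""
    backend.draining = True
    deadline = now() + grace_seconds
    while backend.in_flight > 0 and now() < deadline:
        poll()  # in a real LB this yields to the event loop
    return backend.in_flight == 0  # False => remaining connections are closed
```

The grace period is the same trade-off the 30–300 second range above reflects: long enough for slow uploads to finish, short enough that a deploy doesn't stall.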
Interview Tip
A classic interview question: 'How do you deploy a new version of your service without dropping requests?' The answer involves connection draining combined with a rolling deployment strategy. The load balancer removes old servers gracefully (draining connections), then adds new servers. Bonus points for mentioning blue/green deployments where you flip traffic in the load balancer atomically after the new fleet is fully healthy.