This article explores the fundamental role of load balancers in horizontally scaled systems, detailing various routing strategies and their trade-offs. It then delves into the critical problem of dynamic server changes in distributed caches and introduces consistent hashing as an elegant solution to minimize key remapping and ensure system resilience.
Load balancers are essential components in any horizontally scaled system, acting as a single point of contact for incoming requests and intelligently distributing them across multiple backend servers. This abstraction ensures that users perceive a single service while the underlying infrastructure can scale dynamically. They are crucial for improving availability, performance, and fault tolerance.
Load balancers can route at different network layers, with varying levels of intelligence and overhead:

| Type | What it sees | Best for |
|---|---|---|
| Layer 4 | IP addresses and TCP ports | Fast, simple routing with minimal overhead |
| Layer 7 | Full HTTP content (URLs, headers, cookies) | Content-aware routing, such as directing paths to specific services |
Layer 4 load balancers use IP and TCP information for fast, simple routing. Layer 7 load balancers, conversely, inspect full HTTP content (URLs, headers, cookies), enabling sophisticated, content-aware routing such as path-based routing. This allows requests to be directed to specific microservices or server clusters based on the request's content, forming the backbone of modern microservice architectures.
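To make path-based routing concrete, here is a minimal sketch of a Layer 7 dispatch decision. The pool names, backend addresses, and longest-prefix-plus-round-robin policy are illustrative assumptions, not details from the article:

```python
# Minimal sketch of Layer 7 path-based routing: inspect the request path
# and forward to a backend pool chosen by the longest matching prefix,
# cycling through that pool's servers round-robin.
# Pool names and addresses below are hypothetical.

BACKEND_POOLS = {
    "/api/users": ["users-1:8080", "users-2:8080"],
    "/api/orders": ["orders-1:8080"],
}
DEFAULT_POOL = ["web-1:8080", "web-2:8080"]

def route(path: str, counter: dict) -> str:
    """Pick a backend for `path`: longest matching prefix, then round-robin."""
    pool, best = DEFAULT_POOL, ""
    for prefix, backends in BACKEND_POOLS.items():
        if path.startswith(prefix) and len(prefix) > len(best):
            pool, best = backends, prefix
    i = counter.get(best, 0)          # per-pool round-robin index
    counter[best] = i + 1
    return pool[i % len(pool)]
```

In a real deployment this decision lives in the load balancer's configuration (e.g. location/upstream rules) rather than application code; the sketch only shows the routing logic itself.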
To prevent the load balancer itself from becoming a single point of failure, active-passive failover is a common strategy. Multiple load balancers are deployed, with one active and others on standby, ready to take over automatically in case of a failure.
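The takeover logic can be sketched as a standby node watching the active node's heartbeat. The class name, field names, and 3-second timeout below are illustrative assumptions:

```python
import time

# Hedged sketch of active-passive failover: the standby watches the active
# balancer's heartbeat and promotes itself once the heartbeat goes stale.
# The timeout value is an arbitrary illustrative choice.

HEARTBEAT_TIMEOUT = 3.0  # seconds without a heartbeat before takeover

class StandbyBalancer:
    def __init__(self):
        self.role = "standby"
        self.last_heartbeat = time.monotonic()

    def on_heartbeat(self):
        """Called whenever the active balancer's heartbeat arrives."""
        self.last_heartbeat = time.monotonic()

    def check(self, now: float) -> str:
        """Promote to active if the heartbeat has gone stale."""
        if self.role == "standby" and now - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.role = "active"  # in practice: claim the virtual IP / DNS record
        return self.role
```

Real setups typically implement this with tools like keepalived (VRRP) rather than hand-rolled code; the sketch just shows the detection-and-promotion idea.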
Simple hashing schemes (e.g., `hash(key) % N_servers`) suffer from a major flaw: adding or removing a server can cause almost all keys to remap. In systems like distributed caches, this leads to a "cache avalanche" where all requests hit the database, potentially crashing the entire system. Consistent hashing solves this problem by minimizing key reassignments.
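The scale of the remapping is easy to demonstrate. The sketch below (key names are made up) counts how many keys change servers when the pool shrinks from 5 to 4:

```python
# Demonstrates the modulo-hashing flaw: shrinking the pool from 5 to 4
# servers remaps the large majority of keys, since h % 5 == h % 4 only
# for a small fraction of hash values.
import hashlib

def server_for(key: str, n_servers: int) -> int:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)  # stable hash
    return h % n_servers

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(server_for(k, 5) != server_for(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 80%
```

In a cache, every remapped key is a guaranteed miss, which is exactly the avalanche scenario described above.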
The core idea of consistent hashing involves mapping both servers and data keys onto a circular "hash ring." To find a key's server, one walks clockwise from the key's position until a server is encountered. When a server is removed, only the keys it was responsible for (and now flow to the next server clockwise) are affected. All other keys remain undisturbed, greatly reducing the impact of churn in the server pool.
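The ring and clockwise walk can be sketched directly with a sorted list and binary search. The `HashRing` name and MD5 hash choice are ours, not from the article:

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Any stable, well-distributed hash works; MD5 is used here for brevity.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    """Sketch of a consistent-hash ring: servers and keys share one hash
    space, and a key walks clockwise to the first server it meets."""

    def __init__(self, servers):
        self._ring = sorted((_hash(s), s) for s in servers)

    def get(self, key: str) -> str:
        h = _hash(key)
        # First server position >= the key's position, wrapping past 12 o'clock.
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

    def remove(self, server: str):
        self._ring = [(h, s) for h, s in self._ring if s != server]
```

The key property shown in the text holds here: after `remove`, only keys that previously belonged to the removed server change owners; every other key keeps its mapping.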
Virtual nodes further enhance consistent hashing. By mapping each physical server to multiple points (virtual nodes) on the hash ring, the distribution of keys becomes more even. This ensures that when a server fails, its load is spread proportionally across *all* remaining servers, rather than just the immediate next one clockwise. This technique is used in highly distributed systems like Amazon DynamoDB and Apache Cassandra.
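A small experiment makes the load-spreading effect of virtual nodes visible. The replica count of 100 and the `"server#i"` labeling scheme are illustrative assumptions:

```python
import bisect
import hashlib

# Sketch of virtual nodes: each physical server is hashed onto the ring
# at many points ("a#0", "a#1", ...), so a failed server's keys scatter
# across all survivors instead of piling onto one neighbor.

REPLICAS = 100  # virtual nodes per physical server (illustrative)

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(servers):
    return sorted((_hash(f"{s}#{i}"), s) for s in servers for i in range(REPLICAS))

def get(ring, key):
    idx = bisect.bisect(ring, (_hash(key), "")) % len(ring)
    return ring[idx][1]

keys = [f"key:{i}" for i in range(10_000)]
full = build_ring(["a", "b", "c", "d"])
smaller = build_ring(["a", "b", "d"])  # server "c" has failed

# Where did c's keys go? With virtual nodes, they land on every survivor.
inherited = {}
for k in keys:
    if get(full, k) == "c":
        new_owner = get(smaller, k)
        inherited[new_owner] = inherited.get(new_owner, 0) + 1
print(inherited)
```

Without virtual nodes, the single server immediately clockwise of `c` would absorb all of its keys; with them, the printed counts show the load split across `a`, `b`, and `d`.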