Distributed Caching
Scaling caches across nodes: consistent hashing, replication, cache clusters, and handling node failures without cache avalanche.
Why Distribute a Cache?
A single cache node has finite RAM. Once your working set exceeds a single machine's memory — or your QPS exceeds what one machine can handle — you need to shard the cache across multiple nodes. Distributed caching adds the full complexity of distributed systems: how do you route requests to the right node? What happens when a node fails? How do you add capacity without invalidating the entire cache?
Naive Sharding: Modulo Hashing
The simplest sharding approach: `node_index = hash(key) % num_nodes`. Every client computes the same formula and routes to the same node. The problem: when you add or remove a node, `num_nodes` changes, and nearly every key maps to a different node. All cached data is effectively invalidated simultaneously — the database gets crushed by a wave of cache misses. This is a cache avalanche.
Cache avalanche from naive resharding
Growing a modulo-hashed cluster from N to N+1 nodes remaps roughly N/(N+1) of all keys, about 91% when going from 10 to 11 nodes. This is catastrophic for production systems. Consistent hashing solves this by minimizing remapping on topology changes.
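The remapping fraction is easy to verify empirically. The sketch below (MD5 is used purely as a stand-in for a stable hash function) counts how many of 100,000 keys change nodes when a 10-node modulo-sharded cluster grows to 11:

```python
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    # Stable hash so every client computes the same node index
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(node_for(k, 10) != node_for(k, 11) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 91%
```

Only keys whose hash gives the same remainder mod 10 and mod 11 stay put, which is about 1 in 11 of them.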
Consistent Hashing
Consistent hashing places both nodes and keys on a circular hash ring (0 to 2^32). Each key is assigned to the nearest node clockwise on the ring. When a node is added or removed, only the keys between it and its predecessor on the ring are remapped — approximately `1/N` of all keys for an N-node cluster. This minimizes cache invalidation during scaling.
Virtual nodes (vnodes): A single physical node is represented by multiple positions on the ring (e.g., 150 virtual nodes per physical node). This improves load distribution — without vnodes, an uneven ring means some nodes are responsible for a much larger key range than others. Vnodes also make it easier to assign more ring positions to higher-capacity nodes. Redis Cluster takes a related approach: 16,384 fixed hash slots (CRC16(key) mod 16384) assigned to nodes, which can be migrated between nodes individually when the topology changes.
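A minimal hash ring with virtual nodes fits in a short class. This is an illustrative sketch, not any particular system's implementation; the vnode count and MD5 hash are arbitrary choices here.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes (sketch)."""

    def __init__(self, nodes: list[str], vnodes: int = 150):
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node)
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        # Map a string onto the 0..2^32 ring
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

    def add_node(self, node: str) -> None:
        # One physical node occupies `vnodes` positions on the ring
        for i in range(self.vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        # First vnode clockwise from the key's position (wrapping around)
        h = self._hash(key)
        idx = bisect.bisect_right(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

Removing a node only reassigns the keys that node owned; every other key keeps its current node, which is exactly the property modulo hashing lacks.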
Replication in Distributed Caches
Sharding partitions keys across nodes but does not protect against node failure. Replication provides that protection: each primary node has one or more replicas that hold a copy of the same keys. Reads can be served by replicas (increased throughput); writes go to the primary and are asynchronously replicated.
| Replication Model | Write Path | Read Path | Failure Behavior | Example |
|---|---|---|---|---|
| Primary-Replica | Write to primary only | Primary or replicas | Replica promotes to primary on failover (Sentinel) | Redis Sentinel |
| Multi-Primary | Write to any primary | Any node | Partition tolerance; conflict resolution needed | Active-active geo-replication (e.g., CRDT-based stores) |
| Quorum writes | Write to W nodes; succeed when W ack | Read from R nodes | Tunable consistency via W+R > N | Dynamo-style systems |
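The quorum rule in the last row reduces to a one-line predicate. N, W, and R follow the Dynamo convention: replica count, write quorum, and read quorum.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """W + R > N guarantees every read quorum intersects the most
    recent successful write quorum, so reads see the newest ack'd write."""
    return w + r > n

# N=3 replicas: W=2/R=2 overlaps; W=1/R=1 trades consistency for latency.
print(quorums_overlap(3, 2, 2), quorums_overlap(3, 1, 1))
```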
Cache Avalanche vs Cache Stampede
Three catastrophic distributed cache failure modes are often confused:
| Failure Mode | Cause | Effect | Prevention |
|---|---|---|---|
| Cache Stampede | One popular key expires; many concurrent requests get a miss | Database spiked by N simultaneous queries for one key | Mutex lock, stale-while-revalidate, jittered TTL |
| Cache Avalanche | Many keys expire simultaneously (e.g., after a cold restart, or naive resharding) | Database flooded by massive miss wave across all keys | Staggered TTLs, warm-up strategies, gradual traffic shifting |
| Cache Penetration | Attacker queries keys that never exist (cache miss always) | Database queried for every request; cache offers no protection | Bloom filter to reject invalid keys; null-value caching |
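The mutex and jittered-TTL mitigations from the stampede row can be combined in one read path. This is a single-process sketch in which `cache` and `db` are placeholder objects; a multi-server deployment would use a distributed lock (e.g., a Redis `SET NX` lock) instead of `threading.Lock`.

```python
import random
import threading

_locks: dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()

def get_with_stampede_protection(cache, db, key: str, ttl: int = 300):
    value = cache.get(key)
    if value is not None:
        return value
    # One lock per key: only one caller recomputes, the rest wait
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        value = cache.get(key)  # re-check: another caller may have filled it
        if value is not None:
            return value
        value = db.load(key)
        # Jitter the TTL so keys written together don't expire together
        cache.set(key, value, ttl=ttl + random.randint(0, ttl // 10))
        return value
```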
Cache Penetration and Bloom Filters
Cache penetration is a denial-of-service pattern: an attacker queries IDs that don't exist (e.g., `user:-1`, `product:99999999`). Every request is a cache miss and hits the database. The cache provides zero protection. The solution: use a Bloom filter — a probabilistic data structure that quickly answers 'definitely does not exist' or 'probably exists.' A Bloom filter for all valid user IDs can be loaded into the application and checked before any cache or database call.
```python
# Bloom filter check before cache/DB lookup
# Using pybloom_live or similar
from pybloom_live import ScalableBloomFilter

# At startup, populate from DB
user_bloom = ScalableBloomFilter()
for user_id in db.query("SELECT id FROM users"):
    user_bloom.add(user_id)

def get_user(user_id: str) -> User:
    # Reject impossible IDs without hitting cache or DB
    if user_id not in user_bloom:
        raise NotFoundError("User does not exist")
    # Safe to check cache now
    cached = cache.get(f"user:{user_id}")
    if cached:
        return deserialize(cached)
    # Cache miss — DB fetch
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    if user:
        cache.set(f"user:{user_id}", serialize(user), ttl=300)
    return user
```
Hot Key Problem
Consistent hashing distributes keys evenly on average, but some keys receive disproportionate traffic — a celebrity's profile page, a trending news article, or a flash sale product. A single node handling millions of requests per second for one key becomes a hot spot regardless of how evenly other keys are distributed.
Solutions for hot keys:
- Local in-process cache: Put the hot key in every application server's in-process cache. Reads never leave the process — zero network overhead.
- Key replication: Shard the hot key into multiple copies (`product:trending#0`, `product:trending#1` ... `#N`) and distribute them across nodes. Reads pick a random shard.
- Read replicas: Route read traffic for the hot key's primary node to its replicas, distributing the load across multiple physical machines.
- Redis Cluster read-from-replica: Set `READONLY` on replica connections to allow reads, reducing primary load.
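The key-replication option can be sketched directly. The shard count and the `#i` suffix convention are assumptions for illustration; consistent hashing then spreads the shard keys across different nodes.

```python
import random

NUM_SHARDS = 8  # number of copies of the hot key (illustrative)

def hot_key_write(cache, key: str, value, ttl: int = 60) -> None:
    # Write every shard so all copies hold the same value
    for i in range(NUM_SHARDS):
        cache.set(f"{key}#{i}", value, ttl=ttl)

def hot_key_read(cache, key: str):
    # Each reader picks a random shard, spreading read load across
    # the nodes that own the different shard keys
    return cache.get(f"{key}#{random.randrange(NUM_SHARDS)}")
```

The trade-off: writes cost NUM_SHARDS operations and the copies can briefly diverge, so this fits read-heavy hot keys like a trending product page.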
Gradual Cache Warm-Up
After a cold start (cluster restart, major resharding, or scaling out), the cache is empty. Sending all production traffic immediately causes a database avalanche. Use gradual warm-up: start by sending a small percentage of traffic to the new cluster while the old cluster handles the rest. As the hit rate climbs (monitor `cache_hits / (cache_hits + cache_misses)`), shift more traffic. Alternatively, pre-warm by replaying recent database reads against the new cluster before shifting traffic.
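One way to drive the traffic shift is to gate it on the measured hit rate. The thresholds and percentages below are illustrative, not recommendations; real systems would tune them against database headroom.

```python
def hit_rate(hits: int, misses: int) -> float:
    total = hits + misses
    return hits / total if total else 0.0

def traffic_share(hits: int, misses: int,
                  ramp=((0.50, 0.05), (0.80, 0.25), (0.95, 1.0))) -> float:
    """Map the new cluster's hit rate to the fraction of traffic it gets."""
    share = 0.01  # trickle while the cache is cold
    for threshold, pct in ramp:
        if hit_rate(hits, misses) >= threshold:
            share = pct
    return share
```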
Interview Tip
Distributed caching questions come up in senior system design interviews for products with large datasets. Cover these four areas: (1) Sharding — consistent hashing to minimize remapping; (2) Replication — primaries + replicas for fault tolerance; (3) Failure modes — cache stampede (one key), cache avalanche (mass expiry), cache penetration (invalid keys); (4) Hot keys — local in-process caching or key replication. Mentioning Bloom filters for cache penetration is a high-signal answer that most candidates don't give.
Practice this pattern
Design a distributed caching system for a social media feed