
Distributed Caching

Scaling caches across nodes: consistent hashing, replication, cache clusters, and handling node failures without cache avalanche.


Why Distribute a Cache?

A single cache node has finite RAM. Once your working set exceeds a single machine's memory, or your QPS exceeds what one machine can handle, you need to shard the cache across multiple nodes. Distributed caching adds the full complexity of distributed systems: how do you route requests to the right node? What happens when a node fails? How do you add capacity without invalidating the entire cache?

Naive Sharding: Modulo Hashing

The simplest sharding approach: `node_index = hash(key) % num_nodes`. Every client computes the same formula and routes to the same node. The problem: when you add or remove a node, `num_nodes` changes, and nearly every key maps to a different node. All cached data is effectively invalidated at once, and the database gets crushed by a wave of cache misses. This is a cache avalanche.

⚠️

Cache avalanche from naive resharding

Changing N nodes to N+1 with modulo hashing remaps approximately N/(N+1) of all keys to different nodes: about 91% for a 10-node cluster. This is catastrophic for production systems. Consistent hashing solves this by minimizing remapping on topology changes.
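The remapping fraction is easy to check empirically. A small sketch (MD5 is used here only as a stable stand-in for whatever hash function the client library actually uses):

```python
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    # Stable hash; Python's built-in hash() is randomized per process
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_nodes

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(node_for(k, 10) != node_for(k, 11) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 91%
```

With 100,000 keys the measured fraction lands very close to the theoretical 10/11 ≈ 91%.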

Consistent Hashing

Consistent hashing places both nodes and keys on a circular hash ring (0 to 2^32). Each key is assigned to the nearest node clockwise on the ring. When a node is added or removed, only the keys between it and its predecessor on the ring are remapped: approximately `1/N` of all keys for an N-node cluster. This minimizes cache invalidation during scaling.

Diagram: consistent hash ring. Each node owns the keys between its predecessor and itself; adding a node remaps only ~1/N of keys.

Virtual nodes (vnodes): A single physical node is represented by multiple positions on the ring (e.g., 150 virtual nodes per physical node). This improves load distribution: without vnodes, an uneven ring means some nodes are responsible for a much larger key range than others. Vnodes also make it easier to assign more ring positions to higher-capacity nodes. Redis Cluster takes a related approach: instead of a hash ring, it divides the key space into 16,384 fixed hash slots (CRC16 of the key modulo 16384) that can be migrated between nodes.
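As a sketch, a ring with virtual nodes can be built with Python's `bisect` module. The node names, MD5 positioning, and 150-vnode default below are illustrative choices, not any specific library's API:

```python
import bisect
import hashlib

RING_SIZE = 2**32

class HashRing:
    """Minimal consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes_per_node=150):
        self.vnodes_per_node = vnodes_per_node
        self._positions = []   # sorted vnode positions on the ring
        self._owner = {}       # vnode position -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes_per_node):
            pos = self._hash(f"{node}#vnode{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def get_node(self, key: str) -> str:
        # First vnode clockwise from the key's position, wrapping past zero
        i = bisect.bisect_right(self._positions, self._hash(key))
        return self._owner[self._positions[i % len(self._positions)]]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("cache-d")
moved = sum(before[k] != ring.get_node(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # close to 1/N = 25%
```

Adding the fourth node moves roughly a quarter of the keys, all of them to the new node, while the other three nodes keep serving their existing keys from cache.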

Replication in Distributed Caches

Sharding partitions keys across nodes but does not protect against node failure. Replication provides that protection: each primary node has one or more replicas that hold a copy of the same keys. Reads can be served by replicas (increased throughput); writes go to the primary and are asynchronously replicated.

| Replication Model | Write Path | Read Path | Failure Behavior | Example |
| --- | --- | --- | --- | --- |
| Primary-Replica | Write to primary only | Primary or replicas | Replica promoted to primary on failover | Redis Sentinel |
| Multi-Primary | Write to any primary | Any node | Partition tolerance; conflict resolution needed | Redis Cluster with multiple primaries |
| Quorum writes | Write to W nodes; succeed when W ack | Read from R nodes | Tunable consistency via W + R > N | Dynamo-style systems |
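The quorum row can be made concrete with a toy model. The replica dict and `(version, value)` tuples below are illustrative stand-ins, not a real client API:

```python
import random

def quorum_write(replicas, w, version, value):
    """Write to W randomly chosen replicas; the rest stay stale until repaired."""
    for node in random.sample(sorted(replicas), w):
        replicas[node] = (version, value)

def quorum_read(replicas, r):
    """Read R replicas and return the newest (version, value) pair seen.
    Tuples compare by version first, which is exactly what we want here."""
    return max(random.sample(list(replicas.values()), r))

# N=3 replicas, all starting stale
nodes = {"r1": (0, None), "r2": (0, None), "r3": (0, None)}
quorum_write(nodes, w=2, version=1, value="v1")

# W + R > N (2 + 2 > 3): any read set of 2 nodes must overlap the write set
# of 2 nodes, so the quorum read is guaranteed to see the new version
print(quorum_read(nodes, r=2))  # (1, 'v1')
```

With W=1 and R=1 the overlap guarantee disappears: a read can land entirely on the stale replica, which is the "eventual consistency" end of the tunable spectrum.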

Cache Avalanche vs Cache Stampede

Three distributed cache failure modes are often confused:

| Failure Mode | Cause | Effect | Prevention |
| --- | --- | --- | --- |
| Cache stampede | One popular key expires; many concurrent requests get a miss | Database spiked by N simultaneous queries for one key | Mutex lock, stale-while-revalidate, jittered TTL |
| Cache avalanche | Many keys expire simultaneously (e.g., after a cold restart or naive resharding) | Database flooded by a massive miss wave across all keys | Staggered TTLs, warm-up strategies, gradual traffic shifting |
| Cache penetration | Attacker queries keys that never exist (always a cache miss) | Database queried for every request; the cache offers no protection | Bloom filter to reject invalid keys; null-value caching |
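The stampede mitigations can be sketched as a per-key mutex plus a jittered TTL. The `DictCache` stand-in, thread count, and TTL parameters below are illustrative assumptions:

```python
import random
import threading

class DictCache:
    """Toy in-memory cache standing in for a real cache client (assumption)."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value, ttl=None):
        self._data[key] = value  # TTL ignored in this toy

_locks = {}
_locks_guard = threading.Lock()

def _lock_for(key):
    # One lock per key, created lazily
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())

def jittered_ttl(base=300, jitter=0.2):
    # Spread expirations so keys written together don't all expire together
    return int(base * (1 + random.uniform(-jitter, jitter)))

def get_with_lock(cache, db_fetch, key):
    value = cache.get(key)
    if value is not None:
        return value
    # Only one thread per key recomputes; waiters re-check the cache afterward
    with _lock_for(key):
        value = cache.get(key)
        if value is None:
            value = db_fetch(key)
            cache.set(key, value, ttl=jittered_ttl())
        return value

calls = []
def slow_fetch(key):
    calls.append(key)        # count how many requests reach the "database"
    return f"value-of-{key}"

cache = DictCache()
threads = [threading.Thread(target=get_with_lock, args=(cache, slow_fetch, "hot"))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1: only one thread hit the database
```

Twenty concurrent misses on the same key result in exactly one database query; the other nineteen threads block briefly and then read the freshly cached value.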

Cache Penetration and Bloom Filters

Cache penetration is a denial-of-service pattern: an attacker queries IDs that don't exist (e.g., `user:-1`, `product:99999999`). Every request is a cache miss and hits the database; the cache provides zero protection. The solution: a Bloom filter, a probabilistic data structure that quickly answers 'definitely does not exist' or 'probably exists.' A Bloom filter over all valid user IDs can be loaded into the application and checked before any cache or database call.

```python
# Bloom filter check before cache/DB lookup
# Using pybloom_live or a similar library
from pybloom_live import ScalableBloomFilter

# At startup, populate the filter with every valid ID
user_bloom = ScalableBloomFilter()
for user_id in db.query("SELECT id FROM users"):
    user_bloom.add(user_id)

def get_user(user_id: str) -> User:
    # Reject impossible IDs without touching the cache or DB.
    # Bloom filters never give false negatives, so "not in" is definitive.
    if user_id not in user_bloom:
        raise NotFoundError("User does not exist")

    # Safe to check the cache now
    cached = cache.get(f"user:{user_id}")
    if cached:
        return deserialize(cached)

    # Cache miss: fetch from the DB
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    if user is None:
        # Bloom false positive (rare): the ID passed the filter but isn't real
        raise NotFoundError("User does not exist")
    cache.set(f"user:{user_id}", serialize(user), ttl=300)
    return user
```

Hot Key Problem

Consistent hashing distributes keys evenly on average, but some keys receive disproportionate traffic โ€” a celebrity's profile page, a trending news article, or a flash sale product. A single node handling millions of requests per second for one key becomes a hot spot regardless of how evenly other keys are distributed.

Solutions for hot keys:

  • Local in-process cache: Put the hot key in every application server's in-process cache. Reads never leave the process, so there is zero network overhead.
  • Key replication: Split the hot key into multiple copies (`product:trending#0`, `product:trending#1` ... `#N`) and distribute them across nodes. Reads pick a random copy.
  • Read replicas: Route read traffic for the hot key's primary node to its replicas, distributing the load across multiple physical machines.
  • Redis Cluster read-from-replica: Set `READONLY` on replica connections to allow reads, reducing primary load.
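Key replication from the list above can be sketched as follows. The fan-out constant and the dict-backed cache are illustrative assumptions; in production each suffixed key would hash to a different cluster node:

```python
import random

HOT_KEY_COPIES = 8  # fan-out; tune to the key's traffic (illustrative value)

class DictCache:
    """Toy dict-backed cache standing in for a distributed cache client."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value

def write_hot(cache, key, value):
    # Write every copy; each suffixed key hashes to a (likely) different node
    for i in range(HOT_KEY_COPIES):
        cache.set(f"{key}#{i}", value)

def read_hot(cache, key):
    # Pick a random copy, spreading read load across the copies' nodes
    return cache.get(f"{key}#{random.randrange(HOT_KEY_COPIES)}")

cache = DictCache()
write_hot(cache, "product:trending", {"price": 9.99})
print(read_hot(cache, "product:trending"))  # {'price': 9.99}
```

The trade-off: writes fan out to every copy, so this suits keys that are read millions of times per update, which is exactly the hot-key profile.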

Gradual Cache Warm-Up

After a cold start (cluster restart, major resharding, or scaling out), the cache is empty. Sending all production traffic immediately causes a database avalanche. Use gradual warm-up: start by sending a small percentage of traffic to the new cluster while the old cluster handles the rest. As the hit rate climbs (monitor `cache_hits / (cache_hits + cache_misses)`), shift more traffic. Alternatively, pre-warm by replaying recent database reads against the new cluster before shifting traffic.
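One possible shape for the hit-rate-gated shifting logic; the thresholds and step sizes below are illustrative assumptions, not fixed rules:

```python
def next_traffic_share(hits: int, misses: int, current_share: float) -> float:
    """Shift more traffic to the new cluster only once its hit rate is healthy."""
    total = hits + misses
    if total == 0:
        return current_share
    hit_rate = hits / total
    if hit_rate >= 0.90:
        return min(1.0, current_share + 0.20)   # healthy: shift faster
    if hit_rate >= 0.70:
        return min(1.0, current_share + 0.05)   # warming up: shift slowly
    return current_share                        # still cold: hold and wait

share = 0.05  # start by sending 5% of traffic to the new cluster
share = next_traffic_share(hits=40, misses=60, current_share=share)  # cold: holds
share = next_traffic_share(hits=95, misses=5, current_share=share)   # healthy: grows
print(share)
```

Each evaluation window reads the hit/miss counters, decides whether to advance, and the loop repeats until the new cluster carries 100% of traffic.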

💡

Interview Tip

Distributed caching questions come up in senior system design interviews for products with large datasets. Cover these four areas: (1) Sharding: consistent hashing to minimize remapping; (2) Replication: primaries plus replicas for fault tolerance; (3) Failure modes: cache stampede (one key), cache avalanche (mass expiry), cache penetration (invalid keys); (4) Hot keys: local in-process caching or key replication. Mentioning Bloom filters for cache penetration is a high-signal answer that most candidates don't give.
