Medium #system-design·June 6, 2026

Mitigating Cache Stampedes in Distributed Systems

This article explores the common pitfall of cache stampedes when scaling systems, moving beyond a simple cache-aside strategy. It highlights how concurrent requests targeting a cache miss can overwhelm the backend database, leading to performance degradation and potential system failures. The discussion focuses on effective mitigation strategies to ensure system stability under high load.

Distributed Systems Performance & Scaling Databases & Storage

Read original on Medium #system-design

The Cache Miss → DB → Cache Problem

The basic cache-aside pattern involves checking the cache first, and if a miss occurs, fetching data from the database and then populating the cache. While effective for individual requests, this strategy falls short in distributed, concurrent environments. When multiple clients request the same uncached data simultaneously, they all attempt to fetch it from the backend database, creating a "thundering herd" or cache stampede.

Understanding Cache Stampedes

⚠️

Impact of Cache Stampedes

Cache stampedes can lead to significant performance issues, including database overload, increased latency, resource exhaustion, and potential downtime. This problem is exacerbated by high traffic, cache evictions, or cold starts.

Strategies for Prevention and Mitigation

To prevent cache stampedes, several techniques can be employed to serialize or reduce concurrent database access for the same key. These methods aim to ensure only one request, or a limited number, reaches the backend when a cache miss occurs.

Mutex Locks (or Distributed Locks): When a cache miss occurs, the first request acquires a lock for that key. Subsequent requests for the same key block until the lock is released and the cache is populated. This effectively serializes access to the backend data source.
Thundering Herd Protection (e.g., using Redis SETNX): A similar approach using a non-blocking mechanism. A worker attempts to set a flag in a distributed store (like Redis SETNX) for the key. If successful, it fetches and caches the data. If not, it waits a short period and retries, hoping the cache is now populated.
Probabilistic Early Expiration / Stale-While-Revalidate: Instead of invalidating cache entries immediately, serve slightly stale data while an asynchronous background process fetches fresh data. This minimizes direct impact on user-facing requests.
Cache Pre-warming: Proactively load critical data into the cache before it's requested, especially after deployments or planned restarts, to avoid cold cache scenarios.

python

import threading
import time

cache = {}
cache_locks = {}
db_data = {"item1": "data1", "item2": "data2"}

def get_data_from_db(key):
    time.sleep(0.1) # Simulate DB latency
    return db_data.get(key, None)

def get_data(key):
    if key in cache:
        return cache[key]

    # Acquire a lock for this key
    if key not in cache_locks:
        cache_locks[key] = threading.Lock()

    with cache_locks[key]:
        # Double-check if data was cached while waiting for lock
        if key in cache:
            return cache[key]

        print(f"Cache miss for {key}. Fetching from DB...")
        data = get_data_from_db(key)
        if data:
            cache[key] = data
        return data

# Example usage in a concurrent scenario
# threads = []
# for _ in range(5):
#     t = threading.Thread(target=get_data, args=("item1",))
#     threads.append(t)
#     t.start()
# for t in threads:
#     t.join()

cache stampedecachingdistributed cachingconcurrencyperformance optimizationdatabase scalabilitysystem design patternsmutex

Comments

Loading comments...

Architecture Design

Design this yourself

Design a highly available and scalable API service for retrieving product information, ensuring it can handle sudden spikes in traffic and prevent cache stampedes, particularly when a popular product's cache entry expires. Detail the caching strategy, including techniques like distributed locking and stale-while-revalidate, to protect the backend database from thundering herd problems.

Practice Interview

Focus: cache stampede prevention using distributed locks and stale-while-revalidate

Other design angles

· Design a user profile service that leverages caching to serve frequently accessed data, focusing on how to prevent cache stampedes for popular profiles after cache evictions.· Design a real-time analytics dashboard backend that aggregates data from various sources, and explain how to mitigate cache stampedes when generating reports on demand.· Design a news feed service where articles are heavily cached, and describe the mechanisms to handle rapid invalidation and subsequent concurrent requests for newly published or updated articles without overwhelming the underlying data stores.