Menu
Medium #system-design·June 6, 2026

Mitigating Cache Stampedes in Distributed Systems

This article explores the common pitfall of cache stampedes when scaling systems, moving beyond a simple cache-aside strategy. It highlights how concurrent requests targeting a cache miss can overwhelm the backend database, leading to performance degradation and potential system failures. The discussion focuses on effective mitigation strategies to ensure system stability under high load.

Read original on Medium #system-design

The Cache Miss → DB → Cache Problem

The basic cache-aside pattern involves checking the cache first, and if a miss occurs, fetching data from the database and then populating the cache. While effective for individual requests, this strategy falls short in distributed, concurrent environments. When multiple clients request the same uncached data simultaneously, they all attempt to fetch it from the backend database, creating a "thundering herd" or cache stampede.

Understanding Cache Stampedes

⚠️

Impact of Cache Stampedes

Cache stampedes can lead to significant performance issues, including database overload, increased latency, resource exhaustion, and potential downtime. This problem is exacerbated by high traffic, cache evictions, or cold starts.

Strategies for Prevention and Mitigation

To prevent cache stampedes, several techniques can be employed to serialize or reduce concurrent database access for the same key. These methods aim to ensure only one request, or a limited number, reaches the backend when a cache miss occurs.

  • Mutex Locks (or Distributed Locks): When a cache miss occurs, the first request acquires a lock for that key. Subsequent requests for the same key block until the lock is released and the cache is populated. This effectively serializes access to the backend data source.
  • Thundering Herd Protection (e.g., using Redis SETNX): A similar approach using a non-blocking mechanism. A worker attempts to set a flag in a distributed store (like Redis SETNX) for the key. If successful, it fetches and caches the data. If not, it waits a short period and retries, hoping the cache is now populated.
  • Probabilistic Early Expiration / Stale-While-Revalidate: Instead of invalidating cache entries immediately, serve slightly stale data while an asynchronous background process fetches fresh data. This minimizes direct impact on user-facing requests.
  • Cache Pre-warming: Proactively load critical data into the cache before it's requested, especially after deployments or planned restarts, to avoid cold cache scenarios.
python
import threading
import time

cache = {}
cache_locks = {}
db_data = {"item1": "data1", "item2": "data2"}

def get_data_from_db(key):
    time.sleep(0.1) # Simulate DB latency
    return db_data.get(key, None)

def get_data(key):
    if key in cache:
        return cache[key]

    # Acquire a lock for this key
    if key not in cache_locks:
        cache_locks[key] = threading.Lock()

    with cache_locks[key]:
        # Double-check if data was cached while waiting for lock
        if key in cache:
            return cache[key]

        print(f"Cache miss for {key}. Fetching from DB...")
        data = get_data_from_db(key)
        if data:
            cache[key] = data
        return data

# Example usage in a concurrent scenario
# threads = []
# for _ in range(5):
#     t = threading.Thread(target=get_data, args=("item1",))
#     threads.append(t)
#     t.start()
# for t in threads:
#     t.join()
cache stampedecachingdistributed cachingconcurrencyperformance optimizationdatabase scalabilitysystem design patternsmutex

Comments

Loading comments...