Thundering herd problem: our cache expired and 10K requests hit the database simultaneously

Julia Brown
·408 views
We recently had a significant outage caused by what I'd call a 'thundering herd' problem. The cache key for a highly popular item expired, and thousands of concurrent requests all missed the cache at once and hit the underlying database to regenerate the same piece of data. The database was overwhelmed and became unresponsive, which led to cascading failures. We've discussed a few mitigation strategies: using a mutex so that only one request regenerates the cache entry, probabilistic early expiration to spread regeneration out over time, or a 'stale-while-revalidate' approach that serves the old value while a refresh is in progress. For those who've tackled this, which patterns have you found most effective at preventing cache stampedes, especially for critical, high-traffic data? What are the trade-offs between latency and database load?
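To make the three options concrete, here is a rough single-process sketch of how they might fit together. Everything in it is an assumption for illustration rather than what we run: the `StampedeGuardCache` class, the `loader` callback standing in for the database query, and the `beta` tuning knob are all hypothetical names. Across several app servers, the per-key lock would need to live in shared storage instead of process memory.

```python
import math
import random
import threading
import time


class StampedeGuardCache:
    """Hypothetical in-process cache combining a per-key mutex,
    probabilistic early expiration, and serving stale values
    while another caller refreshes."""

    def __init__(self, ttl, beta=1.0):
        self.ttl = ttl          # seconds a value is considered fresh
        self.beta = beta        # >1 refreshes earlier, <1 refreshes later
        self._data = {}         # key -> (value, expires_at, load_seconds)
        self._locks = {}        # key -> threading.Lock
        self._guard = threading.Lock()

    def _lock_for(self, key):
        # One lock per key so unrelated keys never block each other.
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def _is_fresh(self, entry, now):
        if entry is None:
            return False
        _, expires_at, load_seconds = entry
        # Probabilistic early expiration: add a random head start that
        # scales with how long the last load took, so callers drift into
        # refreshing before the hard expiry instead of all at once.
        # 1 - random() lies in (0, 1], so log() never sees zero.
        head_start = load_seconds * self.beta * -math.log(1.0 - random.random())
        return now + head_start < expires_at

    def get(self, key, loader):
        entry = self._data.get(key)
        if self._is_fresh(entry, time.monotonic()):
            return entry[0]

        lock = self._lock_for(key)
        # With an old value in hand, do not wait for the lock: if another
        # caller is already refreshing, serve the stale value. With nothing
        # cached, block until the first caller has loaded it.
        # Note: this sketch puts no upper bound on how stale a value may be.
        if not lock.acquire(blocking=entry is None):
            return entry[0]
        try:
            latest = self._data.get(key)
            if latest is not None and latest is not entry:
                return latest[0]  # someone refreshed while we waited
            started = time.monotonic()
            value = loader()      # the only call that reaches the database
            finished = time.monotonic()
            self._data[key] = (value, finished + self.ttl, finished - started)
            return value
        finally:
            lock.release()
```

A call would look like `cache.get("item:1", load_item_from_db)`, where `load_item_from_db` is a placeholder for the real query. On the trade-off side, the mutex protects the database at the cost of extra latency for callers that wait on a cold miss, serving stale values removes that wait but returns older data, and early expiration adds a small number of extra loads in exchange for avoiding a synchronized expiry.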