This article explores the critical need for explicit cache invalidation over simple Time-To-Live (TTL) for data requiring strong read-after-write consistency in distributed systems. It details the "Cache Aside with explicit invalidation" pattern, including its implementation with message queues, and highlights common pitfalls like race conditions and overly broad invalidations. The discussion emphasizes architectural considerations for ensuring data freshness and consistency in high-scale environments.
Read original on Dev.to #systemdesign
While Time-To-Live (TTL) is a common cache invalidation mechanism, it permits stale reads for up to the full TTL after a write, which is unacceptable for critical data requiring immediate read-after-write consistency. Relying solely on TTL for sensitive data like user profiles or financial transactions can lead to poor user experience and business impact. Such data calls for explicit cache invalidation instead.
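The most robust and widely adopted pattern for achieving strong read-after-write consistency is "Cache Aside with explicit invalidation." This pattern ensures that any write operation to the source of truth (typically a database) is immediately followed by the removal or update of the corresponding entry in the cache. On a read, the application checks the cache first and, on a miss, loads the value from the database and stores it in the cache. On a write, the application persists the change to the database and then deletes (or updates) the cached entry, so the next read fetches fresh data.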
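The read and write paths of this pattern can be sketched as follows. This is a minimal illustration, not a production implementation: `ProfileStore`, its `db` and `cache` dictionaries, and the `db_reads` counter are hypothetical names, and plain in-memory dicts stand in for the database and the cache.

```python
from typing import Optional


class ProfileStore:
    """Cache-aside access to a source of truth, with explicit invalidation on write."""

    def __init__(self) -> None:
        self.db: dict[str, str] = {}     # stand-in for the database
        self.cache: dict[str, str] = {}  # stand-in for the cache (e.g. Redis)
        self.db_reads = 0                # counts trips to the database

    def read(self, key: str) -> Optional[str]:
        # 1. Try the cache first.
        if key in self.cache:
            return self.cache[key]
        # 2. On a miss, load from the source of truth...
        self.db_reads += 1
        value = self.db.get(key)
        # 3. ...and populate the cache for later readers.
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key: str, value: str) -> None:
        # 1. Persist to the source of truth first.
        self.db[key] = value
        # 2. Then drop the cached entry so the next read reloads fresh data.
        self.cache.pop(key, None)


store = ProfileStore()
store.write("user:1", "alice@old.example")
first = store.read("user:1")    # miss: loads from db, fills cache
store.write("user:1", "alice@new.example")
second = store.read("user:1")   # miss again, because the write invalidated the entry
```

Deleting the entry rather than overwriting it keeps the write path simple: the next reader repopulates the cache from the database, so the cache never holds a value the database did not return.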
In a distributed system, simple local invalidation is insufficient. When multiple application instances and cache nodes are present, invalidation must be propagated across all relevant caches. This is typically achieved using a distributed messaging system like Kafka, RabbitMQ, or Redis Pub/Sub. The writing service publishes an invalidation event, and all subscribing cache nodes or services delete the stale data from their local caches, ensuring eventual consistency for invalidation across the system.
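The fan-out described above might look like the sketch below. It assumes a toy, in-process `InvalidationBus` that delivers events synchronously; a real broker such as Kafka, RabbitMQ, or Redis Pub/Sub delivers asynchronously and over the network, so there would be a short window before every instance has processed the event. The class names and the `invalidations` topic are hypothetical.

```python
from collections import defaultdict
from typing import Callable


class InvalidationBus:
    """In-process stand-in for a broker topic; delivers synchronously."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, key: str) -> None:
        # Every subscriber receives every invalidation event.
        for handler in self._subscribers[topic]:
            handler(key)


class AppInstance:
    """One application instance holding its own local cache."""

    def __init__(self, name: str, bus: InvalidationBus) -> None:
        self.name = name
        self.local_cache: dict[str, str] = {}
        bus.subscribe("invalidations", self.on_invalidate)

    def on_invalidate(self, key: str) -> None:
        # Deleting a missing key is a no-op, so repeated events are harmless.
        self.local_cache.pop(key, None)


bus = InvalidationBus()
instances = [AppInstance(f"app-{i}", bus) for i in range(3)]
for inst in instances:
    inst.local_cache["user:1"] = "stale profile"
    inst.local_cache["user:2"] = "other profile"

# The writing service commits to the database (not shown), then publishes.
bus.publish("invalidations", "user:1")
```

Because the handler only deletes, it is idempotent: a broker that redelivers the same event does no harm.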
Uber's Profile Service Invalidation
For critical data like a driver's vehicle registration or a rider's payment method, Uber likely uses explicit invalidation. A `Profile Update Service` persists changes to MySQL, then publishes an event to a Kafka topic (e.g., `profile_updates`). A dedicated `Cache Invalidation Service` (or each individual service that maintains its own cache) subscribes to this topic and issues `DEL` commands to the relevant Redis clusters. With this design, staleness across geographically dispersed data centers is bounded by event propagation delay (tens of milliseconds, by the article's estimate) rather than by a TTL.
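A consumer for such a topic could be sketched along these lines. Everything specific here is an assumption for illustration, not Uber's actual design: the JSON event shape (`user_id`, `changed_fields`), the `profile:<id>:<field>` key scheme, and the function names are all hypothetical. The handler accepts any client exposing `delete(*names)`, which matches the signature of redis-py's `Redis.delete`; a small fake client stands in for Redis so the example runs without a server.

```python
import json
from typing import Protocol


class KeyDeleter(Protocol):
    """Anything with a Redis-style delete(*names) that returns the count removed."""

    def delete(self, *names: str) -> int: ...


def keys_for_event(event: dict) -> list[str]:
    # Hypothetical key scheme: one cache key per changed profile field.
    user_id = event["user_id"]
    return [f"profile:{user_id}:{field}" for field in event["changed_fields"]]


def handle_profile_update(raw: bytes, cache: KeyDeleter) -> int:
    """Decode one event from the topic and delete only the affected keys."""
    event = json.loads(raw)
    keys = keys_for_event(event)
    return cache.delete(*keys) if keys else 0


class FakeCache:
    """In-memory stand-in for a Redis client (assumption for this sketch)."""

    def __init__(self, data: dict[str, str]) -> None:
        self.data = dict(data)

    def delete(self, *names: str) -> int:
        removed = 0
        for name in names:
            if self.data.pop(name, None) is not None:
                removed += 1
        return removed


cache = FakeCache({
    "profile:42:payment_method": "visa",
    "profile:42:name": "Sam",
    "profile:7:name": "Lee",
})
message = json.dumps({"user_id": 42, "changed_fields": ["payment_method"]}).encode()
removed = handle_profile_update(message, cache)
```

Deriving keys from the fields that changed keeps the invalidation narrow: unrelated entries for the same user, and all entries for other users, stay cached.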