Grab engineers optimized their Android app's image caching by transitioning from a traditional Least Recently Used (LRU) cache to a Time-Aware Least Recently Used (TLRU) cache. This allowed them to reclaim significant storage while maintaining user experience and controlling server costs, demonstrating a practical approach to cache management in mobile applications.
Read original on InfoQ Architecture
Traditional LRU caches, while effective for many use cases, present specific challenges in mobile applications with large media assets. Grab's Android app used Glide with a 100 MB LRU cache for images. This led to two primary issues. For active users, the cache frequently filled up, causing performance degradation. For less active users, images could persist for months, because a size-bounded LRU cache evicts entries only when it is full, so storage was wasted on images that were no longer relevant.
To address the limitations of LRU, Grab implemented a Time-Aware Least Recently Used (TLRU) cache. TLRU extends the LRU eviction policy with a time-based expiration mechanism, so cached items are evicted based not only on recency of access but also on age. This hybrid approach helps ensure that stale data doesn't consume valuable storage indefinitely.
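The hybrid policy can be illustrated with a minimal in-memory sketch. It is not Grab's or Glide's code: the class name `TlruCache`, its constructor parameters, and the injectable clock are hypothetical, and it bounds the cache by entry count rather than bytes to keep the example short. It assumes age is measured from the last access and that the clock never goes backwards.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Illustrative TLRU: LRU eviction by entry count plus expiry by time since last access. */
class TlruCache<K, V> {
    /** A cached value together with the time it was last read or written. */
    private static final class Slot<T> {
        final T value;
        long lastAccessMillis;

        Slot(T value, long nowMillis) {
            this.value = value;
            this.lastAccessMillis = nowMillis;
        }
    }

    private final int maxEntries;
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be exercised in tests
    // accessOrder = true: iteration runs from least to most recently accessed.
    private final LinkedHashMap<K, Slot<V>> map = new LinkedHashMap<>(16, 0.75f, true);

    TlruCache(int maxEntries, long ttlMillis, LongSupplier clock) {
        this.maxEntries = maxEntries;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    synchronized V get(K key) {
        evictExpired();
        Slot<V> slot = map.get(key); // also moves the entry to the most-recent end
        if (slot == null) {
            return null;
        }
        slot.lastAccessMillis = clock.getAsLong();
        return slot.value;
    }

    synchronized void put(K key, V value) {
        evictExpired();
        map.put(key, new Slot<>(value, clock.getAsLong()));
        trimToSize();
    }

    synchronized int size() {
        evictExpired();
        return map.size();
    }

    /** Time-based eviction: the oldest accesses sit at the head, so stop at the first fresh entry. */
    private void evictExpired() {
        long now = clock.getAsLong();
        Iterator<Map.Entry<K, Slot<V>>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue().lastAccessMillis < ttlMillis) {
                break;
            }
            it.remove();
        }
    }

    /** Classic LRU eviction: drop least recently accessed entries while over capacity. */
    private void trimToSize() {
        Iterator<Map.Entry<K, Slot<V>>> it = map.entrySet().iterator();
        while (map.size() > maxEntries && it.hasNext()) {
            it.next();
            it.remove();
        }
    }
}
```

Because the map is kept in access order, both eviction passes only look at the head of the iteration order. A disk cache would track total bytes instead of entry count and would persist the timestamps, which this sketch leaves out.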
Instead of building TLRU from scratch, Grab engineers opted to fork and extend Glide's existing `DiskLruCache` implementation. This decision leveraged a "mature, battle-tested foundation" that already handles complex aspects like crash recovery, thread safety, and performance optimizations. The extensions primarily involved adding last-access time tracking, implementing time-based eviction logic, and devising a migration mechanism for existing LRU caches.
Migration Strategy for Existing Caches
A notable challenge was assigning last-access timestamps to existing LRU entries. Since reliable filesystem timestamps were unavailable, Grab assigned a consistent migration timestamp to all existing entries. This preserves content during migration, though the full benefits of time-based eviction only manifest after one TTL period has passed for these older entries. They also ensured bidirectional compatibility, allowing rollbacks if necessary.
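The migration idea can be sketched as follows. The record layout used here (`key sizeBytes [lastAccessMillis]`) is a made-up format for illustration only, not Glide's journal format or Grab's actual one, and `CacheRecord` and `TlruMigration` are hypothetical names. The sketch assumes legacy entries are simply those that lack a timestamp field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cache index record; the layout is an assumption, not Glide's journal format. */
final class CacheRecord {
    final String key;
    final long sizeBytes;
    final long lastAccessMillis;

    CacheRecord(String key, long sizeBytes, long lastAccessMillis) {
        this.key = key;
        this.sizeBytes = sizeBytes;
        this.lastAccessMillis = lastAccessMillis;
    }
}

class TlruMigration {
    /**
     * Parses "key sizeBytes [lastAccessMillis]". Legacy lines have no timestamp,
     * so they receive the shared migration timestamp instead.
     */
    static CacheRecord parse(String line, long migrationMillis) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Malformed record: " + line);
        }
        long sizeBytes = Long.parseLong(parts[1]);
        long lastAccess = parts.length == 3 ? Long.parseLong(parts[2]) : migrationMillis;
        return new CacheRecord(parts[0], sizeBytes, lastAccess);
    }

    /** Converts every line, stamping all legacy entries with the same migration time. */
    static List<CacheRecord> migrate(List<String> lines, long migrationMillis) {
        List<CacheRecord> records = new ArrayList<>();
        for (String line : lines) {
            records.add(parse(line, migrationMillis));
        }
        return records;
    }

    /** An entry expires once a full TTL has elapsed since its last recorded access. */
    static boolean isExpired(CacheRecord record, long nowMillis, long ttlMillis) {
        return nowMillis - record.lastAccessMillis >= ttlMillis;
    }
}
```

Under these assumptions no migrated entry can expire until one full TTL after the migration timestamp, which matches the delay in benefits described above. Entries that are read again after migration pick up real access times and age normally from then on.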