Dev.to #systemdesign·March 25, 2026

Designing a Two-Level Caching Strategy with In-Process and Distributed Caches

This article details the implementation of a two-level caching architecture to achieve sub-millisecond lookup times for high-traffic systems. It explains the integration of an in-process cache (Caffeine) for L1 and a distributed cache (Redis) for L2, with a persistent data store (Elasticsearch) as the source of truth, addressing the limitations of single-layer caching.


The Challenge: Latency at Scale

High-traffic systems frequently encounter performance bottlenecks when relying solely on a persistent data store. Even with optimized databases like Elasticsearch, tail latencies can escalate under heavy load, severely impacting user experience. The article highlights how a product lookup exceeding 80ms prompted the need for a more aggressive caching strategy to keep response times under control.

Limitations of Single-Layer Caching

Relying on a single caching layer, whether in-process or distributed, presents distinct trade-offs:

  • Distributed Caches (e.g., Redis): Offer shared state across instances and persistence across application restarts. However, they incur network latency (1-5ms per hop), which can accumulate in high-volume scenarios.
  • In-Process Caches (e.g., Caffeine): Provide ultra-low latency (sub-millisecond) reads as data resides in the application's heap. Their drawbacks include lack of shared state between service instances, data loss on restarts, and potential heap memory issues if not carefully managed.

Why Two Levels?

A multi-level caching strategy combines the strengths of different cache types. The goal is to serve the hottest data from the fastest, closest cache, falling back to progressively slower but more resilient layers.

The Two-Level Cache Architecture

The proposed architecture implements a cache-aside pattern across three layers: an L1 in-process cache, an L2 distributed cache, and a persistent data store (Elasticsearch). The lookup flow prioritizes speed and efficiency:

  1. L1 (Caffeine - In-Process Cache): First check. If found, return immediately (~0.1ms). No network overhead or serialization.
  2. L2 (Redis - Distributed Cache): On an L1 miss, check Redis. If found, return the value and asynchronously backfill L1 to prime it for subsequent requests.
  3. L3 (Elasticsearch - Source of Truth): On an L2 miss, query Elasticsearch. The result is then written back to both L2 (Redis) and L1 (Caffeine) before being returned to the caller.
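
The three-step flow above can be sketched in plain Java. This is an illustrative sketch, not the article's implementation: `ConcurrentHashMap` instances stand in for Caffeine (L1) and Redis (L2), and a loader function stands in for the Elasticsearch query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside lookup across two cache levels and a source of truth.
// Maps stand in for Caffeine/Redis; `sourceOfTruth` for Elasticsearch.
public class TwoLevelCache<K, V> {
    private final Map<K, V> l1 = new ConcurrentHashMap<>();
    private final Map<K, V> l2 = new ConcurrentHashMap<>();
    private final Function<K, V> sourceOfTruth;

    public TwoLevelCache(Function<K, V> sourceOfTruth) {
        this.sourceOfTruth = sourceOfTruth;
    }

    public V get(K key) {
        V value = l1.get(key);            // 1. L1: fastest path, no network hop
        if (value != null) return value;

        value = l2.get(key);              // 2. L2: shared across instances
        if (value != null) {
            l1.put(key, value);           //    backfill L1 for later requests
            return value;
        }

        value = sourceOfTruth.apply(key); // 3. L3: query the source of truth
        if (value != null) {
            l2.put(key, value);           //    write back to both cache layers
            l1.put(key, value);
        }
        return value;
    }
}
```

In the real architecture the L1 backfill happens asynchronously and each layer carries its own TTL; the sketch only shows the lookup order and write-back path.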

Caffeine for L1: Optimizing Hot Data

Caffeine, a high-performance in-process cache for the JVM, is chosen for L1 due to its speed and efficient eviction policies (W-TinyLFU). Key features utilized include:

  • Time-based expiry: Configured with short TTLs (e.g., 30-60 seconds) to ensure freshness.
  • Size-based eviction: Bounded by entry count or byte weight to prevent excessive heap consumption.
  • Asynchronous loading: Prevents multiple threads from blocking on a cold cache key.
  • Monitoring: Integration with Micrometer allows easy exposure of hit/miss rates to monitoring systems like Prometheus.
```java
// CaffeineConfig.java
@Bean
public Cache<String, Product> caffeineProductCache() {
    return Caffeine.newBuilder()
        .expireAfterWrite(Duration.ofSeconds(ttlSeconds)) // default 30 s
        .maximumSize(maxSize)                             // default 5 000
        .recordStats() // exposes hit rate to Micrometer / Prometheus
        .build();
}
```
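
The asynchronous-loading behavior mentioned above can be sketched with Caffeine's `AsyncLoadingCache`, which lets concurrent callers share one in-flight load for a cold key instead of each blocking on its own lookup. The `loadProduct` method here is a hypothetical placeholder for the actual backing lookup:

```java
// Illustrative sketch: async variant of the L1 cache. `loadProduct` is a
// placeholder for the real lookup (e.g. the L2/L3 fallback path).
@Bean
public AsyncLoadingCache<String, Product> asyncProductCache() {
    return Caffeine.newBuilder()
        .expireAfterWrite(Duration.ofSeconds(30))
        .maximumSize(5_000)
        // buildAsync deduplicates concurrent loads of the same missing key
        .buildAsync(key -> loadProduct(key));
}
```

Callers then retrieve entries via `asyncProductCache.get(key)`, which returns a `CompletableFuture<Product>` rather than blocking the request thread.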

Redis for L2: Bridging Persistence and Distribution

Redis acts as the intermediary, providing a shared cache across service instances and surviving application restarts. It absorbs load spikes when new application instances come online with cold Caffeine caches. Important considerations for Redis:

  • Longer TTLs: Configured for minutes (e.g., 5-15 minutes) depending on data volatility.
  • Efficient Serialization: Using compact binary formats like MessagePack instead of JSON reduces memory footprint and deserialization overhead.
  • Managed Service: Leveraging cloud-managed Redis services (e.g., GCP Memorystore) simplifies operations and provides high availability.
```java
// RedisConfig.java — MessagePack gives ~32% smaller payloads vs JSON
@Bean
public ObjectMapper msgpackObjectMapper() {
    return new ObjectMapper(new MessagePackFactory())
        .registerModule(new JavaTimeModule())
        .disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS);
}

@Bean
public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory connectionFactory,
                                                   ObjectMapper msgpackObjectMapper) {
    var serializer = new Jackson2JsonRedisSerializer<>(msgpackObjectMapper, Object.class);
    var template = new RedisTemplate<String, Object>();
    template.setConnectionFactory(connectionFactory);
    template.setKeySerializer(new StringRedisSerializer());
    template.setValueSerializer(serializer);
    template.afterPropertiesSet();
    return template;
}
```
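
With that template in place, writes to L2 typically attach the longer TTL per entry at write time. A minimal sketch, assuming a Spring Data Redis `RedisTemplate`; the `product:` key prefix and 10-minute TTL are illustrative choices, not the article's values:

```java
// Write-through to L2 with a per-entry TTL. Key prefix and TTL are
// illustrative; in practice the TTL would come from configuration.
public void cacheInRedis(RedisTemplate<String, Object> redisTemplate,
                         String productId, Product product) {
    redisTemplate.opsForValue()
        .set("product:" + productId, product, Duration.ofMinutes(10));
}
```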
Tags: caching · two-level cache · Caffeine · Redis · Elasticsearch · latency reduction · system design · performance optimization
