Netflix implemented an interval-aware caching strategy for Apache Druid to optimize real-time analytics queries. This approach significantly boosts cache hit rates and reduces query load by decomposing query results into time-aligned segments, allowing reuse across overlapping rolling window queries. The architectural solution involves an external proxy layer that intercepts, processes, and caches historical data while recomputing only the most recent intervals.
Traditional caching mechanisms struggle with the rolling window queries common in real-time analytics dashboards. Even when time boundaries shift only slightly (e.g., "errors in the last 3 hours" refreshed every minute), conventional caches treat each query as a distinct request. This leads to redundant computation, low cache reuse, and repeated scans over large datasets, which is especially problematic at Netflix's scale with trillions of rows in Apache Druid.
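To see why alignment helps, consider a minimal sketch (the 5-minute granularity and helper names here are illustrative assumptions, not Netflix's actual parameters): two rolling-window queries issued a minute apart decompose into nearly identical sets of fixed interval boundaries, so keying the cache on those boundaries instead of raw query ranges makes the overlap visible.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=5)  # hypothetical alignment granularity

def align_down(ts: datetime, interval: timedelta) -> datetime:
    """Floor a timestamp to the nearest interval boundary."""
    epoch = datetime(1970, 1, 1)
    return epoch + ((ts - epoch) // interval) * interval

def window_intervals(start: datetime, end: datetime, interval: timedelta):
    """Decompose a query window into fixed, time-aligned intervals."""
    cursor = align_down(start, interval)
    while cursor < end:
        yield cursor
        cursor += interval

# Two "last 3 hours" queries issued one minute apart:
now = datetime(2024, 1, 1, 12, 0)
a = set(window_intervals(now - timedelta(hours=3), now, INTERVAL))
b = set(window_intervals(now - timedelta(hours=3) + timedelta(minutes=1),
                         now + timedelta(minutes=1), INTERVAL))
print(f"{len(a & b)} of {len(a | b)} intervals shared")  # → 36 of 37
```

Under a naive cache keyed on the full query range, these two requests would share nothing; keyed on aligned intervals, all but the newest interval is reusable.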
Netflix's solution decomposes query results into smaller, time-aligned segments. Instead of caching the full query output, intermediate aggregates for fixed time intervals are stored. When a new query arrives, relevant historical segments are retrieved from the cache and merged with newly computed data for only the most recent interval. This strategy maximizes reuse and minimizes computation.
Impact and Benefits
This caching layer led to a 33% drop in queries to Druid and a 66% improvement in P90 query times. In some workloads, it achieved up to a 14x reduction in result bytes and substantial reductions in segment scans by Druid. This significantly reduces the load on the underlying analytics database and improves dashboard responsiveness.
Netflix is exploring tighter integration of this interval-aware caching directly into Apache Druid to eliminate the need for an external proxy layer and further improve query planning efficiency. This would streamline the architecture and potentially unlock even greater performance gains by making the cache a first-class citizen within the query engine.