Application-Level Caching
In-process caching, request-scoped caching, memoization, and computed caching. When local caches beat distributed ones.
The Cache Hierarchy
In a well-designed system there is rarely a single cache layer — there is a hierarchy of caches, each trading off size, latency, and freshness. From fastest to slowest: CPU L1/L2/L3 caches, in-process application memory, distributed cache (Redis/Memcached), CDN edge cache, and finally the origin database or object store. Application-level caching refers to the in-process and request-scoped layers — data stored in the application server's own memory heap.
In-Process (Local) Caching
An in-process cache stores data directly in the application server's heap — no network round-trip. Access times are nanoseconds to microseconds rather than the ~1 ms network hop to Redis. This is orders of magnitude faster.
Common implementations: Java's Caffeine (an async cache using W-TinyLFU eviction with near-optimal hit rates), Python's `functools.lru_cache`, Node.js in-memory maps, or framework-specific caches like Rails's `memory_store`. Caffeine is used internally by Netflix and Google, and is Spring Boot's default caching provider when present on the classpath.
Consistency challenge with in-process caches
If you run 10 application servers, each has its own local cache. After updating a record, you must invalidate the local cache on every server or accept that some servers serve stale data until TTL expiry. This is often fine for short TTLs (seconds), but for longer-lived data, use a distributed cache or a pub/sub invalidation signal (e.g., broadcast via Redis pub/sub).
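The pub/sub invalidation idea can be sketched with a minimal local cache whose `invalidate` method plays the role of the subscriber callback. This is illustrative only — the class, its TTL handling, and the key names are assumptions, not a specific library's API; in production the `invalidate` call would be driven by a Redis pub/sub subscriber thread.

```python
import threading
import time

class InvalidatingLocalCache:
    """Per-server in-process cache whose entries can be evicted by a
    broadcast signal (e.g. a message on a Redis pub/sub channel)."""

    def __init__(self, ttl_seconds: float = 30.0):
        self._ttl = ttl_seconds
        self._store: dict = {}           # key -> (value, expires_at)
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            entry = self._store.get(key)
            if entry is None:
                return None
            value, expires_at = entry
            if time.monotonic() > expires_at:
                del self._store[key]     # lazy TTL expiry on read
                return None
            return value

    def set(self, key, value):
        with self._lock:
            self._store[key] = (value, time.monotonic() + self._ttl)

    def invalidate(self, key):
        # In production this runs in the pub/sub subscriber thread: after a
        # write, one server publishes the key and every subscriber evicts it.
        with self._lock:
            self._store.pop(key, None)

cache = InvalidatingLocalCache(ttl_seconds=30)
cache.set("user:42", {"name": "Ada"})
cache.invalidate("user:42")              # simulate receiving the broadcast
assert cache.get("user:42") is None      # next read falls through to the DB
```

The short TTL is the safety net: even if an invalidation message is lost, no server serves stale data for longer than the TTL window.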
Memoization
Memoization is function-level caching: the first call to a pure function with given arguments computes and stores the result; subsequent calls with the same arguments return the stored result. It is the simplest form of in-process caching.
```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def get_tax_rate(country_code: str, product_category: str) -> float:
    """Expensive lookup from tax rules database. Memoized in-process."""
    return db.query(
        "SELECT rate FROM tax_rules WHERE country=? AND category=?",
        country_code, product_category,
    )

# First call: hits DB
rate = get_tax_rate("US", "electronics")  # ~20ms
# Second call: returns from cache
rate = get_tax_rate("US", "electronics")  # ~0.001ms
```

Python's `lru_cache` is thread-safe and implements LRU eviction with the `maxsize` parameter. In production, prefer a library with TTL support (e.g., `cachetools.TTLCache`) so entries expire and stale data doesn't persist until process restart.
Request-Scoped Caching
Request-scoped caching (also called per-request caching or DataLoader batching) stores data for the lifetime of a single request. It is particularly useful in GraphQL servers and complex service call graphs where the same entity might be fetched multiple times within one request by different resolvers.
```typescript
// DataLoader (Facebook's open-source library)
// Batches and deduplicates DB calls within a single request context
import DataLoader from 'dataloader';

const userLoader = new DataLoader(async (ids: readonly string[]) => {
  // Called once per event-loop tick, even if 100 resolvers request users
  const users = await db.query(
    'SELECT * FROM users WHERE id = ANY(?)', [ids]
  );
  // Must return results in the same order as ids
  return ids.map(id => users.find(u => u.id === id) ?? null);
});

// Usage in GraphQL resolvers — 100 calls become 1 DB query
const user = await userLoader.load(userId);
```

Without request-scoped caching, a GraphQL query for 100 posts with their authors might fire 100 individual `SELECT * FROM users WHERE id = ?` queries (the N+1 problem). DataLoader batches these into a single `SELECT * FROM users WHERE id IN (...)` and caches results within the request — transparent to the resolver code.
Computed / Derived Value Caching
Some values are expensive to compute but cheap to store: recommendation scores, aggregated statistics, rendered HTML fragments, or serialized API responses. Computed caching pre-computes these values and stores the result, avoiding recomputation on every request.
- Hot path optimization: compute the homepage feed once and cache for 30 seconds rather than assembling it from 50 DB queries on every request
- Background refresh: a background job recomputes expensive aggregates every minute and writes to cache; requests always read a pre-computed result
- Fragment caching: cache rendered HTML/JSON fragments (e.g., a product card), reducing template rendering overhead
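The background-refresh pattern above can be sketched as follows. All names and the stats payload are hypothetical; the point is the shape: a daemon thread owns recomputation, and the request path is only ever a cheap read of a pre-computed value.

```python
import threading
import time

cache: dict = {}

def recompute_homepage_stats() -> None:
    """The expensive part: stands in for ~50 DB queries / heavy aggregation."""
    cache["homepage_stats"] = {"orders_today": 1234, "computed_at": time.time()}

def start_refresher(interval_s: float = 60.0) -> threading.Event:
    """Recompute in the background every `interval_s` seconds."""
    stop = threading.Event()

    def loop():
        while not stop.is_set():
            recompute_homepage_stats()
            stop.wait(interval_s)          # sleep, but wake early on stop

    threading.Thread(target=loop, daemon=True).start()
    return stop

def get_homepage_stats() -> dict:
    # Request path: a dict read, never a recomputation
    return cache["homepage_stats"]

recompute_homepage_stats()                 # warm the cache once at startup
stop = start_refresher(60.0)
```

Because the refresher always overwrites a complete value, readers never observe a half-built aggregate — a useful property of compute-then-swap caching.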
When Local Beats Distributed
| Dimension | In-Process Cache | Distributed Cache (Redis) |
|---|---|---|
| Latency | Nanoseconds (no network) | ~0.3–2 ms (network RTT) |
| Consistency | Per-server — inconsistent across replicas | Consistent across all app servers |
| Capacity | Limited to server heap (GBs) | Dedicated nodes (hundreds of GBs) |
| Fault tolerance | Lost on restart; no replication | Persistent (AOF/RDB); replicated |
| Best For | Config, lookup tables, per-request dedup | Session data, shared state, distributed locks |
Two-tier caching
Combine both layers. Check the in-process cache first (nanoseconds); on miss, check Redis (~1ms); on miss, query the database. Populate both caches on the database read. This two-tier approach is used by Facebook's Memcache architecture (regional cluster + local in-process cache) and is a common pattern in high-performance Java services using Caffeine + Redis.
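The two-tier read path can be sketched with plain dicts standing in for each tier — `local_cache` for the in-process layer and `redis_tier` for the shared Redis cluster (both names, and `db_fetch`, are illustrative stand-ins, not real client APIs):

```python
local_cache: dict = {}   # tier 1: in-process, per-server (nanoseconds)
redis_tier: dict = {}    # tier 2: stand-in for shared Redis (~1 ms in production)

def db_fetch(key: str) -> str:
    # stand-in for the origin database query (slowest path)
    return f"value-for-{key}"

def get(key: str) -> str:
    if key in local_cache:                   # tier 1 hit: fastest
        return local_cache[key]
    if key in redis_tier:                    # tier 2 hit: shared across servers
        local_cache[key] = redis_tier[key]   # backfill the faster tier
        return local_cache[key]
    value = db_fetch(key)                    # full miss: go to the origin
    redis_tier[key] = value                  # populate both tiers on the way out
    local_cache[key] = value
    return value
```

Note the backfill on a tier-2 hit: after one server pays the Redis round-trip, subsequent reads on that server are local until its TTL or an invalidation evicts the entry.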
Interview Tip
Application-level caching is often overlooked by candidates who jump straight to 'add Redis.' Show depth by mentioning the full hierarchy: 'Before adding Redis, I'd check whether in-process caching with a short TTL handles the load — it avoids the network hop entirely and reduces Redis load. For the N+1 GraphQL problem I'd use DataLoader. Then Redis for shared session state and distributed locking.' This layered thinking signals engineering maturity.