Application-Level Caching
In-process caching, request-scoped caching, memoization, and computed caching. When local caches beat distributed ones.
The Cache Hierarchy
In a well-designed system there is rarely a single cache layer — there is a hierarchy of caches, each trading off size, latency, and freshness. From fastest to slowest: CPU L1/L2/L3 caches, in-process application memory, distributed cache (Redis/Memcached), CDN edge cache, and finally the origin database or object store. Application-level caching refers to the in-process and request-scoped layers — data stored in the application server's own memory heap.
In-Process (Local) Caching
An in-process cache stores data directly in the application server's heap — no network round-trip. Access times are nanoseconds to microseconds rather than the ~1 ms network hop to Redis. This is orders of magnitude faster.
Common implementations: Java's Caffeine (an async cache using W-TinyLFU eviction with near-optimal hit rates), Python's `functools.lru_cache`, Node.js in-memory maps, or framework-specific caches like Rails's `memory_store`. Caffeine is used internally by Netflix and Google, and is Spring Boot's default caching provider when present on the classpath.
Consistency challenge with in-process caches
If you run 10 application servers, each has its own local cache. After updating a record, you must invalidate the local cache on every server or accept that some servers serve stale data until TTL expiry. This is often fine for short TTLs (seconds), but for longer-lived data, use a distributed cache or a pub/sub invalidation signal (e.g., broadcast via Redis pub/sub).
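The pub/sub invalidation idea can be sketched with a minimal local cache whose `invalidate` method plays the role of the subscriber callback. This is illustrative only — the class, its TTL handling, and the key names are assumptions, not a specific library's API; in production the `invalidate` call would be driven by a Redis pub/sub subscriber thread.

```python
import threading
import time

class InvalidatingLocalCache:
    """Per-server in-process cache whose entries can be evicted by a
    broadcast signal (e.g. a message on a Redis pub/sub channel)."""

    def __init__(self, ttl_seconds: float = 30.0):
        self._ttl = ttl_seconds
        self._store: dict = {}           # key -> (value, expires_at)
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            entry = self._store.get(key)
            if entry is None:
                return None
            value, expires_at = entry
            if time.monotonic() > expires_at:
                del self._store[key]     # lazy TTL expiry on read
                return None
            return value

    def set(self, key, value):
        with self._lock:
            self._store[key] = (value, time.monotonic() + self._ttl)

    def invalidate(self, key):
        # In production this runs in the pub/sub subscriber thread: after a
        # write, one server publishes the key and every subscriber evicts it.
        with self._lock:
            self._store.pop(key, None)

cache = InvalidatingLocalCache(ttl_seconds=30)
cache.set("user:42", {"name": "Ada"})
cache.invalidate("user:42")              # simulate receiving the broadcast
assert cache.get("user:42") is None      # next read falls through to the DB
```

The short TTL is the safety net: even if an invalidation message is lost, no server serves stale data for longer than the TTL window.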
Memoization
Memoization is function-level caching: the first call to a pure function with given arguments computes and stores the result; subsequent calls with the same arguments return the stored result. It is the simplest form of in-process caching.
```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def get_tax_rate(country_code: str, product_category: str) -> float:
    """Expensive lookup from tax rules database. Memoized in-process."""
    return db.query(
        "SELECT rate FROM tax_rules WHERE country=? AND category=?",
        country_code, product_category,
    )

# First call: hits DB
rate = get_tax_rate("US", "electronics")  # ~20ms
# Second call: returns from cache
rate = get_tax_rate("US", "electronics")  # ~0.001ms
```

Python's `lru_cache` is thread-safe and implements LRU eviction with the `maxsize` parameter. In production, prefer a library with TTL support (e.g., `cachetools.TTLCache`) so entries expire and stale data doesn't persist until process restart.
Request-Scoped Caching
Request-scoped caching (also called per-request caching or DataLoader batching) stores data for the lifetime of a single request. It is particularly useful in GraphQL servers and complex service call graphs where the same entity might be fetched multiple times within one request by different resolvers.
```typescript
// DataLoader (Facebook's open-source library)
// Batches and deduplicates DB calls within a single request context
import DataLoader from 'dataloader';

const userLoader = new DataLoader(async (ids: readonly string[]) => {
  // Called once per event-loop tick, even if 100 resolvers request users
  const users = await db.query(
    'SELECT * FROM users WHERE id = ANY(?)', [ids]
  );
  // Must return results in the same order as ids
  return ids.map(id => users.find(u => u.id === id) ?? null);
});

// Usage in GraphQL resolvers — 100 calls become 1 DB query
const user = await userLoader.load(userId);
```

Without request-scoped caching, a GraphQL query for 100 posts with their authors might fire 100 individual `SELECT * FROM users WHERE id = ?` queries (the N+1 problem). DataLoader batches these into a single `SELECT * FROM users WHERE id IN (...)` and caches results within the request — transparent to the resolver code.
Computed / Derived Value Caching
Some values are expensive to compute but cheap to store: recommendation scores, aggregated statistics, rendered HTML fragments, or serialized API responses. Computed caching pre-computes these values and stores the result, avoiding recomputation on every request.
- Hot path optimization: compute the homepage feed once and cache for 30 seconds rather than assembling it from 50 DB queries on every request
- Background refresh: a background job recomputes expensive aggregates every minute and writes to cache; requests always read a pre-computed result
- Fragment caching: cache rendered HTML/JSON fragments (e.g., a product card), reducing template rendering overhead
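The background-refresh pattern above can be sketched as follows. All names and the stats payload are hypothetical; the point is the shape: a daemon thread owns recomputation, and the request path is only ever a cheap read of a pre-computed value.

```python
import threading
import time

cache: dict = {}

def recompute_homepage_stats() -> None:
    """The expensive part: stands in for ~50 DB queries / heavy aggregation."""
    cache["homepage_stats"] = {"orders_today": 1234, "computed_at": time.time()}

def start_refresher(interval_s: float = 60.0) -> threading.Event:
    """Recompute in the background every `interval_s` seconds."""
    stop = threading.Event()

    def loop():
        while not stop.is_set():
            recompute_homepage_stats()
            stop.wait(interval_s)          # sleep, but wake early on stop

    threading.Thread(target=loop, daemon=True).start()
    return stop

def get_homepage_stats() -> dict:
    # Request path: a dict read, never a recomputation
    return cache["homepage_stats"]

recompute_homepage_stats()                 # warm the cache once at startup
stop = start_refresher(60.0)
```

Because the refresher always overwrites a complete value, readers never observe a half-built aggregate — a useful property of compute-then-swap caching.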
When Local Beats Distributed
| Dimension | In-Process Cache | Distributed Cache (Redis) |
|---|---|---|
| Latency | Nanoseconds (no network) | ~0.3–2 ms (network RTT) |
| Consistency | Per-server — inconsistent across replicas | Consistent across all app servers |
| Capacity | Limited to server heap (GBs) | Dedicated nodes (hundreds of GBs) |
| Fault tolerance | Lost on restart; no replication | Persistent (AOF/RDB); replicated |
| Best For | Config, lookup tables, per-request dedup | Session data, shared state, distributed locks |
Two-tier caching
Combine both layers. Check the in-process cache first (nanoseconds); on miss, check Redis (~1ms); on miss, query the database. Populate both caches on the database read. This two-tier approach is used by Facebook's Memcache architecture (regional cluster + local in-process cache) and is a common pattern in high-performance Java services using Caffeine + Redis.
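The two-tier read path can be sketched with plain dicts standing in for each tier — `local_cache` for the in-process layer and `redis_tier` for the shared Redis cluster (both names, and `db_fetch`, are illustrative stand-ins, not real client APIs):

```python
local_cache: dict = {}   # tier 1: in-process, per-server (nanoseconds)
redis_tier: dict = {}    # tier 2: stand-in for shared Redis (~1 ms in production)

def db_fetch(key: str) -> str:
    # stand-in for the origin database query (slowest path)
    return f"value-for-{key}"

def get(key: str) -> str:
    if key in local_cache:                   # tier 1 hit: fastest
        return local_cache[key]
    if key in redis_tier:                    # tier 2 hit: shared across servers
        local_cache[key] = redis_tier[key]   # backfill the faster tier
        return local_cache[key]
    value = db_fetch(key)                    # full miss: go to the origin
    redis_tier[key] = value                  # populate both tiers on the way out
    local_cache[key] = value
    return value
```

Note the backfill on a tier-2 hit: after one server pays the Redis round-trip, subsequent reads on that server are local until its TTL or an invalidation evicts the entry.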
Interview Tip
Application-level caching is often overlooked by candidates who jump straight to 'add Redis.' Show depth by mentioning the full hierarchy: 'Before adding Redis, I'd check whether in-process caching with a short TTL handles the load — it avoids the network hop entirely and reduces Redis load. For the N+1 GraphQL problem I'd use DataLoader. Then Redis for shared session state and distributed locking.' This layered thinking signals engineering maturity.