This article highlights that "rate limiting" is often a misnomer for three distinct, complementary mechanisms critical for system stability: load shedding, true rate limiting, and adaptive throttling. Each layer addresses a different failure mode: protecting the server itself, protecting the system from abusive callers, or protecting downstream services. Understanding their distinct roles and order of execution is crucial for robust system design.
Often conflated under the general term "rate limiting," robust system protection actually involves three distinct and complementary mechanisms. Each serves a unique purpose, protecting different parts of the system from specific failure modes. Building only one or misapplying them can leave critical vulnerabilities that manifest during peak load or degradation events.
| Mechanism | Question it asks | What it protects |
|---|---|---|
| Load shedding | Is the server healthy enough to process *any* request? | The server itself |
| Rate limiting | Is *this specific caller* sending too many requests? | The system, from abusive callers |
| Adaptive throttling | Is the *downstream* struggling right now? | Downstream services |
Layer 1: Load Shedding
Load shedding is the first line of defense, designed to protect the server from itself. It's a binary check: is the server healthy enough to process *any* request? This layer operates at the highest priority, before authentication or parsing, by checking simple, cheap indicators like memory pressure, concurrent request counts, or immediate upstream errors. If a server is OOM-ing, load shedding prevents it from wasting resources on requests it cannot fulfill, allowing it to stabilize quickly.
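A minimal sketch of such a gate, using concurrent request count as the cheap health signal (the `max_in_flight` threshold and class name are illustrative, not from the article):

```python
import threading

class LoadShedder:
    """Binary health gate: reject every request when the server is unhealthy.

    Checks only a cheap signal (in-flight request count) so it can run
    before authentication or parsing. `max_in_flight` is a hypothetical
    tuning knob; a real shedder might also look at memory pressure or
    recent upstream errors.
    """

    def __init__(self, max_in_flight=100):
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        # Cheap O(1) check; on failure the caller should return 503 immediately.
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return False  # shed the request
            self._in_flight += 1
            return True

    def release(self):
        # Must be called when request processing finishes (success or failure).
        with self._lock:
            self._in_flight -= 1
```

Because the check is a single counter comparison under a lock, it costs almost nothing even when the server is saturated, which is exactly when it matters most.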
Layer 2: Rate Limiting
This is the classic rate limiter, focused on protecting the system from abusive or overly demanding callers. It answers the question: "Is *this specific caller* sending too many requests?" Mechanisms like per-user, per-API-key, or per-IP counters, sliding windows, or token buckets are employed here. Rate limiting ensures fair resource distribution and prevents a single bad actor from monopolizing system capacity. It's also crucial to distinguish between rejecting requests and delaying them: rejection makes sense when an external connection is being held open, while delaying works when the system can buffer the request internally.
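One of the mechanisms named above, the token bucket, can be sketched in a few lines. The `rate` and `capacity` parameters and the injectable clock are illustrative choices, not from the article:

```python
import time

class TokenBucket:
    """Per-caller token bucket: refills at `rate` tokens/sec up to `capacity`.

    Each request consumes one token; a caller that bursts past `capacity`
    is rejected until the bucket refills. One bucket would typically be
    kept per user, API key, or IP.
    """

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        # `now` is injectable to make the refill logic testable.
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill lazily based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A rejection here maps naturally to an HTTP 429 response; a delaying strategy would instead sleep or queue until a token becomes available.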
Layer 3: Adaptive Throttling
Adaptive throttling protects downstream services from overload originating from the current server. It asks: "Is the *downstream* struggling right now?" By tracking success rates when calling downstream dependencies, this layer can probabilistically drop outbound calls if a downstream service indicates degradation (e.g., high error rates). This proactive measure prevents cascading failures, giving struggling downstream services breathing room to recover without being hammered further.
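One well-known way to do this probabilistic dropping is the client-side adaptive throttling scheme described in Google's SRE book: track recent requests and accepts, and reject locally with probability `max(0, (requests - K * accepts) / (requests + 1))`. A sketch under those assumptions (the class and parameter names are illustrative):

```python
import random

class AdaptiveThrottle:
    """Client-side adaptive throttling, after the scheme in Google's SRE book.

    Tracks how many outbound calls were sent (`requests`) and how many the
    downstream accepted (`accepts`). As the accept rate falls, a growing
    fraction of calls is dropped locally before ever reaching the wire.
    K > 1 keeps some probing traffic flowing so recovery can be detected.
    """

    def __init__(self, k=2.0):
        self.k = k
        self.requests = 0
        self.accepts = 0

    def reject_probability(self):
        return max(0.0, (self.requests - self.k * self.accepts) / (self.requests + 1))

    def should_send(self, rng=random.random):
        # Probabilistically drop the outbound call; the downstream never sees it.
        return rng() >= self.reject_probability()

    def record(self, accepted):
        self.requests += 1
        if accepted:
            self.accepts += 1
```

When the downstream is healthy, `requests ≈ accepts` and the reject probability stays at zero; as errors mount, the drop rate rises smoothly instead of hammering a service that is already struggling.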
Order of Operations Matters
Executing load shedding (Layer 1) before rate limiting (Layer 2) is critical for efficiency. Load shedding is cheap and can reject requests instantly if the server is unhealthy, avoiding wasted CPU cycles on expensive rate limit calculations for requests that would be dropped anyway. Think of it as a nightclub: the fire marshal checks building capacity before the bouncer checks individual guest lists.
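The ordering argument above can be made concrete with a request-handling sketch. The `shedder` and `limiter_for` objects and the request shape are hypothetical stand-ins for whatever Layer 1 and Layer 2 implementations are in use; the point is only which check runs first:

```python
def handle_request(shedder, limiter_for, request):
    """Run the cheap binary health check (Layer 1) before the per-caller
    rate limit (Layer 2). Returns an HTTP-style status code.

    `shedder` is assumed to expose try_acquire()/release(); `limiter_for`
    is assumed to map a caller identity to an object with allow().
    """
    if not shedder.try_acquire():          # Layer 1: is the server healthy at all?
        return 503                         # shed instantly; no per-caller lookup wasted
    try:
        if not limiter_for(request["caller"]).allow():  # Layer 2: is this caller abusive?
            return 429
        return 200  # proceed to real work (Layer 3 guards any outbound calls made here)
    finally:
        shedder.release()
```

Reversing the order would spend a per-caller counter lookup on every request even when the server is already drowning, which is precisely the work load shedding exists to avoid.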