This article details the pitfalls of naive fixed-window rate limiting and advocates for the token bucket algorithm as a more effective solution. It explains how token buckets provide burst tolerance and smooth traffic outflow, protecting downstream services from thundering herd problems. The author provides a compact Go implementation and contrasts it with a problematic fixed-window approach, highlighting the trade-offs and benefits for system stability.
Read original on Dev.to #systemdesignMany initial attempts at rate limiting use a simple fixed-window counter: requests are counted within a predefined time interval (e.g., one second), and once the limit is reached, subsequent requests are blocked until the window resets. While seemingly intuitive, this approach has a critical flaw: it allows for a "thundering herd" problem at the start of each new window. If a large number of requests arrive just before a window reset, followed by another large number immediately after, the downstream service can experience a burst of traffic far exceeding the intended rate. This can lead to service degradation, increased latency, and cascade failures.
The token bucket algorithm offers a more sophisticated and effective approach to rate limiting. It models a bucket that holds a certain number of "tokens" which are continuously refilled at a fixed rate. Each incoming request consumes one token. If the bucket is empty, the request is rejected or queued. This mechanism provides two key advantages:
package ratelimiter
import (
"sync"
"time"
)
type TokenBucket struct {
rate float64 // tokens added per second
capacity float64 // max tokens the bucket can hold
tokens float64 // current tokens
lastRefill time.Time
mu sync.Mutex // protects tokens and lastRefill
}
func NewTokenBucket(rate float64, capacity float64) *TokenBucket {
return &TokenBucket{
rate: rate,
capacity: capacity,
tokens: capacity, // start full so we can burst initially
lastRefill: time.Now(),
}
}
func (b *TokenBucket) Allow() bool {
b.mu.Lock()
defer b.mu.Unlock()
now := time.Now()
elapsed := now.Sub(b.lastRefill).Seconds()
b.tokens += elapsed * b.rate
if b.tokens > b.capacity {
b.tokens = b.capacity
}
b.lastRefill = now
if b.tokens < 1.0 {
return false
}
b.tokens--
return true
}Distributed Rate Limiting
For distributed rate limiting across multiple service instances, the in-memory state (tokens, lastRefill) can be replaced with an external, atomically managed store like Redis. A Lua script executed via Redis's `EVAL` command can perform the token update logic to ensure atomicity and consistency across nodes.
Adopting a token bucket significantly improves system resilience by providing predictable traffic shaping and preventing overload of downstream dependencies. It shifts the focus from merely counting requests to actively smoothing their arrival rate, making services more robust under varying load conditions.