Dev.to #systemdesign·June 3, 2026

Implementing an Effective Token Bucket Rate Limiter

This article details the pitfalls of naive fixed-window rate limiting and advocates for the token bucket algorithm as a more effective solution. It explains how token buckets provide burst tolerance and smooth traffic outflow, protecting downstream services from thundering herd problems. The author provides a compact Go implementation and contrasts it with a problematic fixed-window approach, highlighting the trade-offs and benefits for system stability.

Distributed Systems · API Design · Performance & Scaling

The Problem with Fixed-Window Rate Limiters

Many initial attempts at rate limiting use a simple fixed-window counter: requests are counted within a predefined time interval (e.g., one second), and once the limit is reached, subsequent requests are blocked until the window resets. While seemingly intuitive, this approach has a critical flaw at window boundaries. If a client spends its full quota just before a window resets and spends it again immediately after, the downstream service receives up to twice the configured limit within a very short span. Clients that were blocked and all retry the moment the window resets make this worse, producing a "thundering herd" at the start of each new window. This can lead to service degradation, increased latency, and cascading failures.

The Token Bucket Algorithm

The token bucket algorithm offers a more sophisticated and effective approach to rate limiting. It models a bucket that holds a certain number of "tokens" which are continuously refilled at a fixed rate. Each incoming request consumes one token. If the bucket is empty, the request is rejected or queued. This mechanism provides two key advantages:

Burst Tolerance: During idle periods, tokens accumulate in the bucket up to its maximum capacity. This allows the system to handle short, legitimate bursts of requests without immediately hitting rate limits.
Smooth Outflow: Once a burst has drained the bucket, requests are admitted only as fast as tokens refill. The sustained rate therefore never exceeds the refill rate, and the largest possible burst is bounded by the bucket capacity. This protects downstream systems from prolonged spikes in traffic, ensuring a more consistent load.
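
These two properties can be summarized in one bound. Writing r for the refill rate and b for the bucket capacity (symbols introduced here for illustration), the number of requests N admitted in any interval of length t satisfies:

```latex
N(t) \le b + r\,t
\qquad\Longrightarrow\qquad
\limsup_{t \to \infty} \frac{N(t)}{t} \le r
```

For example, with r = 10 tokens per second and b = 20, at most 30 requests pass in any one-second interval and at most 620 in any minute. A burst can briefly exceed the refill rate, but the long-run average cannot.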

package ratelimiter

import (
	"sync"
	"time"
)

// TokenBucket is a thread-safe token bucket rate limiter.
type TokenBucket struct {
	rate       float64    // tokens added per second
	capacity   float64    // max tokens the bucket can hold
	tokens     float64    // current tokens
	lastRefill time.Time  // time of the last refill calculation
	mu         sync.Mutex // protects tokens and lastRefill
}

// NewTokenBucket returns a bucket that refills at rate tokens per second
// and holds at most capacity tokens.
func NewTokenBucket(rate float64, capacity float64) *TokenBucket {
	return &TokenBucket{
		rate:       rate,
		capacity:   capacity,
		tokens:     capacity, // start full so we can burst initially
		lastRefill: time.Now(),
	}
}

// Allow reports whether a request may proceed, consuming one token if so.
func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Refill lazily: credit the tokens earned since the last call,
	// capped at capacity.
	now := time.Now()
	elapsed := now.Sub(b.lastRefill).Seconds()
	b.tokens += elapsed * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.lastRefill = now

	if b.tokens < 1.0 {
		return false
	}
	b.tokens--
	return true
}

Distributed Rate Limiting

For distributed rate limiting across multiple service instances, the in-memory state (tokens, lastRefill) can be replaced with an external, atomically managed store like Redis. A Lua script executed via Redis's `EVAL` command can perform the token update logic to ensure atomicity and consistency across nodes.

Key Considerations and Trade-offs

State Management: The token bucket requires maintaining a small amount of state (current tokens, last refill time). For single instances, this is simple memory and a mutex. For distributed systems, an external state store (e.g., Redis, DynamoDB) and atomic operations are necessary.
Concurrency: Proper synchronization (e.g., mutexes) is crucial for thread-safe token updates in concurrent environments to prevent race conditions and over-allowance.
Configurability: The `rate` (tokens per second) and `capacity` (max burst) parameters allow fine-grained control over traffic shaping, enabling systems to balance strict limits with responsiveness to legitimate bursts.

Adopting a token bucket significantly improves system resilience by providing predictable traffic shaping and preventing overload of downstream dependencies. It shifts the focus from merely counting requests to actively smoothing their arrival rate, making services more robust under varying load conditions.

rate limiting · token bucket · concurrency · distributed systems · traffic shaping · Go · API gateway · system resilience

Architecture Design

Design this yourself

Design an API gateway component that implements a robust, distributed rate limiting mechanism using the token bucket algorithm. The system should support per-client rate limits, gracefully handle bursts, protect downstream services, and be highly available and fault-tolerant.

Other design angles

· Design a system for protecting third-party APIs from overload using a client-side token bucket rate limiter, including considerations for back-off and retry strategies.
· Design a streaming data processing pipeline and integrate a dynamic rate limiter based on the token bucket algorithm, where the refill rate can adjust based on real-time processing capacity or back-pressure signals.