Dev.to #architecture·June 14, 2026

Designing a Distributed Rate Limiter with CAP Theorem Considerations

This article explores the challenges of building a distributed rate limiter, particularly during network partitions. It highlights how a naive centralized Redis-based limiter can fail due to prioritizing consistency over availability, leading to service outages. The author proposes an AP (Availability and Partition Tolerance) approach using local token buckets and a gossip protocol for eventual consistency, ensuring the service remains available even during network disruptions.

Distributed Systems Performance & Scaling API Design

Read original on Dev.to #architecture

The Challenge with Centralized Rate Limiting

The article begins with a common scenario: a microservice powering a flash-sale site experiencing traffic spikes. An initial attempt at rate limiting using a single Redis counter failed catastrophically during a network hiccup. The Redis node became unreachable, leading the rate limiter to deny all requests, effectively causing a total service outage. This illustrates the critical problem with highly consistent, centralized components in distributed systems when facing network partitions.

package main
import (
	"github.com/go-redis/redis/v8"
	"time"
)

var rdb = redis.NewClient(&redis.Options{Addr: "redis:6379"})

func allow(userID string) bool {
	key := fmt.Sprintf("rl:%s", userID)
	// INCR and EXPIRE are a single atomic script (Lua)
	script := redis.NewScript(` local val = redis.call('INCR', KEYS[1]) if val == 1 then redis.call('EXPIRE', KEYS[1], ARGV[1]) end return val `)
	count, _ := script.Run(ctx, rdb, []string{key}, strconv.Itoa(limit)).Int()
	return count <= limit
}

This Go example demonstrates a naive Redis-backed rate limiter. The issue here is that the `INCR` operation blocks until Redis responds. If Redis is unreachable due to a network partition, the operation times out, leading the limiter to deny requests, thus sacrificing Availability (A) in favor of Consistency (C) and Partition Tolerance (P) (CP). While correctness is maintained, the service becomes unusable.

Applying the CAP Theorem to Rate Limiters

The core insight is that CAP theorem is not a one-time choice but a continuous lens for system design. For most public-facing APIs, an occasional overflow of requests during a partition is less damaging than a complete service outage. Therefore, the optimal choice for a distributed rate limiter is often to prioritize Availability (A) and Partition Tolerance (P), accepting eventual consistency for the rate counters.

💡

CAP Theorem for Rate Limiters

For critical user-facing services, preferring Availability (A) and Partition Tolerance (P) (AP) over strong Consistency (C) for rate limiting ensures the service remains operational during network partitions, even if it allows a few extra requests. A total outage caused by a strict CP limiter is often far more detrimental than a slight overage of requests.

An AP Design: Local Token Buckets with Gossip

The proposed solution leverages local token buckets on each service instance combined with a lightweight gossip protocol for synchronization. Each instance maintains its own token bucket, allowing it to respond immediately to requests without blocking on remote calls. This inherently provides high availability locally. To achieve eventual consistency and overall rate limiting, the state of these local buckets is periodically 'gossiped' between instances via a push-pull mechanism.

type Bucket struct {
	Capacity   int64
	Tokens     int64
	LastRefill time.Time
	Rate       time.Duration // refill interval per token
	mu         sync.Mutex
}

func NewBucket(capacity int64, rate time.Duration) *Bucket {
	return &Bucket{
		Capacity:   capacity,
		Tokens:     capacity,
		LastRefill: time.Now(),
		Rate:       rate,
	}
}

func (b *Bucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	now := time.Now()
	// refill based on elapsed time
	elapsed := now.Sub(b.LastRefill)
	add := int64(elapsed / b.Rate)

	if add > 0 {
		b.Tokens += add
		if b.Tokens > b.Capacity {
			b.Tokens = b.Capacity
		}
		b.LastRefill = now
	}

	if b.Tokens > 0 {
		b.Tokens--
		return true
	}
	return false
}

Local Token Bucket: Each service instance independently manages its own token bucket, refilling tokens based on elapsed time. This ensures immediate responses and high availability even if other nodes or the central store are down.
Gossip Protocol: A simple push-pull gossip mechanism periodically exchanges bucket state between nodes. This provides eventual consistency across the cluster without a single point of failure. During a partition, nodes continue to operate using their local buckets, potentially allowing slight overages, but maintaining service availability.

rate limitingCAP theoremdistributed systemsavailabilitypartition toleranceeventual consistencytoken bucketgossip protocol

Comments

Loading comments...

Architecture Design

Design this yourself

Design a highly available and partition-tolerant distributed rate limiting system for a public-facing API gateway that supports millions of requests per second. The system should prioritize availability during network partitions over strict real-time consistency, allowing for slight rate limit overages in exchange for continuous service. Detail the architecture using local token buckets on each API gateway instance and a gossip-based synchronization mechanism for eventual consistency, including trade-offs and failure scenarios.

Practice Interview

Focus: distributed rate limiter with eventual consistency via local token buckets and gossip protocol

Other design angles

· Design a strictly consistent distributed rate limiter for a financial transaction system, outlining how it handles partitions and the implications for availability.· Design a multi-tenant SaaS platform where each tenant has configurable rate limits, focusing on isolation, fairness, and dynamic configuration updates.· Compare and contrast different rate limiting algorithms (e.g., sliding window log, sliding window counter, leaky bucket) in the context of a high-throughput, low-latency microservice architecture.