This article explores the architectural decision to extract a rate limiter from a monolithic application into a dedicated microservice. It highlights the benefits of independent scaling, fault isolation, and faster iteration, and contrasts them with the limitations of an in-process implementation inside the monolith. The piece demonstrates a practical approach using Redis and Lua scripting for atomic rate limiting.
Embedding a rate limiter directly within a monolithic API service presents several system design challenges. The article describes how such an implementation leads to tight coupling, where changes to the rate limiting logic necessitate a full redeployment of the entire API service, increasing deployment risk and potential for downtime. Performance concerns also arise, as every request, regardless of its rate-limiting needs, incurs the overhead of mutex locks or atomic operations within the same process.
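For contrast, an embedded limiter of the kind described here might look like the following minimal sketch. The names `LocalLimiter`, `AllowAt`, and `Allow` are hypothetical and not taken from the article. The point is only that every call, for every key, passes through one process-wide mutex, and that the bucket state lives in the API process's own memory.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// bucket holds the state for one key.
type bucket struct {
	tokens float64 // tokens currently available
	last   float64 // time of the last refill, in seconds
}

// LocalLimiter is a hypothetical in-process token bucket limiter.
// All keys share a single mutex, so every request contends on it.
type LocalLimiter struct {
	mu       sync.Mutex
	capacity float64 // maximum tokens per bucket
	fillRate float64 // tokens added per second
	buckets  map[string]*bucket
}

func NewLocalLimiter(capacity, fillRate float64) *LocalLimiter {
	return &LocalLimiter{capacity: capacity, fillRate: fillRate, buckets: make(map[string]*bucket)}
}

// AllowAt refills the bucket for key up to time now (in seconds) and
// tries to take cost tokens from it. The clock is a parameter so the
// logic can be exercised deterministically.
func (l *LocalLimiter) AllowAt(key string, cost, now float64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	b, ok := l.buckets[key]
	if !ok {
		// A new key starts with a full bucket.
		b = &bucket{tokens: l.capacity, last: now}
		l.buckets[key] = b
	}
	if elapsed := now - b.last; elapsed > 0 {
		b.tokens = math.Min(l.capacity, b.tokens+elapsed*l.fillRate)
		b.last = now
	}
	if b.tokens < cost {
		return false
	}
	b.tokens -= cost
	return true
}

// Allow is AllowAt with the wall clock.
func (l *LocalLimiter) Allow(key string, cost float64) bool {
	return l.AllowAt(key, cost, float64(time.Now().UnixNano())/1e9)
}

func main() {
	limiter := NewLocalLimiter(2, 1) // capacity 2, refills 1 token per second
	for i := 0; i < 3; i++ {
		fmt.Println(limiter.Allow("user-42", 1))
	}
}
```

Because the buckets live in a map inside the API process, each replica of the monolith would enforce its own separate limit, and any change to this logic ships only with a redeployment of the API itself.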
The core insight is to decouple the rate limiting concern into its own independent microservice, while maintaining a shared, consistent data store (e.g., Redis). This architectural shift transforms rate limiting from an internal component into an external service that the main API consumes, offering significant advantages in scalability, resilience, and operational agility. The API service simply queries the dedicated rate limiter service, which then handles the complex logic and state management.
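From the API's side, consuming the limiter could be as small as one HTTP call. The sketch below is an assumption-laden illustration, not the article's client: the `/allow` path, the use of `200` for "allowed" and `429` for "rejected", and the names `CheckLimit` and `newStubLimiter` are all hypothetical. Only the `key` and `cost` query parameters come from the article's handler. A local stub stands in for the rate limiter service so the example runs on its own.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strconv"
	"time"
)

// CheckLimit asks the rate limiter service whether a request may proceed.
// Assumed contract (hypothetical): GET {baseURL}/allow?key=...&cost=...
// answers 200 when allowed and 429 when rejected.
func CheckLimit(client *http.Client, baseURL, key string, cost int) (bool, error) {
	q := url.Values{}
	q.Set("key", key)
	q.Set("cost", strconv.Itoa(cost))

	resp, err := client.Get(baseURL + "/allow?" + q.Encode())
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusOK:
		return true, nil
	case http.StatusTooManyRequests:
		return false, nil
	default:
		return false, fmt.Errorf("rate limiter returned status %d", resp.StatusCode)
	}
}

// newStubLimiter starts a local stand-in for the limiter service that
// always gives the same answer. It exists only to make the sketch runnable.
func newStubLimiter(allow bool) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/allow" || r.URL.Query().Get("key") == "" {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		if allow {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusTooManyRequests)
	}))
}

func main() {
	srv := newStubLimiter(true)
	defer srv.Close()

	// A short timeout keeps a slow limiter from stalling the API request.
	client := &http.Client{Timeout: 200 * time.Millisecond}
	allowed, err := CheckLimit(client, srv.URL, "user-42", 1)
	fmt.Println(allowed, err)
}
```

`CheckLimit` returns transport and unexpected-status errors to the caller instead of deciding for it. Whether the API should let traffic through or reject it when the limiter is unreachable is a policy choice the article's summary does not settle.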
Decouple the Concern, Not the Data
The key principle here is to isolate the *logic* and *execution* of rate limiting into a separate service, while allowing that service to manage its state in a globally accessible, shared data store (like Redis). This enables independent scaling and fault isolation without duplicating data or creating complex eventual consistency problems.
The article demonstrates implementing a token bucket algorithm for the microservice rate limiter using Redis with Lua scripting. This approach ensures atomicity for the token consumption and refill logic, preventing race conditions that could lead to incorrect rate limiting counts in a distributed environment. The Lua script executes server-side on Redis, minimizing network round trips and ensuring that the read-modify-write operations for tokens and timestamps are treated as a single, indivisible transaction.
func AllowHandler(w http.ResponseWriter, r *http.Request) {
	key := r.URL.Query().Get("key")
	cost, _ := strconv.Atoi(r.URL.Query().Get("cost"))
	// Token bucket script. Redis runs it atomically. Lua comments start with "--".
	script := `
		local key = KEYS[1]
		local capacity = tonumber(ARGV[1])
		local fill_rate = tonumber(ARGV[2]) -- tokens per second
		local now = tonumber(ARGV[3])       -- seconds, may be fractional
		local cost = tonumber(ARGV[4])
		-- A missing key starts as a full bucket.
		local data = redis.call('HMGET', key, 'tokens', 'last')
		local tokens = tonumber(data[1]) or capacity
		local last = tonumber(data[2]) or now
		-- Refill for the elapsed time, capped at capacity.
		local delta = (now - last) * fill_rate
		if delta > 0 then
			tokens = math.min(capacity, tokens + delta)
			last = now
		end
		if tokens >= cost then
			tokens = tokens - cost
			redis.call('HMSET', key, 'tokens', tokens, 'last', last)
			redis.call('EXPIRE', key, 3600) -- optional TTL
			return 1
		else
			redis.call('HMSET', key, 'tokens', tokens, 'last', last)
			return 0
		end
	`
	// Convert to float64 before dividing; integer division would truncate to whole seconds.
	now := float64(time.Now().UnixNano()) / 1e9
	// Assuming redisClient is a go-redis client, Eval returns a *redis.Cmd and Result() unwraps it.
	result, err := redisClient.Eval(ctx, script, []string{key}, capacity, fillRate, now, cost).Result()
	// ... handle result and error
}
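To make the refill step in the script concrete, here is one worked pass with made-up numbers (they are illustrative, not from the article): `capacity = 10`, `fill_rate = 2` tokens per second, a stored state of `tokens = 3` and `last = 100.0`, and a request with `cost = 4` arriving at `now = 102.5`.

```latex
\begin{aligned}
\Delta &= (\text{now} - \text{last}) \cdot \text{fill\_rate} = (102.5 - 100.0) \cdot 2 = 5 \\
\text{tokens} &= \min(\text{capacity},\ \text{tokens} + \Delta) = \min(10,\ 3 + 5) = 8 \\
8 &\ge 4 \;\Rightarrow\; \text{allowed},\quad \text{tokens} = 8 - 4 = 4,\quad \text{last} = 102.5
\end{aligned}
```

A second request with `cost = 5` at the same instant would see a delta of 0, find only 4 tokens, and get `0` back. Its state is written unchanged.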