Dev.to #systemdesign·July 4, 2026

Decoupling Rate Limiting from Monoliths to Microservices

This article explores the architectural decision to extract a rate limiter from a monolithic application into a dedicated microservice. It highlights the benefits of independent scaling, fault isolation, and faster iteration, and contrasts them with the limitations of an in-process implementation inside the monolith. The piece demonstrates a practical approach that uses Redis and Lua scripting for atomic rate limiting.

Microservices Distributed Systems Performance & Scaling

The Challenge: Monolithic Rate Limiting Issues

Embedding a rate limiter directly within a monolithic API service presents several system design challenges. The article describes how such an implementation leads to tight coupling, where changes to the rate limiting logic necessitate a full redeployment of the entire API service, increasing deployment risk and potential for downtime. Performance concerns also arise, as every request, regardless of its rate-limiting needs, incurs the overhead of mutex locks or atomic operations within the same process.

Latency: Each request faces lock contention or atomic increment costs, even if throttling isn't needed for specific endpoints.
Scalability: Scaling traffic requires scaling the entire API, leading to inefficient resource utilization for parts unrelated to rate limits.
Deployment Risk: Modifying the rate limiting algorithm demands a full redeploy, increasing the risk of collateral damage to other critical modules like authentication or payments.

The Solution: A Dedicated Rate Limiter Microservice

The core insight is to decouple the rate limiting concern into its own independent microservice, while maintaining a shared, consistent data store (e.g., Redis). This architectural shift transforms rate limiting from an internal component into an external service that the main API consumes, offering significant advantages in scalability, resilience, and operational agility. The API service simply queries the dedicated rate limiter service, which then handles the complex logic and state management.

Decouple the Concern, Not the Data

The key principle here is to isolate the *logic* and *execution* of rate limiting into a separate service, while allowing that service to manage its state in a globally accessible, shared data store (like Redis). This enables independent scaling and fault isolation without duplicating data or creating complex eventual consistency problems.

Independent Scaling: The rate limiter service can be scaled horizontally based purely on rate limiting traffic, without affecting the main API's scaling needs.
Fault Isolation: Failures in the rate limiter service can be handled gracefully by the API (e.g., a circuit breaker that fails open and lets requests through), so a limiter outage does not take down the entire system.
Rapid Iteration: New rate limiting algorithms (e.g., sliding window, burst allowances) can be deployed and tested quickly, without a full API redeployment.
Clear Contract: The API interacts with the limiter via a simple interface ("Am I allowed?"), abstracting away the underlying complexity of the rate limiting algorithm.
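
The "Am I allowed?" contract and the allow-through fallback can be sketched as a small client on the API side. This is a minimal sketch, not the article's code: the LimiterClient type, the /allow path, and the convention that 200 means allowed and 429 means throttled are all assumptions made for illustration.

```go
package main

import (
	"net/http"
	"net/url"
	"strconv"
	"time"
)

// LimiterClient calls a hypothetical rate limiter service over HTTP.
// The /allow path and the 200/429 status contract are assumptions.
type LimiterClient struct {
	baseURL string
	http    *http.Client
}

// NewLimiterClient bounds every call with a timeout so a slow limiter
// cannot stall the API's request path.
func NewLimiterClient(baseURL string, timeout time.Duration) *LimiterClient {
	return &LimiterClient{
		baseURL: baseURL,
		http:    &http.Client{Timeout: timeout},
	}
}

// Allow asks the limiter whether a request may proceed. It fails open:
// if the limiter is unreachable, times out, or answers with an
// unexpected status, the request is allowed through.
func (c *LimiterClient) Allow(key string, cost int) bool {
	q := url.Values{"key": {key}, "cost": {strconv.Itoa(cost)}}
	resp, err := c.http.Get(c.baseURL + "/allow?" + q.Encode())
	if err != nil {
		return true // limiter down or too slow: allow through
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusOK:
		return true
	case http.StatusTooManyRequests:
		return false
	default:
		return true // unexpected response: allow through
	}
}
```

Failing open trades protection for availability: while the limiter is down, no throttling happens. An endpoint that must never exceed its quota could return false in the error branches instead.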

Implementation with Redis and Lua Scripting

The article demonstrates implementing a token bucket algorithm for the microservice rate limiter using Redis with Lua scripting. This approach ensures atomicity for the token consumption and refill logic, preventing race conditions that could lead to incorrect rate limiting counts in a distributed environment. The Lua script executes server-side on Redis, minimizing network round trips and ensuring that the read-modify-write operations for tokens and timestamps are treated as a single, indivisible transaction.

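// Sketch: assumes package-level redisClient (go-redis), capacity, and fillRate,
// plus imports of net/http, strconv, and time.
func AllowHandler(w http.ResponseWriter, r *http.Request) {
  key := r.URL.Query().Get("key")
  cost, err := strconv.Atoi(r.URL.Query().Get("cost"))
  if err != nil || cost <= 0 {
    http.Error(w, "cost must be a positive integer", http.StatusBadRequest)
    return
  }

  script := `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local fill_rate = tonumber(ARGV[2]) -- tokens per second
    local now = tonumber(ARGV[3])       -- seconds, may be fractional
    local cost = tonumber(ARGV[4])

    -- A missing key starts as a full bucket last touched now.
    local data = redis.call('HMGET', key, 'tokens', 'last')
    local tokens = tonumber(data[1]) or capacity
    local last = tonumber(data[2]) or now

    -- Refill for the elapsed time, capped at capacity.
    local delta = (now - last) * fill_rate
    if delta > 0 then
      tokens = math.min(capacity, tokens + delta)
      last = now
    end

    local allowed = 0
    if tokens >= cost then
      tokens = tokens - cost
      allowed = 1
    end

    -- Persist state and refresh the TTL on both outcomes.
    redis.call('HMSET', key, 'tokens', tokens, 'last', last)
    redis.call('EXPIRE', key, 3600) -- optional TTL
    return allowed
  `

  // Convert to float64 before dividing: UnixNano()/1e9 is integer division
  // and would truncate the timestamp to whole seconds.
  now := float64(time.Now().UnixNano()) / 1e9
  // With go-redis, Eval returns a command; Result() yields the reply and error.
  result, err := redisClient.Eval(r.Context(), script, []string{key}, capacity, fillRate, now, cost).Result()
  // ... handle result and error
}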
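
The refill-then-consume arithmetic inside the script can be checked without Redis by porting it to plain Go. The following is a sketch under that assumption only: the bucket type and the take function are hypothetical names that mirror the script's 'tokens' and 'last' fields, and they are not part of the original article.

```go
package main

import "math"

// bucket mirrors the 'tokens' and 'last' hash fields kept in Redis.
type bucket struct {
	tokens float64
	last   float64 // timestamp in seconds
}

// take applies the same steps as the Lua script: refill for the elapsed
// time (capped at capacity), then spend cost tokens if enough remain.
// It returns the updated bucket and whether the request is allowed.
func take(b bucket, capacity, fillRate, now, cost float64) (bucket, bool) {
	if delta := (now - b.last) * fillRate; delta > 0 {
		b.tokens = math.Min(capacity, b.tokens+delta)
		b.last = now
	}
	if b.tokens >= cost {
		b.tokens -= cost
		return b, true
	}
	return b, false
}
```

With a capacity of 5 and a fill rate of 2 tokens per second, a full bucket at t=0 allows a request of cost 5 and drops to 0 tokens. A request of cost 1 at t=0.25 is rejected because only 0.5 tokens have been refilled, and the same request at t=0.5 is allowed because the balance has reached 1. This trace depends on fractional-second timestamps; with whole-second timestamps, no refill would be visible between requests in the same second.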

rate limiting, microservices, monolith, decoupling, redis, go, system design patterns, api gateway

Architecture Design

Design this yourself

Design an API Gateway service that includes a robust, distributed rate limiting component. The rate limiter should support a token bucket algorithm, be independently scalable, provide fault isolation, and ensure atomic operations for token consumption and refill, potentially leveraging Redis with Lua scripting. Detail the interaction between the API Gateway and the rate limiting service, including considerations for latency, graceful degradation, and configuration management.

Practice Interview

Focus: distributed rate limiter using Redis and Lua scripting with token bucket algorithm

Other design angles

· Design a multi-tenant SaaS platform where each tenant requires customizable and independently enforced rate limits for their API usage.
· Design a standalone, highly-performant rate limiting service that can be integrated with various upstream applications using a gRPC API, focusing on its internal architecture, data store choices, and consistency models.
· Compare and contrast the token bucket and sliding window counter algorithms for distributed rate limiting, outlining their implementation complexities, trade-offs, and ideal use cases within a high-traffic API ecosystem.