Rate limiting at the edge vs application layer: defense in depth


We're working on improving the rate limiting strategy for our public API. Currently we use a token bucket algorithm at our API gateway, which handles about 10k requests per second. The challenge is keeping limits consistent across multiple distributed gateway instances: if a user hits instance A and then instance B, their rate limit needs to be applied accurately across both. We've also considered a sliding window approach for better fairness. Beyond the edge, we have application-layer rate limits for specific expensive operations.

What's the general consensus on distributed rate limiting these days? Are there better algorithms or off-the-shelf solutions that give robust, consistent rate limiting across multiple nodes without adding too much latency? And how do you approach defense in depth with rate limiting at both the edge and application layers?
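To make the instance A / instance B requirement concrete, here is a minimal Python sketch of a token bucket whose state lives in a store shared by every gateway instance. Everything in it is an assumption for illustration, not our actual setup: `SharedStore` is an in-memory stand-in for something like Redis, `GatewayInstance` and the `rl:` key prefix are hypothetical names, and the clock is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class BucketState:
    tokens: float       # tokens currently available
    updated_at: float   # timestamp (seconds) of the last refill calculation


class SharedStore:
    """In-memory stand-in for a shared store such as Redis (hypothetical)."""

    def __init__(self):
        self._buckets = {}

    def take(self, key, capacity, refill_per_sec, now, cost=1.0):
        # In a real deployment this read-modify-write would have to be atomic
        # (for example a server-side script). This sketch is single-threaded.
        state = self._buckets.get(key)
        if state is None:
            state = BucketState(tokens=capacity, updated_at=now)

        # Refill in proportion to elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - state.updated_at)
        tokens = min(capacity, state.tokens + elapsed * refill_per_sec)

        allowed = tokens >= cost
        if allowed:
            tokens -= cost

        self._buckets[key] = BucketState(tokens=tokens, updated_at=now)
        return allowed


class GatewayInstance:
    """One gateway node; all nodes consult the same store (hypothetical)."""

    def __init__(self, name, store, capacity=3, refill_per_sec=1.0):
        self.name = name
        self.store = store
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec

    def allow(self, user_id, now):
        return self.store.take(
            f"rl:{user_id}", self.capacity, self.refill_per_sec, now
        )


store = SharedStore()
instance_a = GatewayInstance("a", store)
instance_b = GatewayInstance("b", store)

# Requests from one user alternate between instances but drain one bucket.
decisions = [
    instance_a.allow("user-1", now=0.0),
    instance_b.allow("user-1", now=0.0),
    instance_a.allow("user-1", now=0.0),
    instance_b.allow("user-1", now=0.0),  # bucket of 3 is empty by now
]
refilled = instance_b.allow("user-1", now=1.0)  # one token refilled after 1 s
```

The trade-off in this shape is that every request pays a round trip to the shared store, which is exactly the latency concern in the question. A commonly discussed alternative is to keep counters locally on each instance and sync them periodically, which gives up some accuracy in exchange for speed.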


