InfoQ Architecture · March 25, 2026

Architecting Multi-Layer Defenses for System Resilience and Traffic Spikes

This article discusses strategies for building systems that withstand sudden traffic spikes, drawing on SeatGeek's architecture. It introduces a multi-layer defense approach (Edge Shield, Gateway Shield, and Platform Shield) to absorb bursts, control flow, and protect core services. It also highlights the role of early signals, graceful degradation, and resource isolation in preventing system collapse.

Performance & Scaling · Distributed Systems · API Design

The "Traffic Stampede" Problem

SeatGeek operates in an environment prone to "traffic stampedes," in which demand arrives faster than the system can adapt and can cause it to collapse. Two sources of stress stand out: the "Noisy Neighbor Problem," where one tenant in a multi-tenant system consumes shared resources at the expense of the others, and the "Scaling Gap," the period during which scaling mechanisms lag behind demand. The strategy for addressing them has three parts: Absorb the Burst, Control the Flow, and Protect the Core.
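To make the Scaling Gap concrete, here is a back-of-the-envelope sketch. The function names and all numbers are illustrative assumptions, not figures from SeatGeek. It estimates how many requests pile up while capacity lags behind demand, and how long the backlog takes to drain once the new capacity is online.

```python
def backlog_during_gap(arrival_rps: float, capacity_rps: float, gap_seconds: float) -> float:
    """Requests that accumulate while capacity lags behind demand."""
    excess = max(0.0, float(arrival_rps) - float(capacity_rps))
    return excess * gap_seconds


def drain_seconds(backlog: float, arrival_rps: float, new_capacity_rps: float) -> float:
    """Time needed to clear the backlog after scaling has caught up."""
    headroom = float(new_capacity_rps) - float(arrival_rps)
    if headroom <= 0:
        return float("inf")  # still overloaded: the backlog never drains
    return backlog / headroom


# Hypothetical numbers: 5,000 req/s arrive, 2,000 req/s can be served,
# and new capacity takes 90 seconds to come online.
backlog = backlog_during_gap(5000, 2000, 90)
print(backlog)                              # 270000.0
print(drain_seconds(backlog, 5000, 6000))   # 270.0
```

Under these assumed numbers, a 90-second gap leaves 270,000 requests waiting, which is why the layers described below try to absorb or reject traffic before it reaches the origin.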

Multi-Layer Defense Strategy

SeatGeek employs a multi-shield defense system, each layer with distinct responsibilities to ensure resilience:

Edge Shield: The outermost layer. It includes a Cache that serves requests without hitting the origin, a Queue that absorbs sudden traffic bursts, and a Filter that detects bots and invalid traffic. Combining caching with rate limiting at this layer improves stability and reduces origin load. A Virtual Waiting Room is also used here to manage traffic flow.
Gateway Shield: This layer controls the rate of requests, ensures fair access for legitimate users, and validates traffic. Rate limiting, often implemented as a Rate Limit Gate (e.g., returning HTTP 429 responses), differentiates between human users and sophisticated automated agents. Fair access policies are applied per user/account or per API key.
Platform Shield: The innermost layer protects core services through Resource Isolation (CPU limits and scheduling priorities that prevent noisy neighbors), Prioritization of critical paths, and Observability Signals. Monitoring queue size and CPU saturation provides early indicators for horizontal scaling (e.g., triggering the Kubernetes Horizontal Pod Autoscaler, HPA).
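As an illustration of the Gateway Shield's per-key fair access policy, the following is a minimal sketch of a token bucket rate limiter. The token bucket is one common technique, chosen here as an assumption; the source does not say which algorithm SeatGeek uses. The class names `TokenBucket` and `RateLimitGate`, the parameters, and the key names are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    capacity: float        # maximum burst size
    refill_per_sec: float  # sustained request rate
    tokens: float          # tokens currently available
    last_refill: float     # timestamp of the last refill

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimitGate:
    """Hypothetical gate: one independent bucket per API key."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def check(self, api_key: str) -> int:
        """Return the HTTP status the gate would answer with."""
        now = self.clock()
        bucket = self.buckets.get(api_key)
        if bucket is None:
            # New keys start with a full bucket.
            bucket = TokenBucket(self.capacity, self.refill_per_sec, self.capacity, now)
            self.buckets[api_key] = bucket
        return 200 if bucket.allow(now) else 429


# Frozen clock so the example is deterministic.
gate = RateLimitGate(capacity=3, refill_per_sec=1.0, clock=lambda: 0.0)
print([gate.check("key-a") for _ in range(5)])  # [200, 200, 200, 429, 429]
print(gate.check("key-b"))                      # 200
```

Because each key has its own bucket, a client that exhausts its allowance receives 429 responses without affecting other keys. A production gateway would typically keep the buckets in a shared store rather than in process memory.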

Resilience Principles

A resilient system is built on four core principles: Composition (resilience is layered), Protect the Core (preserve critical paths), Observe Pressure (signals reveal stress), and Controlled Failure (fail gracefully). Early and accurate signals are crucial for faster system adaptation and preventing collapse.
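The Protect the Core and Controlled Failure principles can be sketched as priority-based load shedding: as a pressure signal rises, the lowest-priority work is rejected first so that critical paths keep running. This is a minimal sketch under stated assumptions. The priority classes, the example workloads in the comments, the thresholds, and the function name `should_serve` are hypothetical and do not come from the source.

```python
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 0    # assumed example: checkout
    NORMAL = 1      # assumed example: browsing listings
    BACKGROUND = 2  # assumed example: recommendations


def should_serve(priority: Priority, utilization: float) -> bool:
    """Shed lower-priority work first as utilization rises.

    The 0.80 and 0.95 thresholds are illustrative assumptions.
    """
    if utilization >= 0.95:
        return priority == Priority.CRITICAL
    if utilization >= 0.80:
        return priority <= Priority.NORMAL
    return True


for load in (0.50, 0.85, 0.97):
    served = [p.name for p in Priority if should_serve(p, load)]
    print(load, served)
```

At moderate load everything is served, above the first threshold background work is dropped, and near saturation only the critical class remains, which is one way to fail gracefully instead of collapsing.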

Signals and scaling form a feedback loop: a traffic spike increases the queue size, the larger queue triggers scaling mechanisms (such as HPA) to add capacity, and the added capacity brings the queue size back down. Signals from all three defense layers contribute to a faster response.
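The flow described above can be sketched with the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The example assumes queue size per replica as the metric, which is an assumption since the source does not say which metric SeatGeek feeds to HPA. It also ignores HPA's tolerance band and stabilization behavior. The function name and the numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 100) -> int:
    """HPA-style rule: scale in proportion to how far the metric is from target."""
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, raw))


# Spike: 4 replicas each see 250 queued requests against a target of 100.
print(desired_replicas(4, 250, 100))   # 10
# After scaling, the same 1,000 queued requests spread over 10 replicas.
print(desired_replicas(10, 100, 100))  # 10
```

Once the metric returns to its target, the rule stops adding replicas, which matches the loop of spike, queue growth, scale-out, and queue reduction.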

Resilience · Traffic Spikes · Rate Limiting · Caching · Queuing · System Architecture · Multi-tenancy · Scaling

Architecture Design

Design this yourself

Design a highly resilient, scalable ticketing platform capable of handling extreme traffic spikes and "traffic stampedes." Your design should incorporate a multi-layer defense strategy including an Edge Shield (with caching, queues, bot filtering), a Gateway Shield (with fair access rate limiting per user/API key), and a Platform Shield (with resource isolation, critical path prioritization, and dynamic scaling triggered by observability signals). Focus on how these layers interact to absorb bursts, control flow, and protect core services from cascading failures.

Other design angles

· Design a distributed rate limiting service that can be integrated into an existing API Gateway, focusing on implementing fair access policies and handling high throughput.
· Architect a multi-tenant platform, detailing how to implement resource isolation and prevent the "Noisy Neighbor Problem" while ensuring high availability during peak loads.
· Design a real-time observability and auto-scaling system that leverages early signals from various system layers to react quickly to sudden traffic increases and prevent system collapse.
