InfoQ Architecture·March 25, 2026

Architecting Multi-Layer Defenses for System Resilience and Traffic Spikes

This article summarizes strategies for building systems that withstand large traffic spikes, drawing on SeatGeek's architecture. It introduces a multi-layer defense (Edge Shield, Gateway Shield, and Platform Shield) that absorbs bursts, controls flow, and protects core services. It also highlights early signals, graceful degradation, and resource isolation as the means of preventing system collapse.



The "Traffic Stampede" Problem

SeatGeek operates in an environment prone to "traffic stampedes," where demand arrives faster than the system can adapt and the system risks collapse. Two recurring sources of stress are the "Noisy Neighbor Problem" in multi-tenant systems, where one tenant's load degrades service for the others, and the "Scaling Gap," the period during which scaling mechanisms lag behind demand. The core strategy has three parts: Absorb the Burst, Control the Flow, and Protect the Core.
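To make the Scaling Gap concrete, the arithmetic below estimates how many requests pile up while capacity lags behind demand. It is a rough sketch: the function name and every number in it are hypothetical and are not figures from the article.

```python
def backlog_after_gap(arrival_rps: float, capacity_rps: float, gap_seconds: float) -> float:
    """Requests left unserved when demand exceeds capacity for gap_seconds.

    Assumes constant arrival and service rates during the gap, which is a
    simplification; real spikes ramp up and capacity arrives in steps.
    """
    excess_rps = arrival_rps - capacity_rps
    return max(0.0, excess_rps * gap_seconds)


# Hypothetical spike: 5,000 req/s arriving, 2,000 req/s of capacity,
# and 90 seconds before new capacity comes online.
backlog = backlog_after_gap(arrival_rps=5000, capacity_rps=2000, gap_seconds=90)
print(backlog)
```

Under these assumed numbers the backlog reaches 270,000 requests, which is the load that a queue or waiting room at the edge would have to absorb.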

Multi-Layer Defense Strategy

SeatGeek employs a multi-shield defense system, each layer with distinct responsibilities to ensure resilience:

  • Edge Shield: The outermost layer. It includes a Cache that serves requests without hitting the origin, a Queue that absorbs sudden traffic bursts, and a Filter that detects bots and invalid traffic. Combining caching with rate limiting at this layer improves stability and reduces origin load. A Virtual Waiting Room is also used here to manage traffic flow.
  • Gateway Shield: This layer controls the rate of requests, ensures fair access for legitimate users, and validates traffic. Rate limiting is often implemented with a Rate Limit Gate that rejects excess requests (e.g., with HTTP 429 responses), and the layer distinguishes human users from sophisticated automated agents. Fair access policies are applied per user/account or per API key.
  • Platform Shield: The innermost layer protects core services through Resource Isolation (CPU limits and scheduling priorities that prevent noisy neighbors), Prioritization of critical paths, and Observability Signals. Monitoring queues and CPU saturation provides early indicators for horizontal scaling (e.g., invoking a Horizontal Pod Autoscaler, HPA).
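As an illustration of the Gateway Shield's per-key rate limiting, here is a minimal token-bucket sketch. The article does not say which algorithm SeatGeek uses, so the token bucket is an assumption, and the `TokenBucket` and `RateLimitGate` names and the rate and capacity values are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    rate: float      # tokens refilled per second
    capacity: float  # maximum burst size
    tokens: float    # tokens currently available
    updated: float   # timestamp of the last refill

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimitGate:
    """Keeps one bucket per key (user, account, or API key)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def check(self, key: str) -> int:
        """Return the HTTP status the gate would answer with: 200 or 429."""
        now = self.clock()
        bucket = self.buckets.get(key)
        if bucket is None:
            # New keys start with a full bucket.
            bucket = TokenBucket(self.rate, self.capacity, self.capacity, now)
            self.buckets[key] = bucket
        return 200 if bucket.allow(now) else 429


gate = RateLimitGate(rate=5.0, capacity=10.0)  # hypothetical limits
status = gate.check("api-key-123")             # hypothetical key
```

Because each key has its own bucket, one aggressive client exhausts only its own allowance, which is one way to express the fair-access policy described above. An in-memory dictionary works only for a single process; a shared store would be needed across gateway instances.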

Resilience Principles

A resilient system is built on four core principles: Composition (resilience is layered), Protect the Core (preserve critical paths), Observe Pressure (signals reveal stress), and Controlled Failure (fail gracefully). Early and accurate signals are crucial for faster system adaptation and preventing collapse.
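Protect the Core and Controlled Failure can be sketched as priority-based load shedding: as utilization rises, lower-priority work is rejected first so that critical paths keep running. The priority classes, thresholds, and response bodies below are hypothetical and are not taken from the article.

```python
from enum import IntEnum
from typing import Tuple


class Priority(IntEnum):
    CRITICAL = 0    # e.g. a checkout path (hypothetical classification)
    NORMAL = 1      # e.g. browsing
    BACKGROUND = 2  # e.g. prefetching or analytics


# Hypothetical utilization thresholds (0.0 to 1.0) at which each class is shed.
SHED_THRESHOLDS = {
    Priority.BACKGROUND: 0.70,
    Priority.NORMAL: 0.85,
    Priority.CRITICAL: 1.00,
}


def admit(priority: Priority, utilization: float) -> bool:
    """Admit a request only while utilization is below its class threshold."""
    return utilization < SHED_THRESHOLDS[priority]


def handle(priority: Priority, utilization: float) -> Tuple[int, str]:
    """Fail gracefully: shed requests get a fast 503 instead of a slow timeout."""
    if admit(priority, utilization):
        return 200, "ok"
    return 503, "temporarily unavailable, please retry"
```

At an assumed utilization of 0.75, background requests are shed while normal and critical requests are still served; at 0.90, only critical requests get through.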

Signals and scaling form a feedback loop: a traffic spike increases queue size, the growing queue triggers a scaling mechanism such as HPA, the added capacity drains the queue, and queue size falls again. Signals from all three defense layers contribute to a faster response.
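The queue-driven loop can be made concrete with the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1. The sketch below assumes the metric is queue length per replica, which the article does not specify, and the bounds and sample values are hypothetical.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """Replica count following the documented HPA ratio formula.

    current_metric and target_metric are assumed here to be queue length
    per replica; any per-replica metric works the same way.
    """
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical spike: 4 replicas, 300 queued items per replica, target of 100.
replicas = desired_replicas(current_replicas=4, current_metric=300, target_metric=100)
print(replicas)
```

With these assumed values the queue is three times its target, so the calculation asks for 12 replicas. The earlier the queue signal is observed, the sooner that request is made, which is why early signals shorten the scaling gap.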

