This article highlights a common mistake in system design interviews and real-world planning: focusing solely on daily transaction volume. It argues that designing for daily averages is misleading and that understanding peak Transactions Per Second (TPS), workload shape, and burst duration is critical to building resilient and scalable systems. The author uses a real-world financial reconciliation service to show how actual peak load can be an order of magnitude or more above the naive average, fundamentally altering architectural decisions.
Read original on Dev.to #systemdesign
System design often begins with understanding scale. A common pitfall is to quote total daily transaction volume (e.g., 100K transactions/day) as the primary metric for load. While seemingly impressive, this aggregate number offers little insight into the actual demands on a system. Distributing 100K transactions evenly over 24 hours yields approximately 1 TPS, a load easily handled by a single-core instance. This average obscures the true challenges of real-world workloads, which are characterized by spikes and concentrated bursts.
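The averaging step can be written out as a minimal sketch. The function name `average_tps` is a hypothetical helper introduced here for illustration, and the only assumption is that load is spread perfectly evenly across the day, which is exactly the assumption the article warns against.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_tps(daily_transactions: int) -> float:
    """Transactions per second if the daily volume were spread evenly (hypothetical helper)."""
    return daily_transactions / SECONDS_PER_DAY


if __name__ == "__main__":
    # 100K transactions/day, the figure quoted in the text
    print(f"{average_tps(100_000):.2f} TPS")  # prints "1.16 TPS"
```

The result says nothing about when the transactions arrive, which is why it works as a lower bound on load rather than as a design target.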
To effectively design for resilience and performance, engineers must look beyond daily averages and consider the 'shape of the workload.' Three crucial dimensions define this shape:
Case Study: Financial Reconciliation Service
An end-of-day reconciliation service processed 100K transactions daily, which the naive average puts at ~1 TPS. In practice the work was concentrated in a 45-minute end-of-day window, during which the system saw a peak load of 22 TPS. That is roughly 22 times the ~1 TPS average, a gap that fundamentally changed the design requirements and shows how deceptive averages can be.
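The case-study numbers can be checked with a short sketch. The helper names `peak_to_average` and `volume_at_rate` are hypothetical, and the second calculation assumes the 22 TPS peak was sustained for the whole 45-minute window, which the source does not state.

```python
SECONDS_PER_DAY = 86_400


def peak_to_average(daily_transactions: int, peak_tps: float) -> float:
    """Ratio of observed peak TPS to the evenly spread daily average (hypothetical helper)."""
    average = daily_transactions / SECONDS_PER_DAY
    return peak_tps / average


def volume_at_rate(tps: float, window_minutes: float) -> float:
    """Transactions processed if `tps` were sustained for the whole window (an assumption)."""
    return tps * window_minutes * 60


if __name__ == "__main__":
    # Against the exact average of about 1.16 TPS the ratio is about 19x;
    # the text's 22x figure compares against the rounded ~1 TPS.
    print(f"{peak_to_average(100_000, 22):.1f}x")  # prints "19.0x"
    # Volume covered by 22 TPS sustained over 45 minutes
    print(volume_at_rate(22, 45))  # prints 59400
```

Under that sustained-peak assumption, the window alone would account for about 59% of the day's 100K transactions, which is the sense in which the daily average hides where the load actually lands.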
Designing for peak load rather than average load impacts almost every architectural decision:
The Crucial Interview Question
In system design interviews, always ask: "What does the peak traffic look like, and over what specific window does it occur?" Also, inquire about business events that concentrate load and the latency/rate limits of downstream dependencies. This demonstrates practical, operations-minded thinking.
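One way to see why downstream rate limits belong in that question is to estimate the backlog a burst leaves behind. The sketch below reuses the 22 TPS, 45-minute burst from the case study; the downstream limit of 10 TPS is a purely hypothetical figure, and the helper names are illustrative rather than taken from the source.

```python
def backlog_after_burst(arrival_tps: float, downstream_tps: float, burst_seconds: float) -> float:
    """Transactions queued at the end of a burst when a dependency caps throughput."""
    surplus = max(arrival_tps - downstream_tps, 0)
    return surplus * burst_seconds


def drain_seconds(backlog: float, downstream_tps: float, post_burst_arrival_tps: float = 0.0) -> float:
    """Time to clear the backlog once the burst ends, given the remaining spare capacity."""
    spare = downstream_tps - post_burst_arrival_tps
    if spare <= 0:
        raise ValueError("no spare downstream capacity: the backlog never drains")
    return backlog / spare


if __name__ == "__main__":
    # 22 TPS arriving for 45 minutes against a hypothetical 10 TPS downstream limit
    backlog = backlog_after_burst(arrival_tps=22, downstream_tps=10, burst_seconds=45 * 60)
    print(backlog)  # prints 32400
    print(drain_seconds(backlog, downstream_tps=10) / 60)  # prints 54.0 (minutes)
```

With these illustrative numbers the queue takes longer to drain than the burst itself lasted, which is the kind of consequence the daily average never surfaces.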