InfoQ Architecture · May 28, 2026

Adaptive Hedged Requests for Reducing Tail Latency in Distributed Systems

This article explores the critical distinction between slow requests (stragglers) and failed requests in distributed systems, particularly in fan-out architectures. It details how stragglers, often overlooked by traditional monitoring, significantly contribute to high p99 latency and how naive retries exacerbate the problem. The core solution presented is an adaptive hedging mechanism that proactively sends backup requests based on real-time latency distributions, while employing a token bucket budget to prevent load amplification during genuine outages.

Distributed Systems Performance & Scaling

The Challenge of Tail Latency in Fan-Out Architectures

In microservice architectures, especially those with significant fan-out (where a single user request calls multiple downstream services), per-service health metrics can be misleading. A low straggler rate in each service (e.g., 1% slow requests) compounds at the system level, because a top-level request is only as fast as its slowest downstream call. For instance, with 100 downstream services and an independent 1% straggler rate per service, over 63% of top-level requests (1 − 0.99^100 ≈ 0.634) will hit at least one straggler, which severely degrades p99 (99th percentile) latency. This is why system-wide tail latency often remains high even when every individual service appears healthy.
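
The arithmetic behind the 63% figure can be checked with a short script. This is a minimal sketch that assumes stragglers occur independently across downstream calls; the function name `straggler_hit_probability` is illustrative and does not come from the article.

```python
def straggler_hit_probability(per_service_rate: float, fan_out: int) -> float:
    """Probability that at least one of `fan_out` independent calls is slow."""
    # P(no straggler) = (1 - rate) ** fan_out, so take the complement.
    return 1.0 - (1.0 - per_service_rate) ** fan_out


if __name__ == "__main__":
    for fan_out in (1, 10, 50, 100):
        print(fan_out, round(straggler_hit_probability(0.01, fan_out), 3))
    # prints:
    # 1 0.01
    # 10 0.096
    # 50 0.395
    # 100 0.634
```

If stragglers are correlated (for example, a shared noisy host), the real figure will differ, so the independence assumption is worth validating against production traces.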

Stragglers vs. Failures: A Crucial Distinction

The article emphasizes the difference between a failure (a request that doesn't complete) and a straggler (a request that completes, but slowly, due to issues like GC pauses, hot partitions, or kernel blips). While both impact p99 latency, they require different solutions:

· Retries are suitable for failures, as they re-send a request that never finished. However, applying retries to stragglers amplifies load on already struggling backends, making the problem worse.
· Hedged requests are the correct approach for stragglers. A backup request is sent while the primary is still in flight, and the first response to arrive is used, with the slower one cancelled. This mechanism 'races around' the slow request rather than waiting for it to fail.

Adaptive Hedging Mechanism using DDSketch

The effectiveness of hedging depends on knowing *when* to hedge. Static thresholds are brittle in production environments where latency distributions constantly shift. The proposed solution uses an adaptive mechanism built around DDSketch for real-time, per-host latency tracking. DDSketch is a streaming quantile sketch that estimates quantiles in bounded memory with relative-error guarantees, so the p90 latency (or another target percentile) of each backend host's current distribution can be estimated to within a fixed percentage of its true value. That estimate serves as the hedge trigger: a request still pending past it is treated as a likely straggler.
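
To make the relative-error idea concrete, here is a deliberately simplified log-bucketed sketch in the spirit of DDSketch. It is not the DDSketch library and omits features a production sketch has (bucket collapsing to cap memory, handling of zero and negative values, merging); the class name `LogBucketSketch` and its parameters are assumptions for illustration.

```python
import math


class LogBucketSketch:
    """Simplified quantile sketch with logarithmic buckets (positive values only)."""

    def __init__(self, relative_accuracy: float = 0.01) -> None:
        # Bucket i covers (gamma**(i-1), gamma**i]; any value in a bucket is
        # within `relative_accuracy` of that bucket's representative value.
        self.gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
        self.log_gamma = math.log(self.gamma)
        self.buckets: dict[int, int] = {}
        self.count = 0

    def add(self, value: float) -> None:
        if value <= 0:
            raise ValueError("this simplified sketch only handles positive values")
        index = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[index] = self.buckets.get(index, 0) + 1
        self.count += 1

    def quantile(self, q: float) -> float:
        if self.count == 0:
            raise ValueError("empty sketch")
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # Representative value of the bucket containing the target rank.
                return 2 * self.gamma ** index / (self.gamma + 1)
        raise AssertionError("unreachable: ranks are always below count")


if __name__ == "__main__":
    sketch = LogBucketSketch(relative_accuracy=0.01)
    for latency_ms in range(1, 1001):
        sketch.add(float(latency_ms))
    print(sketch.quantile(0.90))  # close to 900, within about 1%
```

Because bucket widths grow geometrically, the number of buckets scales with the logarithm of the value range rather than with the number of samples, which is what keeps memory small for latency data.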

Key Components of Adaptive Hedging

· DDSketch: Estimates real-time latency quantiles (e.g., p90) for each target host. It uses logarithmic bucketing and a tumbling window (e.g., two sketches rotating every 30 seconds) to adapt to changing conditions and age out stale data without requiring manual tuning.
· Token Bucket Budget: Prevents load amplification during genuine outages. It caps the hedge rate at a configurable percentage of total traffic (e.g., 10%). If all requests are slow, the budget quickly exhausts, stopping hedging and allowing graceful degradation rather than doubling load on an already overwhelmed backend.
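
One way to realise the hedge budget is a token bucket that earns credit from primary traffic rather than from wall-clock time, so the hedge rate is capped as a fraction of requests. The sketch below is an assumption about how such a budget might be built, not the article's code: `HedgeBudget`, `hedge_percent`, and `burst` are hypothetical names, and integer credits are used to avoid floating-point drift.

```python
class HedgeBudget:
    """Token bucket: each primary request earns credit, each hedge spends it."""

    COST_PER_HEDGE = 100  # credits; a primary request earns `hedge_percent`

    def __init__(self, hedge_percent: int = 10, burst: int = 10) -> None:
        self.hedge_percent = hedge_percent
        self.max_credits = burst * self.COST_PER_HEDGE  # cap on saved-up hedges
        self.credits = 0

    def on_primary_request(self) -> None:
        self.credits = min(self.max_credits, self.credits + self.hedge_percent)

    def try_spend(self) -> bool:
        """Return True if a hedge may be sent now."""
        if self.credits >= self.COST_PER_HEDGE:
            self.credits -= self.COST_PER_HEDGE
            return True
        return False


if __name__ == "__main__":
    # Simulated outage: every one of 1000 requests is slow and wants a hedge.
    budget = HedgeBudget(hedge_percent=10)
    hedges = 0
    for _ in range(1000):
        budget.on_primary_request()
        if budget.try_spend():
            hedges += 1
    print(hedges)  # prints 100: extra load is capped at 10%, not doubled
```

In healthy conditions only a small share of requests cross the hedge threshold, so credit accumulates up to the burst cap; during an outage the credit drains and hedging throttles itself to the configured percentage.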

tail latency · hedged requests · stragglers · DDSketch · microservices · fan-out architecture · p99 latency · load balancing

Architecture Design

Design this yourself

Design a high-performance API gateway that aggregates data from multiple microservices (fan-out architecture). Incorporate an adaptive hedged request mechanism to significantly reduce p99 tail latency, distinguishing between stragglers and failures. Detail how the system uses real-time latency quantile estimation (e.g., DDSketch with a tumbling window) to trigger hedges and implements a token bucket strategy to prevent load amplification during backend outages.
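
As a starting point for the tumbling-window part of this exercise, the sketch below shows one plausible reading of "two sketches rotating": writes go to the current window, reads combine the current and previous windows, and data older than two windows is dropped. A sorted list stands in for the quantile sketch to keep the example short; `TumblingQuantile` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Optional


class TumblingQuantile:
    """Per-host hedge threshold from two rotating windows of latency samples."""

    def __init__(self, window_s: float = 30.0, quantile: float = 0.90,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_s = window_s
        self.quantile = quantile
        self.clock = clock
        self.window_start = clock()
        self.current: list[float] = []   # stand-in for the active sketch
        self.previous: list[float] = []  # stand-in for the retiring sketch

    def _rotate_if_due(self) -> None:
        elapsed = self.clock() - self.window_start
        if elapsed < self.window_s:
            return
        # If more than one whole window was skipped, both windows are stale.
        self.previous = self.current if elapsed < 2 * self.window_s else []
        self.current = []
        self.window_start += (elapsed // self.window_s) * self.window_s

    def record(self, latency_s: float) -> None:
        self._rotate_if_due()
        self.current.append(latency_s)

    def hedge_threshold(self) -> Optional[float]:
        """Latency after which a pending request should be hedged, if known."""
        self._rotate_if_due()
        samples = sorted(self.previous + self.current)
        if not samples:
            return None  # no recent data: caller should fall back to a default
        return samples[int(self.quantile * (len(samples) - 1))]


if __name__ == "__main__":
    now = [0.0]  # fake clock so the demo runs instantly
    tracker = TumblingQuantile(clock=lambda: now[0])
    for _ in range(100):
        tracker.record(0.010)
    print(tracker.hedge_threshold())  # 0.01
    now[0] = 31.0  # backend slows down in the next window
    for _ in range(100):
        tracker.record(0.050)
    print(tracker.hedge_threshold())  # 0.05
```

Reading from both windows avoids a noisy threshold right after a rotation, at the cost of reacting up to one window late when latency improves.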

Practice Interview

Focus: adaptive hedged request mechanism for tail latency reduction

Other design angles

· Design a data ingestion pipeline that processes events from various sources with strict latency requirements. Focus on how an adaptive hedging strategy could be applied to mitigate stragglers in downstream processing stages.
· Design a distributed caching layer that transparently handles straggling cache nodes using hedged requests. Explain the trade-offs in terms of complexity, consistency, and resource utilization compared to traditional retry mechanisms.
· Architect a trading platform's order execution service, which interacts with multiple exchanges. Implement a hedging mechanism to ensure rapid order placement despite potential stragglers from individual exchange APIs, prioritizing speed while managing resource costs.