InfoQ Architecture · March 27, 2026

Agoda's Storefront: A Latency-Aware Reverse Proxy for S3 Object Storage

Agoda developed Storefront, a Rust-based reverse proxy built on Cloudflare's Pingora, to fix uneven load distribution and improve reliability for its S3-compatible object storage. It replaces DNS round-robin with real-time, latency-aware load balancing and adds operational features such as credential-less authentication and I/O timeout handling. The proxy gives Agoda's large-scale data processing and analytics workloads tighter control over traffic routing and access management.


The Challenge with DNS-Based Load Distribution

Agoda relies heavily on object storage, specifically S3-compatible endpoints from VAST Data, for its data processing and analytics workloads. The storage backend distributed load through DNS round-robin. That approach looks simple, but application clients often cache DNS responses for extended periods and keep sending requests to the same backend nodes. The result is uneven load: some nodes become hotspots while others sit underutilized, which hurts the performance and efficiency of data pipelines.

Why DNS Round-Robin Can Fail

DNS round-robin is a basic load-balancing technique in which a DNS server rotates through a list of IP addresses, returning a different one first for each successive query. Client-side DNS caching can defeat this: a client that caches the answer keeps using a single IP until the cache entry expires, so the load on the backend servers becomes unbalanced. High-scale, dynamic environments typically need more sophisticated load balancing.
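
The skew is easy to reproduce in a toy model. The Python sketch below is illustrative only and is not taken from Agoda's system: the `simulate` function, the backend addresses, and the request counts are all made up for the example. It assumes a DNS server that hands out addresses strictly in turn and clients that reuse each answer for a fixed number of requests.

```python
from collections import Counter
from itertools import cycle


def simulate(client_request_counts, backends, requests_per_resolution):
    """Return the number of requests each backend receives.

    client_request_counts: how many requests each (hypothetical) client sends.
    requests_per_resolution: how many requests a client sends before it
        asks DNS again; a very large value models long-lived caching.
    """
    rotation = cycle(backends)  # the DNS server answers in round-robin order
    load = Counter({backend: 0 for backend in backends})
    for request_count in client_request_counts:
        cached = None
        for i in range(request_count):
            if i % requests_per_resolution == 0:
                cached = next(rotation)  # fresh DNS answer
            load[cached] += 1
    return load


backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
clients = [1000, 10, 10, 10]  # one busy client, three quiet ones

cached_forever = simulate(clients, backends, requests_per_resolution=10**9)
resolve_each_time = simulate(clients, backends, requests_per_resolution=1)
print("with caching:   ", dict(cached_forever))
print("without caching:", dict(resolve_each_time))
```

With long-lived caching, the busy client pins all 1000 of its requests to the first address it received, while the other three backends see 10 requests each. When every request triggers a new resolution, the same 1030 requests spread almost evenly.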

Storefront: A Latency-Aware Reverse Proxy Solution

To overcome the limitations of DNS-based distribution, Agoda built Storefront, a reverse proxy written in Rust on top of Cloudflare's open-source Pingora framework. Its core feature is active, real-time load distribution: instead of relying on static DNS resolution, it evaluates backend availability and current request load to decide where each S3 request goes. The first implementation used a least-in-flight-requests algorithm, which was later refined with latency-aware scoring to improve distribution under varying production conditions.
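
The article does not publish Storefront's scoring formula, so the following Python sketch only illustrates the general idea of combining in-flight counts with observed latency. Everything in it is an assumption made for the example: the `Backend` and `Balancer` names, the score `(in_flight + 1) * ewma_latency_ms`, and the smoothing factor `alpha`. Storefront itself is written in Rust on Pingora and may work quite differently.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    address: str
    healthy: bool = True
    in_flight: int = 0
    ewma_latency_ms: float = 1.0  # assumed optimistic starting estimate

    def score(self) -> float:
        # Lower is better. Assumed formula: the queue this request would
        # join, weighted by how slow the backend has been recently.
        return (self.in_flight + 1) * self.ewma_latency_ms


class Balancer:
    def __init__(self, backends, alpha=0.2):
        self.backends = list(backends)
        self.alpha = alpha  # weight given to the newest latency sample

    def acquire(self) -> Backend:
        """Pick the healthy backend with the lowest score."""
        candidates = [b for b in self.backends if b.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends")
        chosen = min(candidates, key=Backend.score)
        chosen.in_flight += 1
        return chosen

    def release(self, backend: Backend, latency_ms: float) -> None:
        """Record a finished request and fold its latency into the average."""
        backend.in_flight -= 1
        backend.ewma_latency_ms = (
            self.alpha * latency_ms
            + (1 - self.alpha) * backend.ewma_latency_ms
        )


balancer = Balancer([Backend("node-a"), Backend("node-b")])
target = balancer.acquire()
# ... forward the S3 request to target.address and time it ...
balancer.release(target, latency_ms=12.5)
```

With the latency term held constant, this reduces to plain least-in-flight selection. The latency term lets the balancer steer away from a node that is slow even though it holds few open requests.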

Key Architectural & Operational Improvements

  • Improved Reliability: Enforces I/O timeouts so that client-side problems, such as a client failing to consume an HTTP response, cannot exhaust backend connection pools.
  • Cross-Data-Center Traffic Isolation: Separates traffic destined for different data centers into dedicated backend pools, preventing noisy neighbor issues and improving predictability.
  • HTTP Expect: 100-continue Optimization: Handles the Expect: 100-continue header more efficiently on object upload requests, reducing latency.
  • Credential-less Authentication: Integrates with Kubernetes to identify calling pods and apply internal access controls, centralizing permission management and enhancing security by eliminating the need for services to directly handle S3 credentials.
  • Enhanced Observability: Exposes detailed telemetry via OpenTelemetry, providing insights into performance, resource utilization, traffic patterns, and S3 API usage.
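
To make the I/O timeout item in the list above concrete, here is a minimal Python sketch of a response relay that gives up when either side stalls. It is not Storefront's code: the `relay` function, its callback parameters, and the timeout value are hypothetical, and it uses Python's `asyncio` purely for illustration.

```python
import asyncio


async def relay(read_chunk, write_chunk, io_timeout_s):
    """Copy a response from backend to client, one chunk at a time.

    read_chunk:  async callable returning bytes, or b"" at end of stream.
    write_chunk: async callable that sends bytes to the client.
    Raises asyncio.TimeoutError if a single read or write takes longer
    than io_timeout_s, so the caller can release the backend connection.
    Returns the number of bytes relayed.
    """
    total = 0
    while True:
        chunk = await asyncio.wait_for(read_chunk(), io_timeout_s)
        if not chunk:
            return total
        # A client that stops reading would block here indefinitely
        # without the timeout, pinning the backend connection.
        await asyncio.wait_for(write_chunk(chunk), io_timeout_s)
        total += len(chunk)


async def demo():
    chunks = iter([b"hello ", b"world", b""])

    async def read_from_backend():
        return next(chunks)

    async def stalled_client(_chunk):
        await asyncio.sleep(60)  # simulates a client that never reads

    try:
        await relay(read_from_backend, stalled_client, io_timeout_s=0.1)
    except asyncio.TimeoutError:
        print("client stalled; backend connection can be released")


asyncio.run(demo())
```

The timeout applies to each individual read or write rather than to the whole transfer, so a large but steadily progressing download is not cut off.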

Storefront turns a basic S3 endpoint into an object storage gateway that gives Agoda granular control over traffic, a stronger security posture, and detailed visibility, all of which matter when managing large-scale data infrastructure.

Tags: Reverse Proxy, Load Balancing, S3, Object Storage, Rust, Pingora, DNS, Latency
