Agoda developed Storefront, a Rust-based reverse proxy built on Cloudflare's Pingora, to address uneven load distribution and improve reliability for S3-compatible object storage. It replaces DNS round-robin with real-time, latency-aware load balancing and adds operational enhancements such as credential-less authentication and improved IO timeout handling. The result gives Agoda better control over traffic routing and access management for its large-scale data processing and analytics workloads.
Read original on InfoQ Architecture
Agoda relies heavily on object storage (specifically S3-compatible endpoints from VAST Data) for its data processing and analytics workloads. A critical issue was that the backend object storage used DNS round-robin for load distribution. The approach looks simple, but application clients often cache DNS responses for extended periods and therefore keep persistent connections to the same backend nodes. The result is uneven load distribution: "hotspots" form where some nodes are overloaded while others remain underutilized, which degrades the performance and efficiency of data pipelines.
Why DNS Round-Robin Can Fail
DNS round-robin is a basic load-balancing technique in which a DNS server returns different IP addresses in rotation for successive queries. Client-side DNS caching can defeat this: a client that caches an answer keeps sending traffic to a single IP until the cached entry is refreshed, which unbalances load across backend servers. High-scale, dynamic environments typically require more sophisticated load balancing.
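The effect described above can be illustrated with a small simulation. This is a minimal sketch, not a model of Agoda's environment: the backend IPs, the per-client request volumes, and the `simulate` function are all hypothetical, and "caching" is simplified to each client resolving exactly once and never refreshing.

```python
from collections import Counter
from itertools import cycle

# Hypothetical backend IPs returned by the DNS server in rotation.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
# Hypothetical request volumes per client: two heavy clients, four light ones.
CLIENT_REQUESTS = [1000, 10, 10, 10, 500, 10]


def simulate(cache_dns: bool) -> Counter:
    """Return the number of requests each backend receives."""
    resolver = cycle(BACKENDS)  # round-robin DNS: next IP on every lookup
    load = Counter({ip: 0 for ip in BACKENDS})
    for requests in CLIENT_REQUESTS:
        if cache_dns:
            # The client resolves once and sends every request to that IP.
            load[next(resolver)] += requests
        else:
            # The client resolves again for every single request.
            for _ in range(requests):
                load[next(resolver)] += 1
    return load


if __name__ == "__main__":
    print("cached DNS:  ", dict(simulate(cache_dns=True)))
    print("uncached DNS:", dict(simulate(cache_dns=False)))
```

With these made-up numbers, the cached case concentrates both heavy clients on the first backend, while the uncached case spreads requests evenly. Real clients sit somewhere in between, depending on TTLs and connection reuse.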
To overcome the limitations of DNS-based distribution, Agoda engineered Storefront, a reverse proxy written in Rust and built on Cloudflare's open-source Pingora framework. Storefront's core innovation is its active, real-time load distribution logic. Instead of relying on static DNS resolution, it evaluates backend availability and current request load to route S3 requests. Initial implementations used a least-in-flight requests algorithm, which was later refined with latency-aware scoring to keep distribution balanced under varying production conditions.
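To make the two selection strategies named above concrete, here is a minimal sketch in Python (Storefront itself is written in Rust on Pingora, and its actual scoring is not given in the source). Everything here is an assumption for illustration: the `Backend` class, the score formula `(in_flight + 1) * ewma_latency_ms`, and the smoothing factor `alpha`.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True
    in_flight: int = 0             # requests currently being served
    ewma_latency_ms: float = 1.0   # smoothed observed latency

    def record_latency(self, sample_ms: float, alpha: float = 0.2) -> None:
        # Exponentially weighted moving average; alpha is a hypothetical choice.
        self.ewma_latency_ms = alpha * sample_ms + (1 - alpha) * self.ewma_latency_ms

    def score(self) -> float:
        # Hypothetical latency-aware score: lower is better.
        return (self.in_flight + 1) * self.ewma_latency_ms


def _healthy(backends: list[Backend]) -> list[Backend]:
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backend available")
    return candidates


def pick_least_in_flight(backends: list[Backend]) -> Backend:
    """Choose the healthy backend with the fewest outstanding requests."""
    return min(_healthy(backends), key=lambda b: b.in_flight)


def pick_latency_aware(backends: list[Backend]) -> Backend:
    """Choose the healthy backend with the lowest load-times-latency score."""
    return min(_healthy(backends), key=Backend.score)


if __name__ == "__main__":
    pool = [
        Backend("node-a", in_flight=2, ewma_latency_ms=5.0),
        Backend("node-b", in_flight=1, ewma_latency_ms=40.0),
        Backend("node-c", healthy=False),
    ]
    print(pick_least_in_flight(pool).name)
    print(pick_latency_aware(pool).name)
```

In this example pool, plain least-in-flight prefers the slow node with one outstanding request, while the latency-aware score prefers the faster node despite its higher request count. That difference is one plausible reason to refine the first algorithm with latency information.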
Storefront turns a basic S3 endpoint into a robust, high-performance object storage gateway that gives Agoda granular control over traffic, a stronger security posture, and comprehensive visibility, all of which are critical for managing large-scale data infrastructure.