This article details Uber's RAMEN system, a robust push messaging infrastructure designed to deliver real-time notifications to millions of rider and driver devices. It explores the architectural choices and engineering challenges in maintaining persistent connections, ensuring at-least-once delivery, and scaling a stateful system for low-latency, reliable communication in unreliable mobile network environments.
Read original on Dev.to
Delivering instant ride offers and real-time location updates to millions of active users in a ride-hailing application presents significant system design challenges. The core requirements are very low latency (under 100 ms), guaranteed delivery even on weak mobile networks, and efficient resource utilization that does not overwhelm backend systems. Traditional polling is highly inefficient at this scale: it generates excessive requests, wastes server resources, adds latency, and drains client batteries.
Polling vs. Push: Why Push Wins
Polling requires clients to repeatedly ask the server for updates, leading to a high volume of empty responses and inherent latency. A push-based system, like RAMEN, maintains open connections with clients, allowing the server to proactively send data instantly when available. This drastically reduces latency, conserves battery life, and optimizes server resources by only transmitting data when necessary.
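To make the polling-versus-push trade-off concrete, here is a back-of-envelope model in Python. The client count, poll interval, and update rate are illustrative assumptions, not Uber's figures, and the model assumes updates arrive at a steady average rate and at uniformly random moments.

```python
from dataclasses import dataclass


@dataclass
class PollingEstimate:
    requests_per_second: float   # poll requests hitting the backend
    empty_fraction: float        # share of polls that return nothing new
    mean_added_latency_s: float  # average wait between an update and the poll that sees it


def estimate_polling(clients: int, poll_interval_s: float,
                     updates_per_client_per_s: float) -> PollingEstimate:
    """Back-of-envelope cost of fixed-interval polling (illustrative model)."""
    requests_per_second = clients / poll_interval_s
    # Approximate the share of polls that find new data as the expected number
    # of updates per interval, capped at 1 (reasonable when updates are sparse).
    hit_rate = min(1.0, updates_per_client_per_s * poll_interval_s)
    return PollingEstimate(
        requests_per_second=requests_per_second,
        empty_fraction=1.0 - hit_rate,
        # An update landing at a uniformly random moment waits half an interval on average.
        mean_added_latency_s=poll_interval_s / 2,
    )


def push_messages_per_second(clients: int, updates_per_client_per_s: float) -> float:
    # With push, the server sends one message per actual update and no empty responses.
    return clients * updates_per_client_per_s


# Hypothetical numbers: 1M connected clients, a 2 s poll interval,
# and one relevant update per client every 20 s on average.
polling = estimate_polling(1_000_000, 2.0, 0.05)
push = push_messages_per_second(1_000_000, 0.05)
print(polling, push)
```

Under these assumed numbers, polling issues 500,000 requests per second, about 90% of which come back empty, and adds about one second of latency on average, while push sends roughly 50,000 messages per second with no polling delay added.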
Uber's RAMEN (Real-time Asynchronous Messaging Network) is a custom-built push messaging infrastructure. It uses a three-tier architecture in which decision-making, payload construction, and message delivery are handled by separate layers, keeping the complexity of each concern isolated.
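The three-tier split can be sketched as three independent stages. The class names, the event kinds, and the in-memory outbox below are hypothetical; the source only states that decision-making, payload construction, and delivery are separated, not how each tier is implemented.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user_id: str
    kind: str   # hypothetical event types, e.g. "ride_offer"
    data: dict


@dataclass
class Message:
    user_id: str
    payload: dict


class DecisionTier:
    """Decides whether an event is worth pushing (hypothetical allow-list policy)."""

    def __init__(self, pushable_kinds: set[str]):
        self.pushable_kinds = pushable_kinds

    def should_push(self, event: Event) -> bool:
        return event.kind in self.pushable_kinds


class PayloadTier:
    """Turns an approved event into the payload the client will receive."""

    def build(self, event: Event) -> Message:
        return Message(event.user_id, {"type": event.kind, **event.data})


class DeliveryTier:
    """Hands messages to the user's connection; an in-memory outbox stands in for sockets."""

    def __init__(self) -> None:
        self.outbox: dict[str, list[dict]] = {}

    def deliver(self, msg: Message) -> None:
        self.outbox.setdefault(msg.user_id, []).append(msg.payload)


def handle(event: Event, decision: DecisionTier,
           payload: PayloadTier, delivery: DeliveryTier) -> bool:
    """Runs one event through the three tiers; returns True if it was delivered."""
    if not decision.should_push(event):
        return False
    delivery.deliver(payload.build(event))
    return True


decision, payload, delivery = DecisionTier({"ride_offer"}), PayloadTier(), DeliveryTier()
handle(Event("driver-42", "ride_offer", {"trip_id": "t1"}), decision, payload, delivery)
handle(Event("driver-42", "debug_ping", {}), decision, payload, delivery)  # filtered out
```

One benefit of this separation, at least in the sketch, is that the stateful delivery tier carries no business logic: it only needs to know which connection belongs to which user.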
The choice of transport protocol is critical for real-time systems. RAMEN initially used Server-Sent Events (SSE) over HTTP/1.1. SSE is simple, but it is unidirectional (server-to-client only), so client acknowledgments (ACKs) had to be sent as separate HTTP POST requests. The setup also suffered from head-of-line blocking and heavy JSON payloads. Uber therefore moved to gRPC bidirectional streams over QUIC/HTTP/3, which carry both messages and ACKs on the same stream and brought significant improvements.
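A bidirectional stream lets ACKs travel back on the same connection as the messages. The in-memory sketch below shows one common way to build at-least-once delivery on top of that: sequence numbers, a server-side buffer of unacknowledged messages that is replayed on reconnect, and client-side de-duplication. The cumulative-ACK scheme and all names are assumptions for illustration; the code does not use gRPC and is not RAMEN's actual protocol.

```python
class ServerSession:
    """Per-user server state for at-least-once delivery (illustrative sketch).

    Each message gets an increasing sequence number and stays in `unacked`
    until the client acknowledges it. After a reconnect, everything still
    unacknowledged is sent again, so a message can arrive twice but is not
    lost while the server still holds it.
    """

    def __init__(self) -> None:
        self.next_seq = 1
        self.unacked: dict[int, str] = {}  # dicts keep insertion order, i.e. seq order

    def send(self, body: str) -> tuple[int, str]:
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = body
        return seq, body  # the frame written to the stream

    def on_ack(self, acked_seq: int) -> None:
        # Cumulative ACK: the client has applied every message up to acked_seq.
        for seq in [s for s in self.unacked if s <= acked_seq]:
            del self.unacked[seq]

    def replay(self) -> list[tuple[int, str]]:
        # On reconnect, resend all unacknowledged frames in sequence order.
        return list(self.unacked.items())


class ClientSession:
    """Applies each message once, in order, and returns the ACK to send back."""

    def __init__(self) -> None:
        self.last_applied = 0
        self.applied: list[str] = []

    def receive(self, seq: int, body: str) -> int:
        if seq == self.last_applied + 1:
            self.applied.append(body)
            self.last_applied = seq
        # seq <= last_applied is a duplicate from a replay; seq > last_applied + 1
        # is a gap that the server's replay will fill. Either way, re-ACK.
        return self.last_applied


server, client = ServerSession(), ClientSession()
frames = [server.send(body) for body in ["offer-1", "offer-2", "offer-3"]]
server.on_ack(client.receive(*frames[0]))
client.receive(*frames[1])  # the connection drops: this ACK and frame 3 are lost
for seq, body in server.replay():  # after reconnect: replays seq 2 and 3
    server.on_ack(client.receive(seq, body))
```

In the simulated run, the client sees "offer-2" twice but applies it once, and the server's buffer is empty once the final ACK arrives.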
RAMEN servers are stateful: each server holds the open TCP/gRPC connections for a specific set of users. This makes scalability and high availability harder than for stateless services, because a message for a given user must reach the one server holding that user's connection. To manage millions of connections across a cluster of hundreds of servers, Uber uses Apache Helix and ZooKeeper for sharding and automatic rebalancing. ZooKeeper stores the cluster topology, while Helix detects server failures and redistributes shards (groups of user connections) to healthy servers, so affected clients can reconnect seamlessly.
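The sharding and rebalancing idea can be illustrated with a toy shard map. This is not the Helix or ZooKeeper API: the shard count, server names, and least-loaded reassignment policy are all hypothetical, chosen only to show why grouping users into shards means a server failure moves whole shards rather than individual users.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical; a fixed shard count keeps the user -> shard mapping stable


def shard_for(user_id: str) -> int:
    # Use a stable hash: Python's built-in hash() of a str is randomized per process.
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


class ShardMap:
    """Toy shard -> server table, standing in for what Helix/ZooKeeper manage."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        self.assignment = {s: self.servers[s % len(self.servers)] for s in range(NUM_SHARDS)}

    def server_for(self, user_id: str) -> str:
        # Where a (re)connecting client should be routed.
        return self.assignment[shard_for(user_id)]

    def load(self) -> dict[str, int]:
        counts = dict.fromkeys(self.servers, 0)
        for owner in self.assignment.values():
            if owner in counts:  # ignore shards still pointing at a removed server
                counts[owner] += 1
        return counts

    def fail(self, dead: str) -> list[int]:
        """Move every shard owned by `dead` to the least-loaded surviving server."""
        self.servers.remove(dead)
        moved = sorted(s for s, owner in self.assignment.items() if owner == dead)
        for shard in moved:
            counts = self.load()
            self.assignment[shard] = min(self.servers, key=lambda srv: counts[srv])
        return moved


cluster = ShardMap(["ramen-a", "ramen-b", "ramen-c", "ramen-d"])
before = cluster.server_for("rider-123")
moved = cluster.fail("ramen-b")
after = cluster.server_for("rider-123")  # unchanged unless rider-123's shard was on ramen-b
```

Only the failed server's shards move; users on other shards keep their connections, and affected users reconnect to whichever server now owns their shard. In a real deployment, Helix's controller computes the assignment and records it in ZooKeeper rather than in an in-process dict.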