Uber developed a high-throughput financial ledger processing system to manage extreme write contention on individual accounts, achieving over 30 updates per second per account. The system utilizes a batching-based execution model to aggregate multiple operations into short time windows, processing them atomically to improve efficiency and maintain strict consistency. This architecture significantly reduces processing times for high-volume financial workloads, balancing latency and throughput requirements.
Read original on InfoQ ArchitectureUber faced a significant challenge with its financial ledger system: handling scenarios where a single account experienced a concentrated burst of updates, often exceeding 30 updates per second. Traditional per-request transaction processing models struggled under such high write contention due to repeated storage interactions, coordination costs, and write amplification. This article details Uber's architectural shift to address these bottlenecks, focusing on maintaining strict consistency and auditability crucial for financial systems.
Uber's financial ledger is built on a double-entry accounting model, which offers strong correctness guarantees and end-to-end traceability but demands strict serialization of updates at the account level. In the previous model, each ledger update triggered an independent processing cycle involving state retrieval, validation, balance computation, and persistence. While robust for typical loads, this approach became a bottleneck for 'hot' accounts undergoing bulk adjustments, reconciliations, or operational corrections, leading to significant overhead.
To overcome these limitations, Uber introduced a batching-based execution model. Instead of individual processing, multiple operations targeting the same account are aggregated into short time windows (approximately 250 milliseconds). These batches are then processed as a single atomic unit, sharing a single ledger read and write cycle. This significantly reduces redundant storage interactions and coordination overhead.
Key Architectural Components
The system's workflow involves three stages: 1) incoming updates are grouped into time-bound batches based on account affinity, 2) batched operations are executed as a single atomic unit against the ledger state, and 3) results are persisted and propagated to downstream systems, including audit logging and reconciliation. Redis is used for coordination in grouping updates, and optimistic atomic update mechanisms ensure correctness under concurrent access.
A crucial design consideration was the trade-off between batching window size and end-to-end latency. Smaller windows reduce latency but increase overhead, while larger windows improve throughput efficiency at the cost of delayed processing. Uber's configuration uses tightly controlled 250ms batching windows to balance these requirements. The design also incorporates failure isolation mechanisms to handle partial failures within a batch, improving stability and reducing retry amplification.