
Design a Ride-Sharing System (Uber)

Real-time location tracking, matching algorithm, ETA calculation, surge pricing, trip management, and payment processing at scale.

25 min read · High interview weight

Problem Statement

A ride-sharing platform connects riders with nearby drivers in real time. The system must track millions of moving driver locations, match ride requests to the closest available driver within seconds, compute ETAs, handle payments, and do all of this globally. The unique challenge is the geospatial real-time matching problem at massive scale.

Requirements

| Functional | Non-Functional |
| --- | --- |
| Riders request a ride with pickup/dropoff location | Match driver within 5 seconds |
| Drivers broadcast their location every 4 seconds | Support 1 M concurrent drivers |
| System matches rider to nearest available driver | 99.99% availability |
| Show real-time driver location on rider's map | Location update latency < 1 second |
| Surge pricing based on supply/demand ratio | Handle 1 M ride requests/hour |
| Payment processing at trip end | Globally distributed (multi-region) |

Capacity Estimation

| Metric | Estimate |
| --- | --- |
| Active drivers globally | 1 M |
| Location updates/sec (1 M drivers × 1 update/4 sec) | 250,000 writes/sec |
| Ride requests/sec | ~280/sec (1 M/hour) |
| DB writes for location updates/day | ~21 B |
| Location payload size | ~100 bytes (lat, lon, driverId, timestamp) |

High-Level Architecture

Ride-sharing system high-level architecture

Geospatial Indexing

The core data structure challenge is: given a rider's location, find the K nearest available drivers within R km in under 100 ms. Two common approaches:

| Approach | How It Works | Pros | Cons |
| --- | --- | --- | --- |
| Redis GEO | Geohash-encoded sorted set; `GEOSEARCH`/`GEORADIUS` commands | Simple, built-in, fast | Single-node limit; no custom filtering |
| Google S2 Geometry | Hierarchical cell decomposition of the sphere | Flexible, hierarchical, supports complex queries | Requires a custom indexing layer |
| H3 (Uber's library) | Hexagonal hierarchical grid | Uniform-area cells, great for density analysis | Less common; more operational complexity |

Uber uses S2 cells internally. For an interview, Redis GEO is sufficient to explain: drivers are stored with `GEOADD drivers lon lat driverId` and the nearest candidates are found by searching the `drivers` key at the rider's coordinates, e.g. `GEOSEARCH drivers FROMLONLAT lon lat BYRADIUS 5 km ASC COUNT 10` (or the older `GEORADIUS` form). A GEO radius query runs in O(N + log M), where N is the number of matching entries and M is the total number of entries in the index.
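To make the query concrete without a live Redis instance, here is a minimal in-memory stand-in for the GEO index: `geoadd` mirrors `GEOADD`, and `geosearch` mirrors a radius query using the haversine great-circle distance. All names here are illustrative; in production the Redis commands above replace this entirely.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in km.
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# In-memory stand-in for the Redis GEO index.
drivers = {}  # driverId -> (lat, lon)

def geoadd(driver_id, lat, lon):
    # Equivalent of: GEOADD drivers lon lat driverId (last write wins).
    drivers[driver_id] = (lat, lon)

def geosearch(lat, lon, radius_km, count):
    # Equivalent of: GEOSEARCH drivers FROMLONLAT lon lat BYRADIUS r km ASC COUNT n
    hits = [(haversine_km(lat, lon, dlat, dlon), did)
            for did, (dlat, dlon) in drivers.items()]
    return sorted((d, i) for d, i in hits if d <= radius_km)[:count]
```

The returned list of `(distance_km, driverId)` pairs, nearest first, is exactly the shape the matching service consumes as its candidate queue.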

Driver Matching Flow

Ride matching flow from request to driver acceptance
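The flow above can be sketched as a simple offer loop: the matcher walks the distance-sorted candidate list and gives each driver a bounded window to accept before falling back to the next. The `offer_fn` callback is an assumption standing in for the real push-notification-plus-response machinery.

```python
ACCEPT_TIMEOUT_S = 15  # driver acceptance window before falling back

def match_ride(candidates, offer_fn):
    """Offer the ride to each nearby driver in order of distance.

    offer_fn(driver_id, timeout_s) returns True if the driver accepts
    within the timeout. In production this is an async push notification
    plus a wait on the driver's response, not a blocking call.
    """
    for driver_id in candidates:
        if offer_fn(driver_id, ACCEPT_TIMEOUT_S):
            return driver_id  # matched
    return None  # nobody accepted: widen the search radius or fail the request
```

The `None` branch matters in an interview: it is where you discuss radius expansion, retry budgets, and telling the rider "no drivers available."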

Surge Pricing

Surge pricing raises fares when demand outstrips supply in a geographic area. The system computes a surge multiplier per geo cell (e.g., an S2 level-12 cell, roughly 3–6 km²) by comparing active ride requests to available drivers in that cell. Kafka streams ride-request and driver-availability events; a Flink job aggregates per-cell ratios every 30 seconds and writes surge factors to Redis. The Ride Service reads the surge factor at request time.
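The per-cell aggregation the Flink job performs can be sketched as below. The exact pricing curve is an assumption for illustration (a demand/supply ratio, floored at 1.0 and capped); real surge functions are smoother and tuned per market.

```python
from collections import Counter

def surge_multipliers(request_cells, driver_cells, cap=3.0):
    """Compute a per-cell surge factor from one 30-second event window.

    request_cells / driver_cells: iterables of cell IDs, one entry per
    ride request / available driver observed in the window.
    """
    demand = Counter(request_cells)
    supply = Counter(driver_cells)
    surge = {}
    for cell, requests in demand.items():
        ratio = requests / max(supply.get(cell, 0), 1)  # avoid divide-by-zero
        surge[cell] = min(max(1.0, ratio), cap)  # never below 1.0, capped
    return surge
```

The output dict is what gets written to Redis keyed by cell ID, so the Ride Service's read at request time is a single O(1) lookup.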

Location Update Pipeline

Drivers send location updates every 4 seconds via WebSocket. At 1 M drivers, that's 250 K writes/second to the geo index. Redis handles this at scale (it can sustain 1 M+ ops/sec on a single node), but you'll want Redis Cluster for sharding and failover. Location history for trip replay and analytics is persisted asynchronously to Cassandra as a time series: `(tripId, timestamp) → (lat, lon)`.
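The hot path for one update is small: parse the frame, overwrite the driver's entry in the live geo index, and queue the point for the async Cassandra writer. A sketch, assuming a JSON frame format and a `geoadd`-style index interface (both are assumptions, not a documented protocol):

```python
import json
import time
from collections import deque

# Drained by a background writer that batches rows into Cassandra.
history_buffer = deque()

def on_location_update(geo_index, raw_msg):
    """Handle one WebSocket frame from a driver.

    geo_index: any object exposing geoadd(driver_id, lat, lon) — Redis
    in production, so the live index always holds the latest position.
    """
    msg = json.loads(raw_msg)
    geo_index.geoadd(msg["driverId"], msg["lat"], msg["lon"])
    # Persistence is off the hot path: append and return immediately.
    history_buffer.append(
        (msg["driverId"], msg.get("ts", time.time()), msg["lat"], msg["lon"])
    )
```

Keeping the Cassandra write out of the request path is the point: the geo index must reflect the update within the sub-second latency budget, while history can lag by seconds.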

⚠️

Location Data Privacy

Raw driver and rider location data is sensitive. Store it encrypted, retain only what's needed (trip duration + grace period), and implement access controls. In an interview, mentioning privacy and data retention shows engineering maturity.

Scaling Considerations

  • Shard by city: partition all services and data stores by city/region. Drivers and riders in New York don't interact with those in London — this is a natural sharding key.
  • WebSocket server pool: use sticky sessions (consistent hashing by driverId) so location updates always hit the same server; that server holds the WebSocket and updates Redis.
  • Trip DB: use MySQL sharded by `tripId` for ACID guarantees on payment and trip state transitions.
  • ETA service: integrate with Google Maps API or maintain an internal routing engine (OSRM). Cache ETA computations per (start_cell, end_cell) pair in Redis with a 5-minute TTL.
  • Multi-region: deploy per continent; use global load balancing. Ride creation is local; payment systems are global with eventual consistency.
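The ETA caching point above can be sketched as a TTL cache keyed by `(start_cell, end_cell)`. `compute_fn` is a stand-in for the routing call (OSRM or a Maps API); the 5-minute TTL matches the text, and the `now` parameter exists only to make expiry testable.

```python
import time

class EtaCache:
    """TTL cache for ETA lookups keyed by (start_cell, end_cell)."""

    def __init__(self, compute_fn, ttl_s=300):
        self.compute_fn = compute_fn  # routing engine call, e.g. OSRM
        self.ttl_s = ttl_s
        self.store = {}  # (start, end) -> (eta_minutes, expires_at)

    def eta(self, start_cell, end_cell, now=None):
        now = time.time() if now is None else now
        key = (start_cell, end_cell)
        hit = self.store.get(key)
        if hit and hit[1] > now:
            return hit[0]  # fresh cached value
        value = self.compute_fn(start_cell, end_cell)  # cache miss or expired
        self.store[key] = (value, now + self.ttl_s)
        return value
```

Keying by cell pair rather than exact coordinates is the design choice that makes the cache hit rate worthwhile: all trips between the same two cells share one routing computation for five minutes. In production the `store` dict would be Redis with `SET ... EX 300`.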
💡

Interview Tip

The geospatial indexing question is what separates good answers from great ones. Know that Redis GEOADD/GEORADIUS is backed by geohashes in a sorted set. Mention the 15-second timeout for driver acceptance and that you fall back to the next candidate — this shows you've thought through the UX and failure modes of matching.
