InfoQ Cloud · April 3, 2026

Replacing Database Sequences at Scale: A Distributed ID Generation System

This article details Coupang's journey to replace legacy database sequences with a highly available, low-latency distributed ID generation system without breaking over 100 existing services. The solution leverages local application caching, server-side caching, and DynamoDB as the source of truth, optimizing for performance and availability over strict global ordering and gap-free IDs. It highlights practical design principles for large-scale migrations, emphasizing simplicity and backward compatibility.


Migrating a large-scale platform from a relational database to NoSQL often surfaces hidden dependencies, especially around database-native features like sequences. Coupang faced this challenge when deprecating a legacy database that provided unique, monotonically increasing IDs. With over one hundred teams and ten thousand distinct counters relying on these sequences, a drop-in replacement was essential to avoid massive application rewrites and service disruptions. The core problem was designing a system that could generate unique IDs at scale with high availability and low latency, while remaining compatible with existing assumptions and avoiding unnecessary complexity.

Key Design Principles and Trade-offs

The team deliberately chose simplicity over sophisticated distributed coordination protocols (like consensus or vector clocks) after validating actual requirements. Many teams did not need strict global ordering or gap-free sequences, simplifying the problem significantly. This led to several guiding principles:

  • Minimize Coordination: Avoid distributed locks and consensus where possible to reduce latency and failure modes.
  • Tolerate Gaps: Accept that some sequence values go unused (e.g., when a server crashes while holding a block), simplifying cache management.
  • Push Caching to the Edges: Implement aggressive caching on both the client and server to minimize network round-trips to the persistent store.
  • Keep Architecture Legible: Prioritize an easily debuggable system over an academically elegant one.
  • Preserve Backward Compatibility: Ensure existing schemas and APIs do not require changes, making migration seamless.

Core Architecture: Layered Caching with DynamoDB

The final architecture comprises three layers: DynamoDB as the persistent source of truth, a server-side caching layer, and thick clients with local in-memory caches. This multi-layered caching strategy ensures that most ID requests are served with zero network calls, drastically improving latency and availability.

  1. DynamoDB as Source of Truth: Each sequence is stored as a single DynamoDB item, with the counter name as the key and the current position as a numeric value. Atomic increments using conditional updates ensure uniqueness and consistency when fetching blocks of IDs.
  2. Bulk Fetching: Instead of fetching one ID at a time, the service fetches blocks of 500-1000 sequences. This reduces DynamoDB operations and costs, and improves latency by serving more requests from cache. The trade-off is potential gaps if a server holding unused sequences crashes.
  3. Server-Side Caching: The sequence service maintains an in-memory cache of pre-fetched sequence blocks. Each instance holds non-overlapping blocks, ensuring uniqueness across instances. This choice avoids introducing external cache dependencies (like Redis) and their associated network hops and failure points.
  4. Client-Side Caching (Thick Client): Applications integrate a library that maintains a local cache of sequences. The vast majority of requests are served directly from this local cache, incrementing an in-memory counter. When the client cache runs low, it requests a new block from the sequence service in the background, making refills transparent to the application.
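The block-allocation step at the heart of layers 1 and 2 can be sketched as follows. This is a minimal in-memory stand-in, not Coupang's implementation: the class and method names are illustrative, and a lock simulates the atomicity that the real system gets from a single DynamoDB conditional update (an atomic ADD on the counter attribute).

```python
import threading


class SequenceStore:
    """In-memory stand-in for the DynamoDB source of truth.

    Each counter is one item keyed by name; the value is the next
    unissued position. The real system performs one UpdateItem with
    an atomic ADD; a lock simulates that atomicity here.
    """

    def __init__(self):
        self._counters = {}          # counter name -> next unissued value
        self._lock = threading.Lock()

    def allocate_block(self, name: str, size: int = 500) -> range:
        """Atomically reserve `size` sequential IDs for one caller.

        Every call returns a non-overlapping range, so two service
        instances can never hand out the same ID.
        """
        with self._lock:
            start = self._counters.get(name, 1)
            self._counters[name] = start + size
        return range(start, start + size)
```

Because blocks are disjoint, uniqueness holds across instances with no further coordination; if an instance crashes while holding a partially used block, the remaining values simply become a gap, which the design explicitly tolerates.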
💡

Availability over Strict Ordering

The design prioritizes availability and low latency over strict global monotonicity across all consumers or gap-free sequences. While IDs are unique and monotonically increasing within a single service instance, they may not be globally ordered across different instances. This trade-off significantly simplifies the distributed system and caters to the actual needs of most consuming services.
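The trade-off can be made concrete with a thick-client sketch (hypothetical names; the actual client library is not public). Each client serves IDs from a local block with zero network calls and fetches a fresh disjoint block only when the cache runs dry; here a class-level counter stands in for the remote sequence service:

```python
import threading


class ThickClient:
    """Illustrative client library: serves IDs from a local block,
    refilling from a shared allocator when the block is exhausted."""

    # Class-level stand-in for the remote sequence service.
    _next_start = 1
    _lock = threading.Lock()

    def __init__(self, block_size: int = 500):
        self._block_size = block_size
        self._block = iter(())       # empty until the first refill

    @classmethod
    def _allocate_block(cls, size: int) -> range:
        # In production this is a network call to the sequence service,
        # which in turn does an atomic increment against DynamoDB.
        with cls._lock:
            start = cls._next_start
            cls._next_start += size
        return range(start, start + size)

    def next_id(self) -> int:
        try:
            return next(self._block)             # local cache hit
        except StopIteration:
            self._block = iter(self._allocate_block(self._block_size))
            return next(self._block)
```

With `block_size=3`, a client A holding block 1-3 and a client B holding block 4-6 each emit strictly increasing IDs, and no ID is ever issued twice; but a merged log of both streams (e.g., A, B, A yields 1, 4, 2) is not globally sorted, which is exactly the ordering guarantee the design gives up.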

Topics: ID Generation, Distributed IDs, Sequences, DynamoDB, Caching, System Migration, Microservices, Scalability
