Saga Pattern: Choreography & Orchestration
Manage distributed transactions without 2PC: choreography-based sagas with events vs orchestration-based sagas with a coordinator. Compensation logic and failure handling.
Why Distributed Transactions Are Hard
In a monolith with a single database, transactions are easy: wrap the operations in `BEGIN`/`COMMIT` and the database guarantees atomicity. In microservices that follow the Database per Service pattern, this breaks down. Each service owns its database, so no single local transaction can span all of them. Two-Phase Commit (2PC) is the classic solution, but it is a blocking protocol: participants hold locks while they wait for the coordinator's decision, a coordinator failure can leave them stuck, and a transaction cannot commit unless every participant is available. Those costs are usually unacceptable across independently deployed services.
The Saga pattern is the microservices alternative. A saga is a sequence of local transactions, each in its own service, coordinated such that if one fails, all preceding transactions are reversed via compensating transactions. Instead of distributed atomicity, you achieve eventual consistency with explicit rollback logic.
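The definition above can be sketched in a few lines. This is a minimal in-process illustration, not a production saga engine: `run_saga`, the step names, and the `log` list are all hypothetical, and each "local transaction" is just a Python callable standing in for a call to a service.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all three are illustrative placeholders
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run each local transaction in order.

    If a step raises, run the compensations of the steps that already
    completed, in reverse order, and report failure.
    """
    completed: List[Step] = []
    for step in steps:
        _name, action, _compensate = step
        try:
            action()
        except Exception:
            for _, _, compensate in reversed(completed):
                compensate()
            return False
        completed.append(step)
    return True


log: List[str] = []


def decline_payment() -> None:
    # Simulated failure of the second local transaction
    raise RuntimeError("card declined")


ok = run_saga([
    ("create_order", lambda: log.append("order created"),
     lambda: log.append("order cancelled")),
    ("charge_payment", decline_payment,
     lambda: log.append("payment refunded")),
    ("reserve_stock", lambda: log.append("stock reserved"),
     lambda: log.append("stock released")),
])
```

Here `ok` is `False` and `log` ends up as `["order created", "order cancelled"]`. The failed payment step is not compensated because it never committed, and the stock step never runs.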
Choreography-Based Saga
In choreography, there is no central coordinator. Each service listens for events and reacts by performing its local transaction and publishing the next event. Services are loosely coupled: they know only about event types and their schemas, not about each other.
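A happy-path sketch of this event chain is shown here, assuming an order flow of Order, Payment, and Inventory services. The `EventBus` class is a synchronous in-memory stand-in for a real broker, and the event names `OrderCreated`, `PaymentProcessed`, and `InventoryReserved` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class EventBus:
    """In-memory stand-in for a message broker; delivers events synchronously."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []  # event types in publication order

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Event) -> None:
        self.history.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()


# Payment Service: reacts to OrderCreated, then publishes the next event.
def on_order_created(event: Event) -> None:
    bus.publish("PaymentProcessed", {"order_id": event["order_id"]})


# Inventory Service: reacts to PaymentProcessed.
def on_payment_processed(event: Event) -> None:
    bus.publish("InventoryReserved", {"order_id": event["order_id"]})


bus.subscribe("OrderCreated", on_order_created)
bus.subscribe("PaymentProcessed", on_payment_processed)

# Order Service starts the saga by publishing the first event.
bus.publish("OrderCreated", {"order_id": "o-1", "amount": 40})
```

No function in this sketch calls another service directly. The overall flow exists only as the chain of subscriptions, which is why it is hard to see in one place once the saga grows.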
Choreography Failure & Compensation
When a step fails (e.g., payment is declined), the failing service publishes a failure event (`PaymentFailed`). Each service that already completed its step listens for that event and runs its compensating transaction. In an order flow where payment runs before inventory reservation, the Order Service cancels the order, and no inventory compensation is needed because nothing was reserved yet. This works well for short, simple sagas but becomes hard to reason about as the number of steps and events grows.
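The failure path can be sketched as follows. Only `PaymentFailed` comes from the text above; the other event names, the status strings, and the rule that amounts above 100 are declined are assumptions chosen to make the example deterministic.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]

handlers: Dict[str, List[Handler]] = {}
orders: Dict[str, str] = {}   # Order Service's local state: order_id -> status
reserved: List[str] = []      # Inventory Service's local state


def subscribe(event_type: str, handler: Handler) -> None:
    handlers.setdefault(event_type, []).append(handler)


def publish(event_type: str, event: dict) -> None:
    for handler in handlers.get(event_type, []):
        handler(event)


# --- Order Service ---
def create_order(order_id: str, amount: int) -> None:
    orders[order_id] = "PENDING"
    publish("OrderCreated", {"order_id": order_id, "amount": amount})


def cancel_order(event: dict) -> None:
    # Compensating transaction for create_order
    orders[event["order_id"]] = "CANCELLED"


def confirm_order(event: dict) -> None:
    orders[event["order_id"]] = "CONFIRMED"


# --- Payment Service ---
def process_payment(event: dict) -> None:
    if event["amount"] > 100:  # hypothetical decline rule
        publish("PaymentFailed", {"order_id": event["order_id"]})
    else:
        publish("PaymentProcessed", {"order_id": event["order_id"]})


# --- Inventory Service ---
def reserve_inventory(event: dict) -> None:
    reserved.append(event["order_id"])
    publish("InventoryReserved", {"order_id": event["order_id"]})


subscribe("OrderCreated", process_payment)
subscribe("PaymentProcessed", reserve_inventory)
subscribe("PaymentFailed", cancel_order)
subscribe("InventoryReserved", confirm_order)

create_order("o-1", 40)    # payment succeeds, stock is reserved
create_order("o-2", 500)   # payment declined, order is cancelled
```

After both calls, `o-1` is `CONFIRMED`, `o-2` is `CANCELLED`, and `reserved` contains only `o-1`: the Inventory Service never saw the second order, so it has nothing to compensate.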
Orchestration-Based Saga
In orchestration, a central Saga Orchestrator (sometimes called a process manager or saga manager) explicitly tells each service what to do. The orchestrator maintains state about which step the saga is on and issues commands to services via their APIs or command messages. Services respond with success or failure, and the orchestrator decides what to do next — including triggering compensations.
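One way to picture the orchestrator is as a small state machine driven by service replies. The sketch below assumes a three-service order flow; the `OrderSaga` class, the state names, and command names such as `ChargeCard` and `RefundCard` are hypothetical, and commands are returned as tuples rather than sent over a real transport.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    AWAITING_PAYMENT = "awaiting_payment"
    AWAITING_INVENTORY = "awaiting_inventory"
    REFUNDING = "refunding"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


Command = Tuple[str, str]  # (target service, command name)


@dataclass
class OrderSaga:
    order_id: str
    state: State = State.AWAITING_PAYMENT
    sent: List[Command] = field(default_factory=list)

    def start(self) -> Command:
        return self._send(("payment", "ChargeCard"))

    def on_reply(self, service: str, success: bool) -> Optional[Command]:
        """Advance the saga on a service reply; return the next command, if any."""
        if self.state is State.AWAITING_PAYMENT and service == "payment":
            if success:
                self.state = State.AWAITING_INVENTORY
                return self._send(("inventory", "ReserveStock"))
            self.state = State.CANCELLED  # no charge was made, nothing to refund
            return self._send(("order", "CancelOrder"))
        if self.state is State.AWAITING_INVENTORY and service == "inventory":
            if success:
                self.state = State.COMPLETED
                return self._send(("order", "ConfirmOrder"))
            self.state = State.REFUNDING  # payment succeeded earlier: compensate it
            return self._send(("payment", "RefundCard"))
        if self.state is State.REFUNDING and service == "payment" and success:
            self.state = State.CANCELLED
            return self._send(("order", "CancelOrder"))
        return None  # unexpected reply for the current state: ignore it

    def _send(self, command: Command) -> Command:
        self.sent.append(command)
        return command


saga = OrderSaga("o-1")
saga.start()
saga.on_reply("payment", True)     # charge succeeded
saga.on_reply("inventory", False)  # out of stock
saga.on_reply("payment", True)     # refund acknowledged
```

The saga ends in `State.CANCELLED` after issuing `ChargeCard`, `ReserveStock`, `RefundCard`, and `CancelOrder`. A real orchestrator would also persist `state` durably and correlate each reply with the command that caused it, which this sketch omits.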
Choreography vs Orchestration Comparison
| Aspect | Choreography | Orchestration |
|---|---|---|
| Coupling | Services loosely coupled via events | Services coupled to orchestrator's interface |
| Visibility | Hard to see the overall flow (scattered across services) | Easy to see and trace the saga state centrally |
| Complexity | Simple for short sagas, chaotic for long ones | More code upfront, but scales to complex flows |
| Failure handling | Each service must know about compensations | Orchestrator centralizes compensation logic |
| Testing | Harder — must simulate event chains | Easier — test orchestrator state machine directly |
| Tooling | Kafka, EventBridge | Temporal, AWS Step Functions, Axon |
Compensating Transactions
Compensations Are Not Rollbacks
A database rollback undoes changes at the storage level — it is as if the operation never happened. A compensating transaction is a new business operation that logically undoes a previous one. For example, a payment cannot be 'rolled back' — the charge happened. The compensation is a refund. Compensating transactions must be explicitly designed into your domain and may have their own side effects (e.g., a refund notification email).
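The difference shows up clearly in an append-only ledger. In this sketch, `PaymentLedger` and its fields are hypothetical; the point is that the refund is recorded as a second entry and triggers its own side effect, while the original charge stays in the history.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LedgerEntry:
    kind: str          # "charge" or "refund"
    payment_id: str
    amount_cents: int


class PaymentLedger:
    """Append-only: committed entries are never deleted, only offset by new ones."""

    def __init__(self) -> None:
        self.entries: List[LedgerEntry] = []
        self.notifications: List[str] = []

    def charge(self, payment_id: str, amount_cents: int) -> None:
        self.entries.append(LedgerEntry("charge", payment_id, amount_cents))

    def net_charged(self, payment_id: str) -> int:
        """Charges minus refunds for one payment."""
        total = 0
        for entry in self.entries:
            if entry.payment_id != payment_id:
                continue
            sign = 1 if entry.kind == "charge" else -1
            total += sign * entry.amount_cents
        return total

    def refund(self, payment_id: str) -> None:
        """Compensating transaction: a new operation that offsets the charge."""
        outstanding = self.net_charged(payment_id)
        if outstanding <= 0:
            return  # nothing left to refund
        self.entries.append(LedgerEntry("refund", payment_id, outstanding))
        # The compensation has a side effect of its own.
        self.notifications.append(f"refund issued for {payment_id}")


ledger = PaymentLedger()
ledger.charge("pay-1", 2500)
ledger.refund("pay-1")
```

After the refund, the net amount for `pay-1` is zero, but the ledger holds two entries and one notification has been queued. A database rollback would have left no trace of either.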
Idempotency Is Required
Saga steps and their compensations may be retried after a failure or timeout, so a service can receive the same command more than once. Every service participating in a saga must therefore be idempotent: processing the same command twice must produce the same result as processing it once. The standard approach is to use a unique idempotency key per saga step and store it with the operation result.
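A minimal sketch of the idempotency-key approach follows. The `PaymentService` class and the key format `saga-42:charge_payment` are assumptions for illustration, and a dictionary stands in for the table where a real service would store the key alongside the result, ideally in the same local transaction as the operation itself.

```python
from typing import Dict


class PaymentService:
    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}  # idempotency key -> stored result
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retry with a known key returns the stored result without charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {
            "status": "charged",
            "amount_cents": amount_cents,
            "charge_no": self.charges_made,
        }
        self._results[idempotency_key] = result
        return result


service = PaymentService()
key = "saga-42:charge_payment"       # hypothetical format: one key per saga step
first = service.charge(key, 2500)
retry = service.charge(key, 2500)    # e.g. the caller timed out and retried
```

Both calls return the same result and `service.charges_made` stays at 1, so the customer is charged once no matter how many times the step is retried.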
Real-World Usage
Uber uses orchestrated sagas built on Cadence, its open-source workflow engine, to coordinate trip fulfillment across Driver, Pricing, Mapping, and Payment services. Temporal is a later fork of Cadence started by its original creators. Netflix uses choreography-based sagas for content publishing workflows. Amazon uses saga-like patterns for order fulfillment spanning warehousing, logistics, and payment microservices. AWS Step Functions is a managed workflow service that can act as a saga orchestrator.
Interview Tip
Sagas come up in almost every microservices interview. Start by explaining *why* 2PC is a poor fit for microservices (blocking, participant availability). Then present both approaches: choreography for simple, low-step flows; orchestration for complex flows where visibility matters. Always mention compensating transactions and idempotency, because they show you understand the operational realities, not just the happy path.