Saga Pattern: Choreography & Orchestration

Manage distributed transactions without 2PC: choreography-based sagas with events vs orchestration-based sagas with a coordinator. Compensation logic and failure handling.

20 min read | High interview weight

Why Distributed Transactions Are Hard

In a monolith with a single database, transactions are easy: wrap operations in `BEGIN/COMMIT` and the database handles atomicity. In microservices with Database per Service, this breaks down. Each service owns its database, so no single local transaction can span all of them. Two-Phase Commit (2PC) is the classic solution, but it is a blocking protocol: participants hold locks while they wait for the coordinator's decision, every participant must be available for the transaction to commit, and a coordinator failure can leave participants stuck. That makes it slow and fragile, and a poor fit for independently deployed services.

The Saga pattern is the microservices alternative. A saga is a sequence of local transactions, each committed in its own service, coordinated so that if one step fails, the steps that already completed are undone by compensating transactions. Instead of distributed atomicity, you get eventual consistency with explicit, application-level undo logic.

Choreography-Based Saga

In choreography, there is no central coordinator. Each service listens for events and reacts by performing its local transaction and publishing the next event. Services are loosely coupled: they depend on the event contracts, not on each other's APIs.

Choreography saga: services react to events without a central coordinator.
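
The event chain described above can be sketched with an in-memory event bus. This is a minimal illustration, not a production design: `EventBus` stands in for a real broker, and the event names `OrderCreated`, `PaymentCompleted`, and `InventoryReserved` are assumptions chosen for the example.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-memory stand-in for a message broker (illustrative only)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # every published event name, in order

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in list(self.handlers[event_type]):
            handler(payload)


bus = EventBus()


# Each service knows only event names, never another service.
def payment_service(order):
    # Local transaction (charging the customer) omitted; announce the result.
    bus.publish("PaymentCompleted", order)


def inventory_service(order):
    # Local transaction (reserving stock) omitted; announce the result.
    bus.publish("InventoryReserved", order)


bus.subscribe("OrderCreated", payment_service)
bus.subscribe("PaymentCompleted", inventory_service)

# Order Service commits its local transaction, then publishes the first event.
bus.publish("OrderCreated", {"order_id": "o-1", "amount": 40})
print(bus.log)  # ['OrderCreated', 'PaymentCompleted', 'InventoryReserved']
```

Note that no function calls another service directly. The overall flow exists only in the subscriptions, which is why it is hard to see in one place.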

Choreography Failure & Compensation

When a step fails (e.g., payment is declined), the failing service publishes a failure event (`PaymentFailed`). Every service that already completed a step subscribes to that event and runs its own compensating transaction. In an order, payment, inventory flow, Order Service cancels the order; inventory was never reserved, so it has nothing to compensate. This works well for short, simple sagas but becomes hard to reason about as the number of steps and failure events grows.
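
A rough sketch of that failure path, using plain dictionaries in place of a broker and a database. Only `PaymentFailed` comes from the text; the other event names, the `orders` store, and the decline rule (amounts above 100) are hypothetical.

```python
from collections import defaultdict

handlers = defaultdict(list)   # event name -> subscribed handlers
orders = {"o-1": "PENDING"}    # Order Service's local state (illustrative)
published = []                 # event names in publication order


def publish(event_type, payload):
    published.append(event_type)
    for handler in handlers[event_type]:
        handler(payload)


def payment_service(order):
    # Hypothetical rule: decline anything above 100.
    if order["amount"] > 100:
        publish("PaymentFailed", order)
    else:
        publish("PaymentCompleted", order)


def order_service_on_payment_failed(order):
    # Compensating transaction: a new local write that cancels the order.
    orders[order["order_id"]] = "CANCELLED"


def inventory_service(order):
    publish("InventoryReserved", order)


handlers["OrderCreated"].append(payment_service)
handlers["PaymentFailed"].append(order_service_on_payment_failed)
handlers["PaymentCompleted"].append(inventory_service)

publish("OrderCreated", {"order_id": "o-1", "amount": 250})
print(published)      # ['OrderCreated', 'PaymentFailed']
print(orders["o-1"])  # CANCELLED
```

Inventory never sees an event in this run, so it has no compensation to perform.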

Orchestration-Based Saga

In orchestration, a central Saga Orchestrator (sometimes called a process manager or saga manager) explicitly tells each service what to do. The orchestrator maintains state about which step the saga is on and issues commands to services via their APIs or command messages. Services respond with success or failure, and the orchestrator decides what to do next — including triggering compensations.

Orchestration saga: a central orchestrator explicitly coordinates each step.
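
One way to picture the orchestrator is as a loop over steps that remembers which ones succeeded and undoes them in reverse order on failure. The sketch below is an assumption-laden illustration: `Step`, `SagaOrchestrator`, and the step names are invented for the example, services are simulated with local functions, and state is kept in memory, whereas a real orchestrator would persist it so it can resume after a crash.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]      # command sent to a service
    compensate: Callable[[dict], None]  # logical undo for that command


class SagaOrchestrator:
    def __init__(self, steps):
        self.steps = steps
        self.completed = []      # saga state: steps that have succeeded
        self.status = "PENDING"

    def run(self, ctx):
        for step in self.steps:
            try:
                step.action(ctx)
                self.completed.append(step)
            except Exception:
                # Undo completed steps, most recent first.
                for done in reversed(self.completed):
                    done.compensate(ctx)
                self.status = "COMPENSATED"
                return self.status
        self.status = "COMPLETED"
        return self.status


trace = []


def reserve_inventory(ctx):
    raise RuntimeError("out of stock")  # simulated failure in the last step


saga = SagaOrchestrator([
    Step("create_order",
         lambda c: trace.append("order created"),
         lambda c: trace.append("order cancelled")),
    Step("charge_payment",
         lambda c: trace.append("payment charged"),
         lambda c: trace.append("payment refunded")),
    Step("reserve_inventory",
         reserve_inventory,
         lambda c: trace.append("inventory released")),
])

result = saga.run({"order_id": "o-1"})
print(result)  # COMPENSATED
print(trace)   # ['order created', 'payment charged', 'payment refunded', 'order cancelled']
```

The failed step itself is not compensated, because it never completed. All compensation logic lives in one place, which is what makes the flow easy to trace and test.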

Choreography vs Orchestration Comparison

Aspect	Choreography	Orchestration
Coupling	Services loosely coupled via events	Services coupled to orchestrator's interface
Visibility	Hard to see the overall flow (scattered across services)	Easy to see and trace the saga state centrally
Complexity	Simple for short sagas, chaotic for long ones	More code upfront, but scales to complex flows
Failure handling	Each service must know about compensations	Orchestrator centralizes compensation logic
Testing	Harder — must simulate event chains	Easier — test orchestrator state machine directly
Tooling	Kafka, EventBridge	Temporal, AWS Step Functions, Axon

Compensating Transactions

Compensations Are Not Rollbacks

A database rollback undoes changes at the storage level — it is as if the operation never happened. A compensating transaction is a new business operation that logically undoes a previous one. For example, a payment cannot be 'rolled back' — the charge happened. The compensation is a refund. Compensating transactions must be explicitly designed into your domain and may have their own side effects (e.g., a refund notification email).
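
The difference can be shown side by side. The sketch below uses Python's built-in `sqlite3` for the rollback half, and a hypothetical append-only `PaymentLedger` class (invented for this example) for the compensation half.

```python
import sqlite3
from dataclasses import dataclass, field

# 1) Database rollback: the uncommitted insert leaves no trace.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE charges (payment_id TEXT, amount INTEGER)")
conn.commit()
conn.execute("INSERT INTO charges VALUES ('p-1', 40)")
conn.rollback()
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM charges").fetchone()[0]


# 2) Compensation: the charge stays in history; a refund is a new entry.
@dataclass
class PaymentLedger:
    entries: list = field(default_factory=list)        # append-only history
    notifications: list = field(default_factory=list)  # side effects

    def charge(self, payment_id, amount):
        self.entries.append(("CHARGE", payment_id, amount))

    def refund(self, payment_id):
        # Find the original charge and record an offsetting entry.
        amount = next(a for kind, pid, a in self.entries
                      if kind == "CHARGE" and pid == payment_id)
        self.entries.append(("REFUND", payment_id, -amount))
        self.notifications.append(f"refund issued for {payment_id}")

    def balance(self):
        return sum(amount for _, _, amount in self.entries)


ledger = PaymentLedger()
ledger.charge("p-1", 40)
ledger.refund("p-1")

print(rows_after_rollback)   # 0
print(len(ledger.entries))   # 2
print(ledger.balance())      # 0
```

After the rollback there are zero rows, as if nothing happened. After the compensation the net balance is zero, but two ledger entries and a notification remain.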

Idempotency Is Required

Saga steps may be retried on failure. Every service participating in a saga must be idempotent — processing the same command twice must produce the same result as processing it once. The standard approach is to use a unique idempotency key per saga step and store it with the operation result.
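
A minimal sketch of the idempotency-key approach, assuming an in-memory dictionary as the key store. `PaymentService` and the key format (saga id plus step name) are hypothetical; a real service would typically write the key and the result in the same local transaction as the operation itself.

```python
class PaymentService:
    def __init__(self):
        self.results = {}      # idempotency key -> stored result
        self.charges_made = 0  # counts real side effects

    def charge(self, idempotency_key, amount):
        if idempotency_key in self.results:
            # Retry of a step already processed: return the stored result.
            return self.results[idempotency_key]
        self.charges_made += 1
        result = {"status": "CHARGED", "amount": amount}
        self.results[idempotency_key] = result
        return result


svc = PaymentService()
key = "saga-42:charge_payment"  # hypothetical format: saga id + step name

first = svc.charge(key, 40)
retry = svc.charge(key, 40)     # e.g. the command was redelivered

print(first == retry)     # True
print(svc.charges_made)   # 1
```

The retry returns the same result as the first call, and the customer is charged only once.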

Real-World Usage

Uber uses orchestrated sagas built on Cadence, its open-source workflow engine (Temporal is a later fork of Cadence started by its original creators), to coordinate trip fulfillment across Driver, Pricing, Mapping, and Payment services. Netflix uses choreography-based sagas for content publishing workflows. Amazon uses saga-like patterns for order fulfillment spanning warehousing, logistics, and payment microservices. AWS Step Functions is a managed workflow service that can act as a saga orchestrator.

Interview Tip

Sagas come up in almost every microservices interview. Start by explaining *why* 2PC is a poor fit for microservices (it blocks, and it needs every participant to be available). Then present both approaches: choreography for simple flows with few steps; orchestration for complex flows where visibility matters. Always mention compensating transactions and idempotency. They show you understand the operational realities, not just the happy path.
