Choreography vs Orchestration
Two models for coordinating services: event-based choreography (decentralized) vs command-based orchestration (centralized). Trade-offs and when to use each.
The Service Coordination Problem
In a microservices architecture, business processes often span multiple services. An e-commerce order requires: validating the order, reserving inventory, charging the customer, scheduling fulfillment, and sending a confirmation email. Each step lives in a different service. How do you coordinate them? There are two fundamentally different answers: choreography and orchestration.
Choreography: Decentralized, Event-Driven
In choreography, there is no central coordinator. Each service listens for events and reacts by doing its work and publishing new events. Services are aware of the domain events but not of each other. The overall workflow emerges from the collaboration of independent, event-driven services — like dancers following music without a choreographer giving real-time instructions.
- Loose coupling: Services don't need to know about each other, only about the events.
- High scalability: No central coordinator to become a bottleneck; each service scales independently, though the message broker itself must keep up with event volume.
- Hard to debug: The overall workflow is implicit — you must trace events across services to understand what happened.
- Distributed state: No single place shows the current status of a multi-service process.
- Event proliferation: Inserting a step into the middle of a flow typically means defining new events and changing which events the downstream services subscribe to.
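The choreographed order flow can be sketched in a few lines. This is only an illustration under stated assumptions: `EventBus` is a hypothetical synchronous, in-memory stand-in for a real message broker, and the event names and service functions are invented for the example. Note that no function calls another service directly; the sequence exists only in the subscriptions.

```python
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class EventBus:
    """Hypothetical synchronous, in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self.log: list[str] = []  # event names, in publish order

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._subscribers[event_name].append(handler)

    def publish(self, event_name: str, payload: Event) -> None:
        self.log.append(event_name)
        for handler in list(self._subscribers[event_name]):
            handler(payload)


bus = EventBus()


# Each service does its work (omitted here) and announces the result.
# None of them knows which service, if any, reacts next.
def inventory_service(event: Event) -> None:
    bus.publish("InventoryReserved", event)


def payment_service(event: Event) -> None:
    bus.publish("PaymentCharged", event)


def fulfillment_service(event: Event) -> None:
    bus.publish("FulfillmentScheduled", event)


def email_service(event: Event) -> None:
    bus.publish("ConfirmationSent", event)


# The workflow is implicit: it lives entirely in these subscriptions.
bus.subscribe("OrderPlaced", inventory_service)
bus.subscribe("InventoryReserved", payment_service)
bus.subscribe("PaymentCharged", fulfillment_service)
bus.subscribe("FulfillmentScheduled", email_service)

bus.publish("OrderPlaced", {"order_id": "o-1"})
print(bus.log)
```

The `log` list exists only to make the emergent sequence visible in the sketch. In a real deployment there is no such shared list, which is why tracing a choreographed flow usually relies on correlation IDs and distributed tracing.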
Orchestration: Centralized, Command-Driven
In orchestration, a dedicated orchestrator service (or workflow engine) controls the entire process. It calls each service in sequence (or parallel), waits for responses, handles failures, and decides what step to execute next. The orchestrator holds all the business logic for the process flow. Think of it as a conductor directing an orchestra.
- Explicit workflow: The entire process flow is visible in one place — the orchestrator.
- Easier to monitor: Process status is held centrally; dashboards are straightforward.
- Easier error handling: Compensating transactions and retry logic live in the orchestrator.
- Coupling risk: The orchestrator must know every participating service's API, and the services are in turn coupled to the orchestrator's commands and reply contract.
- Single point of failure: The orchestrator itself must be highly available.
- Scalability bottleneck: All coordination traffic flows through one component.
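As a rough sketch of the same order flow under orchestration, the example below keeps the step sequence and the process status in one place. The service functions are hypothetical stubs standing in for HTTP or RPC calls, and `OrderProcess`, `ORDER_STEPS`, and `run_order` are names invented for this illustration, not part of any workflow engine.

```python
from dataclasses import dataclass, field
from typing import Callable

Step = tuple[str, Callable[[str], None]]


@dataclass
class OrderProcess:
    """Central record of where one order is in the flow."""
    order_id: str
    status: str = "NEW"
    history: list[str] = field(default_factory=list)


# Hypothetical service clients; real ones would be HTTP or RPC calls.
def validate_order(order_id: str) -> None:
    pass


def reserve_inventory(order_id: str) -> None:
    pass


def charge_customer(order_id: str) -> None:
    pass


def schedule_fulfillment(order_id: str) -> None:
    pass


def send_confirmation(order_id: str) -> None:
    pass


def charge_customer_declined(order_id: str) -> None:
    raise RuntimeError("card declined")


# The whole process flow is explicit and lives in the orchestrator.
ORDER_STEPS: list[Step] = [
    ("VALIDATED", validate_order),
    ("INVENTORY_RESERVED", reserve_inventory),
    ("CHARGED", charge_customer),
    ("FULFILLMENT_SCHEDULED", schedule_fulfillment),
    ("CONFIRMATION_SENT", send_confirmation),
]


def run_order(order_id: str, steps: list[Step] = ORDER_STEPS) -> OrderProcess:
    process = OrderProcess(order_id)
    for status_after, call_service in steps:
        try:
            call_service(order_id)
        except Exception as exc:
            # The orchestrator, not the failing service, decides what to do.
            process.status = f"FAILED: {exc}"
            return process
        process.status = status_after
        process.history.append(status_after)
    return process


happy = run_order("o-1")
declined = run_order(
    "o-2",
    [("VALIDATED", validate_order), ("CHARGED", charge_customer_declined)],
)
print(happy.status, declined.status)
```

Because `OrderProcess` is the single source of truth for each order, a status dashboard only needs to read it. The cost is that every step, and every failure, passes through `run_order`.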
Side-by-Side Comparison
| Dimension | Choreography | Orchestration |
|---|---|---|
| Control model | Decentralized (event-driven) | Centralized (command-driven) |
| Coupling | Services coupled to events only | Services coupled to orchestrator API |
| Process visibility | Implicit — must trace events | Explicit — visible in orchestrator |
| Debugging | Hard — distributed traces required | Easier — single place to look |
| Failure handling | Each service handles its own failures | Orchestrator coordinates compensation |
| Scalability | Each service scales independently | Orchestrator can be a bottleneck |
| Best for | Simple, parallel, loosely coupled flows | Long-running, sequential, complex flows |
Hybrid Approaches
Real systems often mix both. A common pattern is to use an orchestrator for the high-level business process (the order lifecycle) while the services inside each step coordinate their parallelizable sub-steps through choreography. For example, the orchestrator triggers 'prepare shipment', and within the fulfillment domain, warehouse services choreograph via events to assemble the package.
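A minimal sketch of that hybrid shape, under simplifying assumptions: `LocalBus` is a hypothetical in-memory bus private to the fulfillment domain, the warehouse handlers (`picker`, `packer`, `labeler`) are invented names, and everything runs synchronously in one process, so the sub-steps execute one after another rather than in parallel.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class LocalBus:
    """Hypothetical in-memory event bus private to the fulfillment domain."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, name: str, handler: Handler) -> None:
        self._subs[name].append(handler)

    def publish(self, name: str, payload: dict) -> None:
        for handler in list(self._subs[name]):
            handler(payload)


def prepare_shipment(order_id: str) -> list[str]:
    """One orchestrator step, choreographed internally by warehouse services."""
    bus = LocalBus()
    done: list[str] = []

    def picker(event: dict) -> None:
        done.append("picked")
        bus.publish("ItemsPicked", event)

    def packer(event: dict) -> None:
        done.append("packed")
        bus.publish("PackagePacked", event)

    def labeler(event: dict) -> None:
        done.append("labeled")

    bus.subscribe("ShipmentRequested", picker)
    bus.subscribe("ItemsPicked", packer)
    bus.subscribe("PackagePacked", labeler)
    bus.publish("ShipmentRequested", {"order_id": order_id})
    return done


def order_lifecycle(order_id: str) -> list[str]:
    """High-level orchestrator: explicit, sequential business steps."""
    trace = ["charge_customer"]  # placeholder for a real payment call
    trace += [f"prepare_shipment:{s}" for s in prepare_shipment(order_id)]
    trace.append("send_confirmation")  # placeholder for a real email call
    return trace


print(order_lifecycle("o-1"))
```

The orchestrator sees `prepare_shipment` as a single step and does not need to know how many warehouse events it takes to complete.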
Workflow engines like AWS Step Functions, Temporal, and Conductor implement orchestration with built-in durability, retries, and visualization — removing the need to build orchestrator infrastructure from scratch.
Real World: Uber's Orchestration with Cadence/Temporal
Uber built Cadence, an open-source workflow engine, to orchestrate complex workflows like driver onboarding, trip lifecycle management, and payment processing. Temporal is a separate open-source project, forked from Cadence by its original creators after they left Uber. Each workflow is a durable, stateful orchestration that can run for days or weeks, survive process crashes, and resume exactly where it left off. Uber chose orchestration over choreography because process visibility and compensation logic were critical for their operations teams.
Relation to the Saga Pattern
The Saga pattern (for distributed transactions without 2PC) can be implemented in either style:
- Choreography saga: Services publish events, and each service reacts with either a forward action or a compensating action.
- Orchestration saga: A central saga orchestrator explicitly invokes the forward steps and, on failure, invokes compensating transactions in reverse order.
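The orchestration-based variant can be sketched as follows. This is an illustrative outline, not a production saga: `SagaStep` and `run_saga` are hypothetical names, the actions are in-memory stubs that append to a list, and the sketch assumes compensations themselves never fail (a real implementation would need retries and persistence for that case).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]      # forward step
    compensate: Callable[[], None]  # undoes the forward step


def run_saga(steps: list[SagaStep]) -> bool:
    """Run forward steps in order; on failure, undo completed ones in reverse."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step did not complete, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


log: list[str] = []


def fail_fulfillment() -> None:
    raise RuntimeError("no courier available")


steps = [
    SagaStep("reserve_inventory",
             lambda: log.append("reserve"),
             lambda: log.append("release")),
    SagaStep("charge_customer",
             lambda: log.append("charge"),
             lambda: log.append("refund")),
    SagaStep("schedule_fulfillment",
             fail_fulfillment,
             lambda: log.append("cancel_fulfillment")),
]

ok = run_saga(steps)
print(ok, log)
```

Here the third step fails, so the orchestrator refunds the charge first and then releases the inventory, the reverse of the order in which those steps completed.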
Interview Tip
Choreography vs Orchestration is one of the most common follow-up questions after you propose a multi-service architecture. Structure your answer as: 'I'd use choreography when services are naturally event-driven and I want maximum decoupling — like for notification side-effects. I'd use orchestration when the business process has complex branching, long-running steps, or needs explicit compensation logic — like an order checkout flow where I need to know exactly what state the order is in.' Showing you know when to pick each, rather than always defaulting to one, is what impresses interviewers.