Event-Driven Architecture
Designing systems around events: event producers, consumers, event buses, event schemas, and avoiding common pitfalls.
What Is Event-Driven Architecture?
Event-Driven Architecture (EDA) is a design paradigm where services communicate by producing and consuming events — immutable records of things that have happened. Instead of Service A calling Service B directly, A emits an `OrderPlaced` event to a bus; any number of services that care about orders subscribe and react. The producer has no knowledge of its consumers.
EDA is the architectural pattern that message queues and pub/sub systems enable. Understanding EDA means understanding not just the mechanics of messaging but the design philosophy behind it.
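The producer/consumer decoupling described above can be sketched with a toy in-memory bus. This is a minimal illustration, not a real broker: `InMemoryEventBus`, its methods, and the two lambda consumers are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryEventBus:
    """Toy event bus: maps an event type to the handlers subscribed to it."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: dict) -> None:
        # The producer only names the event type; it never references a consumer.
        for handler in self._handlers.get(event["eventType"], []):
            handler(event)


bus = InMemoryEventBus()
emails_sent: list[str] = []
reserved_orders: list[str] = []

# Two independent consumers react to the same fact.
bus.subscribe("OrderPlaced", lambda e: emails_sent.append(e["data"]["userEmail"]))
bus.subscribe("OrderPlaced", lambda e: reserved_orders.append(e["data"]["orderId"]))

bus.publish({
    "eventType": "OrderPlaced",
    "data": {"orderId": "ord_7821", "userEmail": "alice@example.com"},
})
```

Adding a third consumer only requires another `subscribe` call; the code that publishes `OrderPlaced` does not change.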
Events vs Commands vs Queries
| Type | Meaning | Direction | Example |
|---|---|---|---|
| Event | Something happened (past tense, fact) | Broadcast — any consumer | OrderPlaced, PaymentFailed, UserRegistered |
| Command | Do this specific thing (imperative) | Directed — specific target | SendEmail, ProcessPayment, ResizeImage |
| Query | Give me this information | Request/reply | GetOrderStatus, FetchUserProfile |
Name events in past tense
Events represent facts that already occurred. Always name them in past tense: `OrderPlaced` not `PlaceOrder`, `PaymentFailed` not `FailPayment`. This signals to consumers that they are reacting to history, not being commanded to act.
Event Schema Design
A well-designed event schema is critical. Events are effectively part of your public API: once consumers build on them, removing, renaming, or retyping a field is a breaking change. Follow these principles:
- Include an event ID: a unique identifier for deduplication and idempotency
- Include a timestamp: when did this happen? Use ISO 8601 (in UTC) or epoch milliseconds, and use the same format everywhere
- Include a source: which service produced this event?
- Include a schema version: lets consumers handle schema evolution (`"version": "1.0"`)
- Keep payloads self-contained: include enough context so consumers don't need to call back to the producer
- Don't rely on internal database IDs alone: include the business context consumers need to act on the event
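The principles in the list above can be sketched as a small envelope builder. This is one possible shape under stated assumptions: `make_event` is a hypothetical helper, and the `evt_` prefix and field names simply mirror the conventions used in this article.

```python
import uuid
from datetime import datetime, timezone


def make_event(event_type: str, source: str, data: dict, version: str = "1.0") -> dict:
    """Wrap a business payload in an envelope carrying the common metadata fields."""
    return {
        "eventId": f"evt_{uuid.uuid4().hex}",  # unique ID, used by consumers to deduplicate
        "eventType": event_type,
        "version": version,                    # schema version of the payload
        # ISO 8601 in UTC; Python renders the offset as "+00:00" rather than "Z".
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "source": source,                      # the producing service
        "data": data,                          # self-contained business payload
    }


event = make_event(
    "order.placed",
    "order-service",
    {"orderId": "ord_7821", "userId": "usr_4421", "total": 59.98, "currency": "USD"},
)
```

Centralizing envelope creation in one helper makes it harder for a producer to forget the ID, timestamp, or version.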
// Well-designed event schema
{
  "eventId": "evt_01HX9KMVB3FGQZ",
  "eventType": "order.placed",
  "version": "1.0",
  "timestamp": "2025-11-15T14:32:00Z",
  "source": "order-service",
  "data": {
    "orderId": "ord_7821",
    "userId": "usr_4421",
    "userEmail": "alice@example.com",
    "items": [
      { "productId": "prod_99", "name": "Widget", "quantity": 2, "price": 29.99 }
    ],
    "total": 59.98,
    "currency": "USD"
  }
}
Event Bus vs Message Broker
An event bus is the logical concept: a shared channel for events. A message broker (Kafka, SNS, EventBridge) is the implementation. AWS EventBridge is designed specifically as an event bus: it provides a schema registry, content-based routing (rules that match events against event patterns), and native integrations with a large number of AWS services and SaaS applications.
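Content-based routing can be illustrated with a simplified pattern matcher. This sketch is loosely modeled on the idea of event patterns (a leaf is a list of acceptable values, a nested object is matched recursively); it is not EventBridge's actual matching engine, and `matches`, `route`, and the rule names are hypothetical.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Return True if every field named in the pattern is satisfied by the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of acceptable values.
            return False
    return True


def route(rules: dict[str, dict], event: dict) -> list[str]:
    """Return the names of all targets whose pattern matches the event."""
    return [target for target, pattern in rules.items() if matches(pattern, event)]


rules = {
    "fraud-check": {"eventType": ["order.placed"], "data": {"currency": ["USD", "EUR"]}},
    "billing-alerts": {"eventType": ["payment.failed"]},
}
event = {"eventType": "order.placed", "source": "order-service", "data": {"currency": "USD"}}
targets = route(rules, event)  # only "fraud-check" matches this event
```

The routing decision depends on the event's content rather than on which topic it was published to, which is what distinguishes this from plain topic-based pub/sub.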
Choreography vs Orchestration
Two patterns for coordinating multi-step workflows in EDA:
| Aspect | Choreography | Orchestration |
|---|---|---|
| Control | Distributed — each service knows what to do when it hears an event | Centralized — an orchestrator tells services what to do |
| Coupling | Low — services only know about events, not each other | Higher — orchestrator knows all participants |
| Visibility | Hard to see the full workflow without tracing | Easy to see — orchestrator has full view |
| Failure handling | Each service handles its own failures | Orchestrator can retry, compensate, roll back |
| Best for | Simple, stable workflows with few participants | Complex workflows with sagas and compensations |
| Examples | EDA with Kafka/SNS | AWS Step Functions, Temporal, Camunda |
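The orchestration column, in particular its centralized failure handling, can be sketched as a tiny saga runner. This is an illustrative assumption, not how Step Functions, Temporal, or Camunda work internally; `run_saga` and the step functions are hypothetical names.

```python
from typing import Callable

# (step name, action, compensation that undoes the action)
Step = tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: list[Step], ctx: dict) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action(ctx)
            completed.append(step)
        except Exception as exc:
            ctx["log"].append(f"{name} failed: {exc}")
            for done_name, _, compensate in reversed(completed):
                compensate(ctx)
                ctx["log"].append(f"compensated {done_name}")
            return False
    return True


def reserve_stock(ctx: dict) -> None:
    ctx["log"].append("stock reserved")


def release_stock(ctx: dict) -> None:
    ctx["log"].append("stock released")


def charge_card(ctx: dict) -> None:
    raise RuntimeError("card declined")


def refund_card(ctx: dict) -> None:
    ctx["log"].append("card refunded")


ctx = {"log": []}
ok = run_saga(
    [("reserve_stock", reserve_stock, release_stock),
     ("charge_card", charge_card, refund_card)],
    ctx,
)
```

Because the orchestrator owns the list of steps, the whole workflow and its rollback path are visible in one place. In choreography the same logic would be spread across each service's event handlers.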
Common Pitfalls in Event-Driven Architecture
Pitfall: Event schema coupling
If your consumers depend on specific fields in the event payload, removing, renaming, or retyping those fields breaks them; purely additive changes are usually safe. Use schema versioning and consider an event schema registry (Confluent Schema Registry for Kafka, AWS Glue Schema Registry) to enforce compatibility checks before publishing.
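To make the idea of a compatibility check concrete, here is a deliberately simplified sketch. It assumes a schema is just a mapping of field name to type name; real registries apply richer rules, and `breaking_changes` is a hypothetical helper rather than any registry's API.

```python
def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List changes in `new` that would break a consumer written against `old`."""
    problems = []
    for field, old_type in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] != old_type:
            problems.append(f"type changed: {field} ({old_type} -> {new[field]})")
    # Fields that exist only in `new` are additive and are not reported.
    return problems


v1 = {"orderId": "string", "total": "number", "currency": "string"}
v2_additive = {**v1, "couponCode": "string"}
v2_breaking = {"orderId": "string", "total": "string"}

additive_report = breaking_changes(v1, v2_additive)
breaking_report = breaking_changes(v1, v2_breaking)
```

A check like this could run in CI so that a producer cannot ship a schema change that existing consumers cannot read.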
Pitfall: Event ordering assumptions
In distributed systems, events can arrive out of order. `PaymentFailed` may arrive before `OrderPlaced` if the two events travel through different partitions or topics. Design consumers to tolerate out-of-order delivery, or use a partition key (for example, the order ID) so that all events for one entity land on the same partition and keep their relative order.
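Both mitigations can be sketched briefly. The hash below is a stand-in chosen for this example (real brokers use their own partitioners), and the per-entity `sequence` field is an assumption that the producer would have to add to its events.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition; the same key always maps to the same one."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def apply_if_newer(latest_seq: dict[str, int], event: dict) -> bool:
    """Accept an event only if its sequence number is newer than the last one seen
    for that entity; stale or duplicate events are ignored."""
    entity, seq = event["orderId"], event["sequence"]
    if seq <= latest_seq.get(entity, -1):
        return False
    latest_seq[entity] = seq
    return True


# All events keyed by the same order ID go to the same partition.
p1 = partition_for("ord_7821", 12)
p2 = partition_for("ord_7821", 12)

# A consumer that tolerates late arrivals by ignoring stale updates.
seen: dict[str, int] = {}
first = apply_if_newer(seen, {"orderId": "ord_7821", "sequence": 2})
late = apply_if_newer(seen, {"orderId": "ord_7821", "sequence": 1})
```

Ignoring stale events suits "latest state wins" consumers. A consumer that needs every event in order would have to buffer and reorder instead.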
Pitfall: Invisible workflows
In choreography, the full business flow is implicit — spread across dozens of event handlers. When debugging a failed order, you may need to trace through 10 services. Invest early in distributed tracing (Jaeger, AWS X-Ray) and correlation IDs in every event.
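Correlation IDs are straightforward to propagate if every new event is derived from the event that triggered it. The sketch below assumes two hypothetical envelope fields, `correlationId` (shared by the whole workflow) and `causationId` (the direct parent event); `new_event` is an illustrative helper.

```python
import uuid
from typing import Optional


def new_event(event_type: str, data: dict, caused_by: Optional[dict] = None) -> dict:
    """Create an event, inheriting the correlation ID of the event that caused it."""
    return {
        "eventId": f"evt_{uuid.uuid4().hex}",
        "eventType": event_type,
        # Reuse the workflow's correlation ID, or start a new workflow.
        "correlationId": caused_by["correlationId"] if caused_by else f"cor_{uuid.uuid4().hex}",
        # Point at the direct parent so the causal chain can be reconstructed.
        "causationId": caused_by["eventId"] if caused_by else None,
        "data": data,
    }


placed = new_event("OrderPlaced", {"orderId": "ord_7821"})
paid = new_event("PaymentSucceeded", {"orderId": "ord_7821"}, caused_by=placed)
shipped = new_event("OrderShipped", {"orderId": "ord_7821"}, caused_by=paid)
```

Searching logs for one `correlationId` then returns every event in the order's workflow, regardless of which service emitted it.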
The Outbox Pattern
The classic dual-write problem: how do you atomically save to your database AND publish an event? If the database save succeeds but the publish fails (or vice versa), your system is inconsistent. The Outbox Pattern solves this: write the event to an `outbox` table in the same database transaction as the business data. A separate relay process reads the outbox, publishes each event to the message bus, and then marks it as sent. Because the relay can crash after publishing but before marking, an event may be published more than once, so delivery is at-least-once.
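The pattern can be sketched with SQLite from the Python standard library. The table layout, `place_order`, and `relay_once` are assumptions made for this example; a production relay would also need polling or change-data-capture, batching, and error handling.

```python
import json
import sqlite3
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            sent INTEGER NOT NULL DEFAULT 0
        );
    """)


def place_order(conn: sqlite3.Connection, order_id: str, total: float) -> None:
    # `with conn` commits both inserts together, or rolls both back on error.
    with conn:
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps({"orderId": order_id, "total": total})),
        )


def relay_once(conn: sqlite3.Connection, publish: Callable[[str, dict], None]) -> int:
    """Publish all unsent outbox rows in order; return how many were published."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        # If publish raises, the row stays unsent and is retried on the next run.
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
init_db(conn)
place_order(conn, "ord_7821", 59.98)

published: list[tuple[str, dict]] = []
count = relay_once(conn, lambda event_type, data: published.append((event_type, data)))
```

Note the window between `publish` and the `UPDATE`: a crash there causes the row to be published again on the next run, which is why the bus sees at-least-once delivery.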
Interview Tip
The Outbox Pattern is an advanced topic that tends to impress interviewers. When they ask "how do you ensure your database and message queue stay in sync?", describe the outbox: "I write the event to an outbox table in the same transaction as the business data. A relay service publishes it to Kafka/SNS asynchronously. The event is recorded exactly once, atomically with the business data, and delivered to the message bus at least once, so consumers must be idempotent."
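If the interviewer follows up on what "idempotent" means in practice, a minimal sketch is a consumer that deduplicates on the event ID. `IdempotentConsumer` is a hypothetical name, and the in-memory set is an assumption made for brevity; a real consumer would keep processed IDs in durable storage, ideally updated in the same transaction as its side effects.

```python
from typing import Callable


class IdempotentConsumer:
    """Wraps a handler so that redelivered events are processed only once."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._seen: set[str] = set()  # assumption: in-memory only, for illustration

    def handle(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        if event["eventId"] in self._seen:
            return False
        self._handler(event)
        # Record the ID only after the handler succeeds, so failures are retried.
        self._seen.add(event["eventId"])
        return True


charged: list[str] = []
consumer = IdempotentConsumer(lambda e: charged.append(e["data"]["orderId"]))

event = {"eventId": "evt_01HX9KMVB3FGQZ", "data": {"orderId": "ord_7821"}}
first_delivery = consumer.handle(event)
redelivery = consumer.handle(event)  # same event delivered again by the bus
```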