
Fan-Out / Fan-In Pattern

Distribute work to many workers (fan-out) and aggregate results (fan-in): scatter-gather, MapReduce-style processing, and aggregation strategies.

12 min read · High interview weight

The Core Idea

Fan-Out distributes a single task to multiple workers that process sub-problems in parallel. Fan-In aggregates the results from those workers back into a single unified result. Together, they form the Scatter-Gather pattern: scatter the work, gather the results. This is the messaging equivalent of parallel processing — the same idea behind MapReduce, parallel database query execution, and search engine result merging.

The pattern is conceptually simple but full of subtle implementation challenges: how do you know when all workers have finished? What if one worker fails? How do you aggregate partial results efficiently? How do you handle stragglers?

Fan-Out / Fan-In Flow

Fan-Out / Fan-In: scatter sub-tasks to parallel workers, gather results in aggregator

Fan-Out Patterns in Practice

Fan-out manifests in two primary forms depending on what you are distributing:

| Fan-Out Type | What Is Distributed | Example | Mechanism |
| --- | --- | --- | --- |
| Notification fan-out | The same event is sent to many recipients | Twitter/X follower notifications, push notifications to millions of users | SNS → SQS per region/shard, Kafka topic partitions, Redis Pub/Sub |
| Work fan-out | A large task is split into sub-tasks processed in parallel | Search across multiple index shards, batch image processing, MapReduce | Task queue (SQS/RabbitMQ) with competing consumers, Kafka partitions |

Notification Fan-Out at Scale: The Twitter Problem

Imagine a user with 10 million followers posts a tweet. Naively, you push a notification to each follower's feed — that is 10 million writes from a single user action. This is the canonical fan-out problem. The main architectural approaches:

| Approach | Description | Pro | Con |
| --- | --- | --- | --- |
| Fan-out on write (push model) | Write to every follower's feed at post time via async workers | Reads are instant — pre-computed feed | Expensive for high-follower users (celebrities); write amplification |
| Fan-out on read (pull model) | Compute the feed at read time by merging followed users' timelines | No write amplification | Read is expensive and slow for high-volume users; need caching |
| Hybrid | Fan-out on write for regular users, fan-out on read for celebrities | Best of both worlds | More complex; requires follower-count threshold logic |
Fan-out on write: tweet is asynchronously written to millions of followers' cached feeds via sharded worker queues
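The threshold logic behind the hybrid approach can be sketched as a routing decision at post time. The 100k cutoff and the type/function names here are illustrative assumptions, not canonical values:

```typescript
// Hybrid fan-out: push to followers' cached feeds for regular users,
// defer to read-time merging for high-follower "celebrity" accounts.
// CELEBRITY_THRESHOLD is an illustrative assumption; real systems tune it.
const CELEBRITY_THRESHOLD = 100_000;

type Post = { authorId: string; postId: string };

function routeFanOut(
  post: Post,
  followerCount: number,
): "fan-out-on-write" | "fan-out-on-read" {
  return followerCount < CELEBRITY_THRESHOLD
    ? "fan-out-on-write" // async workers write to each follower's feed now
    : "fan-out-on-read"; // feed service merges this author at read time
}
```

The read path then merges the pre-computed feed with a live query over the celebrity accounts the reader follows.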

Work Fan-Out: Aggregation Challenges

When fan-out distributes work (not just notifications), the fan-in aggregation is the tricky part. The orchestrator needs to know when all sub-tasks are complete before producing the final result.

Common aggregation strategies:

  • Counter in shared state: orchestrator writes `expectedCount=N` to Redis; each worker atomically decrements; at 0, trigger final assembly. Fast but requires atomic operations.
  • Correlation tracking table: persist each sub-task in a DB with status; query for completion. More durable but slower.
  • Saga orchestrator pattern: orchestrator explicitly tracks which sub-tasks have replied and coordinates the fan-in step.
  • Promise.all / Future aggregation: in code (not messaging), simply await all parallel futures and collect results.
```typescript
// Redis-based fan-in coordination
async function fanOutSearch(query: string, shards: string[]): Promise<SearchResult[]> {
  const jobId = crypto.randomUUID();
  const expectedCount = shards.length;

  // Initialize counter (the results list is created lazily by RPUSH)
  await redis.set(`job:${jobId}:remaining`, expectedCount, "EX", 60);

  // Fan-out: dispatch sub-query to each shard worker
  for (const shard of shards) {
    await queue.publish(`search.shard.${shard}`, { jobId, query });
  }

  // Wait for fan-in: poll until counter reaches 0
  return waitForCompletion(jobId, 30_000 /* 30s timeout */);
}

// Each worker calls this after completing its shard search.
// RPUSH appends atomically, so concurrent workers cannot clobber each
// other's results (a read-modify-write of a JSON blob here would race).
async function workerComplete(jobId: string, partialResults: SearchResult[]) {
  const pipeline = redis.pipeline();
  pipeline.rpush(`job:${jobId}:results`, JSON.stringify(partialResults));
  pipeline.expire(`job:${jobId}:results`, 60);
  pipeline.decr(`job:${jobId}:remaining`);
  await pipeline.exec();
}

// Poll the counter; once it hits 0, read and merge all result batches.
async function waitForCompletion(jobId: string, timeoutMs: number): Promise<SearchResult[]> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (Number(await redis.get(`job:${jobId}:remaining`)) <= 0) {
      const batches = await redis.lrange(`job:${jobId}:results`, 0, -1);
      return batches.flatMap((b) => JSON.parse(b) as SearchResult[]);
    }
    await new Promise((r) => setTimeout(r, 50)); // poll every 50ms
  }
  throw new Error(`fan-in timed out for job ${jobId}`);
}
```

Handling Stragglers and Partial Failures

In any large fan-out, some workers will be slow (stragglers) or fail. Strategies to handle this:

  • Timeouts with partial results: after a timeout, return the results you have with a `partial: true` flag. Search engines do this — missing one index shard is better than timing out entirely.
  • Speculative execution (hedging): after a delay threshold, send the same sub-task to a second worker. Use whichever finishes first. Google's 'The Tail At Scale' paper popularized this.
  • Retry with DLQ: failed sub-tasks go to a Dead Letter Queue for retry; the orchestrator is notified separately.
  • Idempotent workers: make sub-task processing idempotent so retries are safe.
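Speculative execution can be sketched as a generic wrapper around any async sub-task. This sketch assumes the sub-task is idempotent (safe to issue twice) and only hedges against slowness, not failure:

```typescript
// Hedging: if the primary request is still pending after hedgeAfterMs,
// fire a second identical request and take whichever settles first.
async function hedged<T>(run: () => Promise<T>, hedgeAfterMs: number): Promise<T> {
  const primary = run();
  const hedge = new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => run().then(resolve, reject), hedgeAfterMs);
    // If the primary settles first, the pending hedge is never sent.
    primary.finally(() => clearTimeout(timer)).catch(() => {});
  });
  return Promise.race([primary, hedge]);
}
```

Set `hedgeAfterMs` near the worker's p95 latency so hedges fire only for genuine stragglers, keeping the extra load small.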
💡

The 99th Percentile Problem

In a fan-out of 100 sub-tasks, total latency is gated by the slowest worker: the 99th-percentile latency of a single worker becomes roughly the median latency of the whole fan-out. Design for this: set per-worker timeouts aggressively, use speculative execution for critical paths, and prefer partial results over indefinite waits.
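The arithmetic behind this: if each worker independently has a 1% chance of landing in its p99 tail, a fan-out of N workers hits at least one slow worker with probability 1 - 0.99^N:

```typescript
// Probability that at least one of n parallel workers exceeds the
// single-worker p99 latency.
const pSlow = (n: number) => 1 - Math.pow(0.99, n);

console.log(pSlow(1).toFixed(2));   // 0.01 for a single call
console.log(pSlow(100).toFixed(2)); // ≈ 0.63: most 100-way fan-outs hit a p99-slow worker
```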

AWS SNS + SQS Fan-Out

The canonical AWS fan-out pattern uses SNS as the broadcaster and SQS queues as the per-service durable receivers. One SNS topic connects to multiple SQS queues — each queue is owned by a different downstream service. This gives each consumer its own independent, durable queue while the event is broadcast to all of them simultaneously.

AWS SNS + SQS fan-out: each downstream service gets its own durable SQS queue fed by a single SNS topic
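The wiring can be sketched with the AWS CLI. The topic/queue names, region, and account ID below are placeholder assumptions, and the queue additionally needs an access policy allowing the topic to send to it:

```shell
# Broadcast topic (owned by the publishing service)
aws sns create-topic --name order-events

# Each downstream service owns its own durable queue
aws sqs create-queue --queue-name email-service-queue

# Subscribe the queue to the topic; RawMessageDelivery strips the
# SNS JSON envelope so consumers see the original payload
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:order-events \
  --protocol sqs \
  --notification-endpoint arn:aws:sqs:us-east-1:123456789012:email-service-queue \
  --attributes RawMessageDelivery=true
```

Repeat the subscribe step once per downstream queue; each service can then consume, retry, and dead-letter at its own pace without affecting the others.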
💡

Interview Tip

Fan-out/fan-in appears in almost every large-scale design question: notification systems, search, batch processing, and analytics. For notifications, cover push vs pull model and the celebrity/hotspot problem. For work fan-out, explain how you track completion (counter pattern), handle stragglers (timeouts + partial results), and guarantee exactly-once aggregation (idempotent workers). These details separate strong candidates from average ones.
