Design a News Feed (Twitter)

Fan-out on write vs fan-out on read, ranking algorithms, timeline generation, celebrity problem, and real-time updates for millions of users.

Problem Statement

A news feed shows a user a personalized, ranked stream of posts from accounts they follow. Twitter/X, Instagram, and LinkedIn all face this challenge. The core difficulty is that a single post by a celebrity (100 M followers) must fan out efficiently, while a user's feed load must be fast (< 200 ms) and relatively fresh.

Requirements

Users can publish posts (text, images, links).
Users see a ranked feed of posts from accounts they follow.
Feed is paginated; infinite scroll with cursor-based pagination.
Near-real-time: new posts appear within a few seconds.
Support 200 M DAU, 5 M posts/day, 200 M feed reads/day.
Likes, retweets, and comment counts are shown on feed cards.
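
A quick back-of-envelope check turns the daily volumes above into per-second rates. This is only a sketch: the daily figures come from the requirements, while the 3x peak factor is an assumption added for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: int) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


posts_per_sec = per_second(5_000_000)    # 5 M posts/day, from the requirements
reads_per_sec = per_second(200_000_000)  # 200 M feed reads/day, from the requirements

# Assumption (not in the requirements): peak traffic is about 3x the average.
PEAK_FACTOR = 3

print(f"avg posts/s: {posts_per_sec:.0f}, assumed peak: {posts_per_sec * PEAK_FACTOR:.0f}")
print(f"avg reads/s: {reads_per_sec:.0f}, assumed peak: {reads_per_sec * PEAK_FACTOR:.0f}")
print(f"read:write ratio is about {reads_per_sec / posts_per_sec:.0f}:1")
```

On average this works out to roughly 58 posts/s and 2,315 feed reads/s, a 40:1 read-to-write ratio, which is why the design optimizes the read path first.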

The Core Trade-Off: Fan-Out Strategy

When Alice posts, her 1,000 followers need to see it. There are two fundamental approaches, and the choice dominates the entire architecture:

| Strategy | How It Works | Write Cost | Read Cost | Best For |
| --- | --- | --- | --- | --- |
| Fan-out on Write (Push) | At post time, add post ID to every follower's feed cache | O(followers) per post | O(1) (pre-built feed) | Normal users (< 10K followers) |
| Fan-out on Read (Pull) | At feed load time, merge followed users' posts | O(1) per post | O(following) per read | Celebrity accounts (> 1M followers) |
| Hybrid | Push for normal users; pull for celebrities at read time and merge | O(normal followers) | O(celebrities followed) | Production systems (Twitter, Instagram) |
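
The hybrid row's "pull and merge" step could look roughly like the sketch below. It uses plain in-memory lists in place of the feed cache and the celebrity timelines, and the function name `merge_hybrid_feed` and the sample post IDs are hypothetical.

```python
import heapq
from itertools import islice

# Each entry is (timestamp, post_id); every input list is sorted newest-first.
Entry = tuple[float, str]


def merge_hybrid_feed(
    cached_feed: list[Entry],
    celebrity_timelines: list[list[Entry]],
    limit: int = 20,
) -> list[str]:
    """Merge the pushed (pre-built) feed with pulled celebrity timelines."""
    # heapq.merge lazily merges already-sorted inputs; reverse=True means
    # the inputs are expected in descending order, which matches newest-first.
    merged = heapq.merge(cached_feed, *celebrity_timelines, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


# Hypothetical data: one pre-built feed plus two followed celebrities.
cached = [(105.0, "p-bob-2"), (101.0, "p-carol-1"), (100.0, "p-bob-1")]
celeb_a = [(104.0, "p-celebA-2"), (99.0, "p-celebA-1")]
celeb_b = [(103.0, "p-celebB-1")]

print(merge_hybrid_feed(cached, [celeb_a, celeb_b], limit=4))
# ['p-bob-2', 'p-celebA-2', 'p-celebB-1', 'p-carol-1']
```

Because the merge is lazy, only the first `limit` entries are consumed, so the read cost grows with the number of celebrities followed rather than with their total post count.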

⚠️

The Celebrity Problem

Pure fan-out on write breaks for accounts with millions of followers. Writing 100 M feed entries for a single tweet can take minutes and floods the message queue. The hybrid approach skips the push for celebrity posts; instead, at feed-load time, you merge the pre-built feed cache with recent posts pulled from the celebrities the user follows.
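
To put a number on "takes minutes", here is a small estimate of fan-out time per post. The throughput figure is purely an assumption for illustration; the real value depends on cluster size and hardware.

```python
def fanout_seconds(follower_count: int, writes_per_sec: int) -> float:
    """Time to write one post ID into every follower's feed at a given rate."""
    return follower_count / writes_per_sec


# Assumption: aggregate write throughput of the feed-cache cluster.
ASSUMED_WRITES_PER_SEC = 500_000

for followers in (1_000, 1_000_000, 100_000_000):
    secs = fanout_seconds(followers, ASSUMED_WRITES_PER_SEC)
    print(f"{followers:>11,} followers -> {secs:,.3f} s")
```

Under that assumed rate, a 100 M-follower post occupies the entire cluster for 200 seconds, during which every other user's fan-out is queued behind it.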

High-Level Architecture

[Diagram: Hybrid fan-out news feed architecture]

Feed Storage: Redis Sorted Sets

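Each user's feed is stored as a Redis sorted set keyed by `feed:{userId}`. Each member is a `postId`, and its score is the post's timestamp or ranking score. On feed load, `ZREVRANGE feed:{userId} 0 19` returns the top 20 post IDs in descending score order; you then batch-fetch post details from the post cache. For later pages, query by score below the last post seen rather than by rank, so that newly pushed posts do not shift the page boundaries.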

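```python
from typing import Optional

import redis

# Assumed connection settings; point this at your own Redis deployment.
r = redis.Redis(decode_responses=True)
MAX_FEED_LEN = 500

# Fan-out worker: push a post ID into each follower's feed
def fanout_post(post_id: str, author_id: str, timestamp: float, followers: list[str]) -> None:
    pipe = r.pipeline(transaction=False)  # batch commands; no MULTI/EXEC needed
    for follower_id in followers:
        feed_key = f"feed:{follower_id}"
        pipe.zadd(feed_key, {post_id: timestamp})
        # Trim everything below the top MAX_FEED_LEN entries by score
        pipe.zremrangebyrank(feed_key, 0, -(MAX_FEED_LEN + 1))
    pipe.execute()

# Feed read: get the next page of post IDs, then hydrate
def get_feed(user_id: str, cursor: Optional[float] = None, limit: int = 20):
    feed_key = f"feed:{user_id}"
    # "(" makes the upper bound exclusive, so the cursor post is not repeated
    max_score = "+inf" if cursor is None else f"({cursor}"
    post_ids = r.zrevrangebyscore(feed_key, max_score, "-inf", start=0, num=limit)
    return batch_get_posts(post_ids)  # defined elsewhere: post cache / DB lookup
```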
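
To see how cursor-based pagination over a score-sorted feed behaves without a Redis server, the sketch below uses an in-memory list as a stand-in for the sorted set. The helper `get_page` and the sample data are hypothetical.

```python
from typing import Optional

Entry = tuple[float, str]  # (score, post_id)


def get_page(
    feed: list[Entry], cursor: Optional[float] = None, limit: int = 20
) -> tuple[list[str], Optional[float]]:
    """Return one page of post IDs plus the cursor for the next page.

    `feed` must be sorted by score, highest first. The cursor is the score
    of the last post already returned, and the bound is exclusive.
    """
    page = [(s, p) for s, p in feed if cursor is None or s < cursor][:limit]
    # A short page means the feed is exhausted, so there is no next cursor.
    next_cursor = page[-1][0] if len(page) == limit else None
    return [p for _, p in page], next_cursor


feed = [(5.0, "e"), (4.0, "d"), (3.0, "c"), (2.0, "b"), (1.0, "a")]

cursor = None
while True:
    post_ids, cursor = get_page(feed, cursor, limit=2)
    print(post_ids)
    if cursor is None:
        break
```

One caveat with a score-only cursor: two posts with identical scores at a page boundary can cause one to be skipped. A common remedy is to make the cursor a (score, postId) pair so ties are broken deterministically.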

Feed Ranking

A purely chronological feed is simple but not engaging. Modern systems rank posts by a signal-weighted score that combines recency, engagement (likes, retweets, comments), author affinity (how often you interact with this person), and content type preferences. In an interview, you don't need to implement ML — just mention that the score in the sorted set can be the output of a ranking model, recomputed periodically for cached feeds.
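
As a rough illustration of a signal-weighted score, the sketch below combines recency decay, engagement, and author affinity into one number that could serve as the sorted-set score. Every weight, the half-life, and the formula itself are assumptions chosen for illustration, not how any production system ranks.

```python
import math
import time
from typing import Optional


def rank_score(
    created_at: float,
    likes: int,
    retweets: int,
    comments: int,
    affinity: float,
    now: Optional[float] = None,
    half_life_hours: float = 6.0,
) -> float:
    """Hypothetical ranking score; higher means shown earlier in the feed.

    created_at and now are Unix timestamps in seconds.
    affinity is assumed to be in [0, 1]: how often the viewer interacts
    with the author.
    """
    now = time.time() if now is None else now
    age_hours = max(0.0, (now - created_at) / 3600)
    recency = 0.5 ** (age_hours / half_life_hours)  # halves every half-life
    # log1p dampens viral posts so raw counts do not dominate; weights are assumed.
    engagement = math.log1p(likes + 2 * retweets + 3 * comments)
    return recency * (1.0 + engagement) * (1.0 + affinity)


now = 1_000_000.0
fresh = rank_score(now - 600, likes=10, retweets=1, comments=2, affinity=0.2, now=now)
stale = rank_score(now - 86_400, likes=10, retweets=1, comments=2, affinity=0.2, now=now)
print(fresh > stale)  # True: same engagement, but the newer post ranks higher
```

Because recency decays over time, a score like this goes stale, which is one reason cached feeds need periodic recomputation.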

Post Publishing Flow

[Diagram: Post creation and fan-out to follower feed caches]

Scaling Considerations

Fan-out throughput: use a pool of fan-out workers consuming from Kafka. A single post by an author with 1M followers triggers 1M Redis writes, so partition the work by `followerId` and let the workers process partitions in parallel.
Feed cache TTL: evict inactive users' feeds from Redis after 30 days to reclaim memory. Reconstruct on next login.
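Post DB sharding: shard the `posts` table by `authorId` to distribute writes. In Cassandra, using `authorId` as the partition key and `postId` as the clustering key keeps each author's posts together and sorted, which makes the celebrity pull a single-partition read.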
Media CDN: images and videos stored in S3; served via CDN. Feed only stores references (URLs), not binary data.
Counter denormalization: like/comment counts are pre-computed and cached in Redis counters, updated via stream processing, not queried from the posts table on every feed load.
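
The "partition by `followerId`" idea from the fan-out bullet can be sketched as a stable hash that assigns each follower to one partition, so each worker handles a disjoint slice of the writes. The function names are hypothetical, and CRC32 is only a stand-in for whatever key hashing the message queue actually applies.

```python
import zlib
from collections import defaultdict


def partition_for(follower_id: str, num_partitions: int) -> int:
    """Map a follower to a partition with a hash that is stable across processes.

    Python's built-in hash() is randomized per process for strings, so it
    would not give consistent routing; CRC32 is deterministic.
    """
    return zlib.crc32(follower_id.encode("utf-8")) % num_partitions


def partition_followers(followers: list[str], num_partitions: int) -> dict[int, list[str]]:
    """Group followers by partition so each worker gets its own batch."""
    buckets: dict[int, list[str]] = defaultdict(list)
    for follower_id in followers:
        buckets[partition_for(follower_id, num_partitions)].append(follower_id)
    return dict(buckets)


followers = [f"user-{i}" for i in range(10_000)]
batches = partition_followers(followers, num_partitions=8)
for partition, batch in sorted(batches.items()):
    print(f"partition {partition}: {len(batch)} followers")
```

Routing by `followerId` also means all writes to a given `feed:{userId}` key go through the same partition, which keeps updates to one feed in order.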

💡

Interview Tip

The single most important decision in this problem is the fan-out strategy. State the hybrid approach early, explain the celebrity threshold (e.g., > 1M followers), and show how the feed-read path merges pre-built feed cache with on-the-fly celebrity post pulls. Interviewers are testing whether you can reason about write amplification at scale.
