Dev.to #systemdesign·March 31, 2026

Scaling WebSockets: From 100k to 1M Users and Tackling Backpressure

This article provides a post-mortem on the challenges faced when scaling a WebSocket-based live commentary platform from 100,000 to 1 million concurrent users. It details how an initially simple fan-out architecture led to Out-Of-Memory (OOM) kills due to slow consumers and backpressure, and outlines the architectural changes implemented to achieve resilience and scalability, including ruthless message dropping and coalescing.

Distributed Systems Performance & Scaling Case Studies & Postmortems

Read original on Dev.to #systemdesign

The article recounts a real-world incident where a WebSocket-based live commentary platform, designed with a standard fan-out pattern (Redis Pub/Sub to Go services broadcasting to clients), encountered severe memory issues and node crashes when scaling from 100,000 to 1 million users. The core problem was identified as backpressure failure caused by slow consumers rather than a general traffic spike or CPU bottleneck, leading to gigabytes of stale data accumulating in memory.

The Initial Architecture: A Deceptive Fan-Out

The initial design was straightforward: a writer service published score updates to Redis Pub/Sub, and a fleet of Go services subscribed to Redis, then fanned out these updates to connected WebSockets. Each client was given a buffered channel (256 messages) to absorb transient network lags. This design, while simple and effective at lower scales, became a "RAM hoarder" as the user base grew.

type Client struct {
    hub *Hub
    conn *websocket.Conn
    send chan []byte // Buffered channel: 256 slots
}

func (c *Client) writePump() {
    for {
        select {
        case message, ok := <-c.send:
            if !ok { c.conn.WriteMessage(websocket.CloseMessage, []byte{}); return }
            // This Write() call to the WebSocket connection can block if the client is slow.
            // Meanwhile, the 'send' channel continues to fill up.
            w, err := c.conn.NextWriter(websocket.TextMessage)
            if err != nil { return }
            w.Write(message)
        }
    }
}

The Slow Consumer Problem

⚠️

The Network is Not a Uniform Pipe

The critical misconception was treating the internet as a uniform, fast pipe. Load tests with high-bandwidth clients masked the real-world impact of users on slow or unstable networks (e.g., 3G in crowded stadiums). When a client's TCP send buffer filled due to slow ACKs, the application-level `Write` call would block. With buffered Go channels, this meant messages would queue up in memory, consuming significant RAM for users who were already far behind real-time data.

The math was stark: 200,000 slow consumers, each with a 256-message buffer of ~1KB payloads, amounted to over 51GB of queued messages across the cluster, leading to OOM kills on 16GB nodes. The system was trying to save every packet for every user, ultimately crashing the platform for everyone.

Solutions: Discipline, Brutality, and Jitter

Ruthless Send (Drop-If-Full): The `send` logic was changed from blocking to non-blocking. If a client's buffer was full, the message was dropped, and crucially, the client connection was closed. This was based on the principle that real-time data has an expiration date; a client significantly behind is consuming resources for stale, valueless data. This capped memory usage per client.
Message Coalescing: For clients who were only slightly slow, a coalescing buffer was introduced. Instead of queuing every individual update, the client struct held a pointer to the *latest* state. The write pump would then send only one aggregated message representing the most recent state, decoupling the server's update rate from the client's consumption rate.
Random Jitter for Reconnects: Closing connections for many slow users simultaneously led to a "thundering herd" problem upon reconnect. Thousands of clients attempting to reconnect at the same time overwhelmed load balancers and authentication services. This was mitigated by adding random jitter (0-5 seconds) to client-side reconnect logic, spreading the load over time.

These changes resulted in handling 1.2 million concurrent users with stable memory usage and improved latency, demonstrating that effective WebSocket scaling is fundamentally about managing queues and backpressure, not just opening more sockets.

WebSocketsReal-timeScalingBackpressureGoPost-mortemMemory ManagementConcurrency

Comments

Loading comments...

Architecture Design

Design this yourself

Design a real-time live commentary platform for a major sports event, expecting 1 million concurrent users. Focus on managing backpressure effectively, handling slow consumers, and ensuring system stability under high load. Detail the architecture for ingesting updates, distributing them via WebSockets, and strategies to prevent Out-Of-Memory issues and thundering herd scenarios during client reconnections.

Practice Interview

Other design angles

· Design a generic real-time data fan-out service that can support millions of simultaneous subscribers, with configurable backpressure strategies (e.g., drop, coalesce, throttle).· Design an alert system for a distributed real-time platform that detects and mitigates slow consumer issues before they lead to cascading failures.· Architect a robust client-side WebSocket library that includes adaptive reconnection strategies, message coalescing, and mechanisms to gracefully handle server-side backpressure signals.