ByteByteGo · March 24, 2026

Netflix Live Origin Architecture for Scalable Live Streaming

This article delves into the architecture of Netflix's Live Origin, a custom-built server designed to manage and deliver live video segments to millions of devices. It highlights key architectural decisions such as redundant regional pipelines, manifest design with segment templates, and intelligent segment selection for defect handling. The piece also explores the evolution of its storage architecture from AWS S3 to a custom Cassandra-based solution, optimized for the unique demands of high-throughput, low-latency live streaming.


Introduction to Netflix Live Origin

Netflix Live Origin is a critical intermediary component between live streaming pipelines and Open Connect, Netflix's global Content Delivery Network (CDN). Unlike Video on Demand (VOD), live streaming demands real-time processing and delivery of video segments, typically within seconds. The Live Origin acts as a quality control gateway, ensuring that only valid video segments reach viewers worldwide. Its design addresses the challenges of time constraints, defect handling, and massive scale inherent in live video distribution.

Core Architectural Decisions for Reliability and Performance

  • Redundant Regional Pipelines: Netflix employs two independent live streaming pipelines operating simultaneously in different cloud regions. Each pipeline includes its own encoder, packager, and video contribution feed. This redundancy significantly reduces the probability of both pipelines producing defective segments at the same time, enhancing reliability.
  • Manifest Design with Segment Templates: Instead of dynamic manifest updates, Netflix uses a predictable template for video segments, each with a fixed duration (e.g., 2 seconds). This design allows the Live Origin to accurately predict segment publication times and optimize retrieval.
  • Multi-Pipeline Awareness and Intelligent Selection: The Live Origin inspects segments from both pipelines and selects the first valid one. Packagers include metadata on defects, allowing the Origin to make informed decisions and pass defect information downstream if both pipelines fail.
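The multi-pipeline selection logic above can be sketched minimally in Python. This is an illustration only, not Netflix's actual data model: the `Segment` fields and the fewest-defects tiebreak are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    pipeline: str                         # e.g. "region-a" or "region-b"
    sequence: int                         # index in the segment template
    payload: bytes
    defects: list[str] = field(default_factory=list)  # packager-reported defect metadata

def select_segment(candidates: list[Segment]) -> Segment:
    """Pick the first defect-free segment from the redundant pipelines.

    If every pipeline reported defects, fall back to the least-defective
    copy; its defect metadata travels downstream with it so later stages
    can make informed decisions.
    """
    for seg in candidates:
        if not seg.defects:
            return seg
    # Both pipelines produced defective copies: serve the least bad one.
    return min(candidates, key=lambda s: len(s.defects))
```

Because the two pipelines fail independently, the common case is that at least one candidate is clean, and the fallback path is rarely exercised.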

Optimizations for Open Connect

Open Connect, originally optimized for VOD, required extensions for live streaming. Netflix optimized nginx's proxy-caching functionality with several key features:

  • Segment Range Validation: OCAs (Open Connect Appliances) can determine and immediately reject requests for segments outside the legitimate range using template information, preventing unnecessary network traffic.
  • Cached 404 Responses: When a segment is not yet available, the Origin returns a 404 with an expiration policy. Open Connect caches this response until just before the segment is expected, avoiding repeated failed requests.
  • Request Holding at Live Edge: When a client requests the next segment at the 'live edge', the Origin holds the request open instead of returning a 404. Once the segment is published, it responds immediately, significantly reducing network traffic from slightly early requests. This required millisecond-grain caching in nginx.
  • Streaming Metadata via HTTP Headers: Custom HTTP headers are used to communicate streaming events (e.g., ad breaks, content warnings) scalably. This ensures clients receive the latest notifications regardless of their playback position.
  • Cache Invalidation and Origin Masking: A sophisticated invalidation system allows flushing content by altering version numbers in cache keys. It supports invalidating specific segment ranges from particular encoders or regions. Origin masking allows operations to selectively exclude problematic segments from a pipeline.
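The first three behaviors above — range validation, cacheable 404s that expire just before the predicted publication time, and holding requests at the live edge — can be sketched together under the fixed-duration template assumption. The constants, function names, and return shapes below are illustrative, not the real nginx extension:

```python
SEGMENT_SECONDS = 2            # fixed duration from the manifest template
STREAM_START = 1_700_000_000   # hypothetical stream start (epoch seconds)

def expected_publish_time(seq: int) -> float:
    """With a fixed template, segment N is publishable once N+1 segment
    durations have elapsed since the stream start."""
    return STREAM_START + (seq + 1) * SEGMENT_SECONDS

def handle_request(seq: int, now: float, last_published: int):
    """Return an (action, detail) pair for a segment request."""
    # Segment range validation: reject requests outside the legitimate
    # range immediately, without touching the network.
    if seq < 0:
        return ("400", None)
    if seq <= last_published:
        return ("200", f"segment-{seq}")
    eta = expected_publish_time(seq) - now
    if seq == last_published + 1 and eta < SEGMENT_SECONDS:
        # Live edge: hold the request open until the segment lands,
        # then respond immediately instead of issuing a 404.
        return ("hold", eta)
    # Not yet available: a cacheable 404 whose expiry lands just before
    # the segment's predicted publication time, so the edge does not
    # hammer the Origin with repeated failed requests.
    return ("404", max(eta - 0.1, 0.0))
```

A real deployment would implement this inside the CDN cache layer with millisecond-grain expirations, but the control flow is the same.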

Evolution of Storage Architecture

Netflix initially used AWS S3 for Live Origin storage, but found its performance inadequate for high-scale, low-latency live streaming. The stringent 2-second retry budget and critical, time-sensitive writes demanded a more robust solution. They identified five key requirements:

  • Extremely high write availability within a region, with low-latency cross-region replication.
  • High write throughput (hundreds of MB/s).
  • Efficient handling of large writes with thousands of keys per partition.
  • Strong intra-region consistency for sub-second read latency.
  • Gigabytes of read throughput without affecting writes during 'Origin Storms'.

💡

Live Streaming vs. VOD Storage Needs

The article highlights a crucial distinction: live streaming storage requirements are closer to a global, low-latency, highly available database than traditional object storage, primarily due to the criticality of every write and the strict time budgets.

Their solution leveraged an existing Key-Value Storage Abstraction built on Apache Cassandra. By chunking large payloads and using Cassandra's local-quorum consistency with a write-optimized Log-Structured Merge Tree engine, they met the stringent write availability, throughput, and consistency requirements. Median write latency dropped significantly from 113ms to 25ms. To handle 'Origin Storms' (high read throughput impacting writes), they introduced write-through caching using EVCache (their distributed Memcached-based system), offloading most reads to a highly scalable cache and enabling 200Gbps+ throughput without affecting write performance.
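The chunked-write plus write-through-cache pattern can be sketched as follows, with in-memory dicts standing in for the Cassandra-backed key-value abstraction and EVCache. The chunk size and key scheme are assumptions made for illustration:

```python
CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not Netflix's actual value

kv_store: dict[str, object] = {}  # stands in for the Cassandra-backed KV abstraction
cache: dict[str, object] = {}     # stands in for EVCache (distributed Memcached)

def put_segment(key: str, payload: bytes) -> None:
    """Chunk a large segment payload into sub-keys of one logical
    partition, writing through to the cache so that storm-level reads
    can be served without touching the backing store."""
    chunks = [payload[i:i + CHUNK_SIZE]
              for i in range(0, len(payload), CHUNK_SIZE)]
    for idx, chunk in enumerate(chunks):
        sub_key = f"{key}/{idx}"
        kv_store[sub_key] = chunk   # durable write (local-quorum in Cassandra)
        cache[sub_key] = chunk      # write-through: cache is populated at write time
    kv_store[f"{key}/meta"] = len(chunks)
    cache[f"{key}/meta"] = len(chunks)

def get_segment(key: str) -> bytes:
    """Reassemble a segment, preferring the cache; the KV store is only
    a fallback, which is what keeps 'Origin Storms' off the writers."""
    count = cache.get(f"{key}/meta", kv_store.get(f"{key}/meta"))
    return b"".join(
        cache.get(f"{key}/{i}", kv_store.get(f"{key}/{i}"))
        for i in range(count)
    )
```

Because every write populates the cache, a read burst immediately after publication (the typical live-streaming access pattern) hits the cache almost exclusively, which is the property that let Netflix sustain 200Gbps+ of reads without degrading writes.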

Tags: Netflix, Live Streaming, CDN, Open Connect, Cassandra, Microservices, High Availability, Fault Tolerance
