InfoQ Architecture·March 23, 2026

Netflix's Graph Abstraction: Scaling 650TB of Graph Data for Real-time Queries

Netflix's Graph Abstraction is a high-throughput system designed to manage massive graph data (650TB) in real time, powering internal services like social graphs and operational monitoring. It achieves millisecond-level queries by layering on existing infrastructure, using caching strategies, and globally replicating data, making trade-offs between query expressiveness and predictable performance.

Distributed Systems Databases & Storage Performance & Scaling

Read original on InfoQ Architecture

Introduction to Netflix's Graph Abstraction

Netflix developed Graph Abstraction to address the challenges of managing and querying extremely large-scale graph data (up to 650TB) with real-time performance requirements. This system serves various internal use cases, from modeling social relationships in Netflix Gaming to providing service topology graphs for incident analysis and operational monitoring. The core challenge addressed is the common trade-off between expressive graph queries and the need for predictable, low-latency performance at high throughput.

Architectural Design and Core Principles

Instead of building a standalone graph database, Graph Abstraction is implemented as a layer on top of Netflix's existing data infrastructure. It leverages a Key-Value abstraction for the latest graph state and a TimeSeries abstraction to store historical changes, enabling temporal queries and auditing. This architectural choice allows Netflix to reuse robust, battle-tested components, minimizing operational overhead and leveraging existing expertise.

Performance Trade-offs: To ensure consistent low latency at scale, the system restricts traversal depth and often requires a defined starting node, prioritizing predictable performance over unbounded query flexibility.
Data Separation: Edge connections are separated from edge properties, optimizing storage and query patterns for different access needs.
Schema Enforcement: Graph schemas are loaded into memory and strictly enforced, which enables validation, optimized traversal planning, and the elimination of invalid query paths, contributing to performance and reliability.

Caching and Global Replication

Effective caching is crucial for achieving millisecond-level query times. Graph Abstraction integrates with EVCache, Netflix’s distributed caching layer. It employs specific caching strategies:

Write-aside caching: Prevents duplicate edge writes, reducing write amplification.
Read-aside caching: Accelerates access to node and edge properties, reducing read amplification.

Global availability and low latency are achieved through asynchronous replication of graph data across multiple regions. Both the caching layers and durable storage replicate data, balancing latency, availability, and eventual consistency trade-offs. This distributed design allows for single-digit millisecond latency for single-hop traversals and under 50 milliseconds for two-hop queries at the 90th percentile.

API and Future Implications

The platform exposes a gRPC traversal API inspired by Gremlin, allowing services to chain traversal steps, apply filters, and limit results. This standardized API simplifies integration for various internal services. As Netflix expands into new verticals like live content, gaming, and advertising, this Graph Abstraction is critical for modeling complex relationships between users, services, and content at scale.

Graph DatabaseNetflixScalabilityReal-timeCachingDistributed GraphMicroservicesData Infrastructure