Netflix's Graph Abstraction is a high-throughput system designed to manage massive graph data (650TB) in real time, powering internal services like social graphs and operational monitoring. It achieves millisecond-level queries by layering on existing infrastructure, using caching strategies, and globally replicating data, making trade-offs between query expressiveness and predictable performance.
Original article: InfoQ Architecture
Netflix developed Graph Abstraction to address the challenges of managing and querying extremely large-scale graph data (up to 650TB) with real-time performance requirements. The system serves a range of internal use cases, from modeling social relationships in Netflix Gaming to providing service topology graphs for incident analysis and operational monitoring. The core challenge is the common trade-off between expressive graph queries and predictable, low-latency performance at high throughput.
Instead of building a standalone graph database, Graph Abstraction is implemented as a layer on top of Netflix's existing data infrastructure. It leverages a Key-Value abstraction for the latest graph state and a TimeSeries abstraction to store historical changes, enabling temporal queries and auditing. This architectural choice allows Netflix to reuse robust, battle-tested components, minimizing operational overhead and leveraging existing expertise.
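To make the split between latest state and history concrete, here is a minimal in-memory sketch. It is not Netflix's implementation: `GraphStore`, its method names, and the integer timestamps are hypothetical, with a plain dict standing in for the Key-Value abstraction and an append-only list standing in for the TimeSeries abstraction.

```python
from dataclasses import dataclass, field


@dataclass
class GraphStore:
    """Toy model: a dict for current edges, an append-only log for history."""
    latest: dict = field(default_factory=dict)   # stand-in for the Key-Value layer
    history: list = field(default_factory=list)  # stand-in for the TimeSeries layer

    def put_edge(self, src, label, dst, ts):
        # Current state: adjacency set keyed by (source, edge label).
        self.latest.setdefault((src, label), set()).add(dst)
        # History: every change is recorded with its timestamp.
        self.history.append((ts, "add", src, label, dst))

    def remove_edge(self, src, label, dst, ts):
        self.latest.get((src, label), set()).discard(dst)
        self.history.append((ts, "remove", src, label, dst))

    def neighbors(self, src, label):
        # Fast path: read only the latest state.
        return set(self.latest.get((src, label), set()))

    def neighbors_as_of(self, src, label, ts):
        # Temporal query: replay recorded changes up to and including ts.
        result = set()
        for event_ts, op, s, l, d in sorted(self.history):
            if event_ts > ts:
                break
            if (s, l) == (src, label):
                if op == "add":
                    result.add(d)
                else:
                    result.discard(d)
        return result


store = GraphStore()
store.put_edge("alice", "friend", "bob", ts=1)
store.put_edge("alice", "friend", "carol", ts=2)
store.remove_edge("alice", "friend", "bob", ts=3)

current = sorted(store.neighbors("alice", "friend"))
as_of_2 = sorted(store.neighbors_as_of("alice", "friend", ts=2))
```

In this sketch, reads of the current graph never touch the history log, while the "as of" query replays recorded changes, which is also what makes auditing possible.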
Effective caching is crucial for achieving millisecond-level query times. Graph Abstraction integrates with EVCache, Netflix's distributed caching layer, and employs specific caching strategies.
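As a generic illustration of how a cache in front of durable storage can keep reads fast, the sketch below shows a cache-aside lookup with invalidation on write. This pattern is an assumption for illustration, not Netflix's documented strategy, and it does not use the EVCache client: `DictCache`, `read_properties`, `write_properties`, and the dict used as the store are all hypothetical stand-ins.

```python
class DictCache:
    """Hypothetical in-process cache that counts hits and misses."""

    def __init__(self):
        self._data = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


def read_properties(node_id, cache, store):
    """Cache-aside read: try the cache, fall back to the store, then populate."""
    cached = cache.get(node_id)
    if cached is not None:
        return cached
    value = store[node_id]  # raises KeyError if the node does not exist
    cache.set(node_id, value)
    return value


def write_properties(node_id, props, cache, store):
    """Write to the store, then invalidate so the next read reloads."""
    store[node_id] = props
    cache.delete(node_id)


store = {"svc-a": {"tier": "edge"}}  # stand-in for durable storage
cache = DictCache()

first = read_properties("svc-a", cache, store)   # miss, loads from the store
second = read_properties("svc-a", cache, store)  # hit, served from the cache
write_properties("svc-a", {"tier": "core"}, cache, store)
third = read_properties("svc-a", cache, store)   # miss again after invalidation
```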
Global availability and low latency are achieved through asynchronous replication of graph data across multiple regions. Both the caching layers and durable storage replicate data, balancing latency, availability, and eventual consistency trade-offs. This distributed design allows for single-digit millisecond latency for single-hop traversals and under 50 milliseconds for two-hop queries at the 90th percentile.
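The eventual-consistency trade-off can be sketched with a toy model of asynchronous replication. Everything here is hypothetical: the `Region` class, the explicit `replicate()` step standing in for a background process, and the last-writer-wins rule by version number are assumptions for illustration, not details from the original article.

```python
from collections import deque


class Region:
    """Toy region: serves local reads and writes, ships changes to peers later."""

    def __init__(self, name):
        self.name = name
        self.data = {}          # key -> (value, version)
        self.outbox = deque()   # changes not yet shipped to peers
        self.peers = []

    def write(self, key, value, version):
        # The local write completes immediately; replication happens afterwards.
        self._apply(key, value, version)
        self.outbox.append((key, value, version))

    def _apply(self, key, value, version):
        # Assumed conflict rule: the highest version wins.
        current = self.data.get(key)
        if current is None or version > current[1]:
            self.data[key] = (value, version)

    def read(self, key):
        entry = self.data.get(key)
        return entry[0] if entry else None

    def replicate(self):
        # Stand-in for a background replication process.
        while self.outbox:
            change = self.outbox.popleft()
            for peer in self.peers:
                peer._apply(*change)


us = Region("us-east")
eu = Region("eu-west")
us.peers.append(eu)

us.write("edge:alice->bob", "friend", version=1)
before = eu.read("edge:alice->bob")  # not yet visible in the remote region
us.replicate()
after = eu.read("edge:alice->bob")   # visible once replication has run
```

The window between `before` and `after` is the price of fast local writes: a remote region may briefly serve stale data.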
The platform exposes a gRPC traversal API inspired by Gremlin, allowing services to chain traversal steps, apply filters, and limit results. This standardized API simplifies integration for internal services. As Netflix expands into new verticals like live content, gaming, and advertising, Graph Abstraction is critical for modeling complex relationships between users, services, and content at scale.
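To show what chaining steps, filtering, and limiting look like in a Gremlin-style interface, here is a small in-memory sketch. It is not the actual gRPC API: the `Traversal` class, its `out`, `where`, `limit`, and `to_list` methods, and the sample graph are hypothetical names chosen for illustration.

```python
class Traversal:
    """Records chained steps, then evaluates them in order over a dict graph."""

    def __init__(self, graph, start):
        self._graph = graph        # {node: {edge_label: [neighbor, ...]}}
        self._start = list(start)
        self._steps = []

    def out(self, label):
        # Follow outgoing edges with the given label (one hop).
        self._steps.append(("out", label))
        return self

    def where(self, predicate):
        # Keep only nodes for which the predicate returns True.
        self._steps.append(("where", predicate))
        return self

    def limit(self, n):
        # Cap the number of results carried forward.
        self._steps.append(("limit", n))
        return self

    def to_list(self):
        current = list(self._start)
        for kind, arg in self._steps:
            if kind == "out":
                nxt = []
                for node in current:
                    nxt.extend(self._graph.get(node, {}).get(arg, []))
                current = nxt
            elif kind == "where":
                current = [node for node in current if arg(node)]
            elif kind == "limit":
                current = current[:arg]
        return current


graph = {
    "alice": {"friend": ["bob", "carol"]},
    "bob": {"plays": ["chess", "racing"]},
    "carol": {"plays": ["chess"]},
}

# Two-hop query: games played by alice's friends.
two_hop = Traversal(graph, ["alice"]).out("friend").out("plays").to_list()

# Same query with a filter and a result cap.
filtered = (
    Traversal(graph, ["alice"])
    .out("friend")
    .out("plays")
    .where(lambda game: game != "racing")
    .limit(1)
    .to_list()
)
```

A deliberately small set of steps like this keeps each query's cost easy to bound, which reflects the trade-off between expressiveness and predictable latency described earlier.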