Airbnb Engineering · May 19, 2026

Scaling Airbnb's Identity Graph with an Internal Knowledge Graph Platform

This article details how Airbnb built a unified, internally managed knowledge graph infrastructure to scale its critical identity graph. It covers the architectural evolution from relational and third-party solutions, the challenges faced, the technical stack chosen (JanusGraph + DynamoDB), and key optimizations for performance and stability. The migration illustrates the build-vs-buy trade-off, emphasizing the control an in-house platform provides over performance and operational overhead for large-scale graph workloads.

The Challenge: Scaling a Critical Identity Graph

Airbnb's identity graph is a foundational component for Trust and Safety, mapping relationships between users to detect suspicious activities and identify linked accounts. It grew to 7 billion nodes and 11 billion edges, with 5 million new edges daily. This scale presented significant challenges: ensuring scalability for writes and complex, multi-hop queries, mitigating long-tail latency from high-fanout nodes, and maintaining system stability under heavy load.
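
A quick back-of-the-envelope calculation puts these figures in perspective. It uses only the numbers quoted above, plus one simplifying assumption that is not from the article: that new edges arrive evenly over the day.

```python
# Back-of-the-envelope figures derived from the numbers quoted in the article.
NODES = 7_000_000_000          # nodes in the identity graph
EDGES = 11_000_000_000         # edges in the identity graph
NEW_EDGES_PER_DAY = 5_000_000  # new edges added daily
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: writes are spread evenly over the day (real traffic is burstier).
avg_edge_writes_per_sec = NEW_EDGES_PER_DAY / SECONDS_PER_DAY

# Each edge has two endpoints, so the mean degree is 2E / N.
mean_degree = 2 * EDGES / NODES

print(f"average new edges/sec: {avg_edge_writes_per_sec:.1f}")  # 57.9
print(f"mean degree per node:  {mean_degree:.2f}")              # 3.14
```

Both averages are modest. That is consistent with the article's emphasis on high-fanout nodes and long-tail latency: the hard cases are the outliers, not the typical node.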

Evolution of Graph Architecture

Relational Database + KV Store: The initial approach used relational databases for entities and a KV store for JSON-encoded edge lists. It became expensive and hard to scale as graph density increased.
Third-party SaaS Graph Database: Adopted in 2021, it improved horizontal scalability but introduced long-tail latency and operational instability, and it gave Airbnb limited control over performance tuning and access controls.
Internally Managed Graph Infrastructure: The current iteration, a multi-tenant platform built to support low-latency, large-scale graph workloads, addressing the shortcomings of previous solutions.
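
To make the first approach in the list above concrete, here is a minimal sketch of a KV store holding JSON-encoded edge lists. The in-memory dictionary, the key names, and the helper functions are all hypothetical stand-ins, not Airbnb's schema; the point is only to show why this layout gets more expensive as nodes become denser.

```python
import json

# Hypothetical in-memory stand-in for the KV store: keys are node IDs,
# values are JSON-encoded adjacency lists.
kv_store: dict[str, str] = {}


def add_edge(src: str, dst: str) -> int:
    """Append dst to src's edge list; return the bytes rewritten for this one edge."""
    edges = json.loads(kv_store.get(src, "[]"))  # read the whole list
    edges.append(dst)                            # modify it in memory
    encoded = json.dumps(edges)
    kv_store[src] = encoded                      # write the whole list back
    return len(encoded)


def neighbors(node: str) -> list[str]:
    return json.loads(kv_store.get(node, "[]"))


def two_hop(node: str) -> set[str]:
    """Nodes reachable in exactly two hops; one full-list read per first-hop neighbor."""
    result: set[str] = set()
    for n in neighbors(node):
        result.update(neighbors(n))
    result.discard(node)
    return result


# Adding edges to one dense node rewrites an ever larger value each time.
first = add_edge("hub", "n0")
for i in range(1, 1000):
    last = add_edge("hub", f"n{i}")
print(f"bytes rewritten: first edge={first}, 1000th edge={last}")
```

In this layout every new edge on a dense node rewrites the entire list, and every hop of a traversal decodes full lists, so cost grows with fanout rather than with the size of the change.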

Building the Unified Knowledge Graph Infrastructure

To overcome the limitations of fragmented graph solutions (relational 'graphs', offline graphs, DIY open-source, managed PaaS), Airbnb developed a paved-path, multi-tenant internal platform. The core technology stack chosen was JanusGraph (a distributed, open-source graph database) with DynamoDB as the storage backend and OpenSearch for indexing. This combination separates storage from the graph logic layer, allowing Airbnb to rely on DynamoDB's scalability and reliability while keeping control over the graph logic itself.

Why JanusGraph with DynamoDB?

JanusGraph's pluggable storage backend was crucial. It allowed Airbnb to decouple the graph processing logic from the underlying persistent storage, which enables rapid iteration on graph features without reimplementing distributed storage operations and leaves room to evolve the storage layer independently.
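
The decoupling described above can be sketched as a graph layer written against a narrow storage contract. This is an illustration only: `StorageBackend`, `InMemoryBackend`, and `GraphLayer` are hypothetical names, and the row-and-column layout is a loose simplification, not JanusGraph's actual storage interface.

```python
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical minimal storage contract: named columns under a row key."""

    def put(self, row: str, column: str, value: str) -> None: ...

    def get_slice(self, row: str, prefix: str) -> list[tuple[str, str]]: ...


class InMemoryBackend:
    """Toy backend; a DynamoDB-backed class would expose the same two methods."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, str]] = {}

    def put(self, row: str, column: str, value: str) -> None:
        self._rows.setdefault(row, {})[column] = value

    def get_slice(self, row: str, prefix: str) -> list[tuple[str, str]]:
        cols = self._rows.get(row, {})
        return sorted((c, v) for c, v in cols.items() if c.startswith(prefix))


class GraphLayer:
    """Graph logic that depends only on the StorageBackend contract."""

    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend

    def add_edge(self, src: str, label: str, dst: str) -> None:
        # Store each edge as one column on the source vertex's row.
        self._backend.put(src, f"e:{label}:{dst}", "")

    def out(self, src: str, label: str) -> list[str]:
        prefix = f"e:{label}:"
        return [c[len(prefix):] for c, _ in self._backend.get_slice(src, prefix)]


graph = GraphLayer(InMemoryBackend())
graph.add_edge("user:1", "shares_device", "user:2")
graph.add_edge("user:1", "shares_device", "user:3")
print(graph.out("user:1", "shares_device"))  # ['user:2', 'user:3']
```

Because `GraphLayer` never touches storage details, swapping the backend or tuning it does not require changes to traversal logic, which is the property the article credits for faster iteration.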

Key Optimizations and Migration Gains

Optimized Transactions: Custom transaction strategy leveraging DynamoDB's conditional writes to reduce JanusGraph's default heavy locking overhead.
Parallel Query Execution: Improved `getMultiSlices` interface for parallel data fetching, crucial for high-fanout queries.
Observability: Integrated Airbnb's distributed tracing into a custom JanusGraph fork to enhance monitoring and debugging.
Client-Side Query Optimization: Rewriting Gremlin queries (e.g., removing `Path` steps, optimizing side-effect steps) to align with JanusGraph's query planner and avoid non-batched operations.
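
The first item in the list above replaces locking with conditional writes. The general pattern is optimistic concurrency: read an item with its version, compute the update, and write only if the version is unchanged, retrying on conflict. The sketch below simulates that pattern in memory. `VersionedStore`, `put_if_version`, and `add_neighbor` are hypothetical names, and nothing here reflects Airbnb's actual transaction strategy or the DynamoDB client API (where such a check would be expressed as a condition on the write request).

```python
import threading


class ConditionFailed(Exception):
    """Raised when the expected version no longer matches the stored one."""


class VersionedStore:
    """Hypothetical in-memory stand-in for a store that supports conditional writes."""

    def __init__(self):
        self._items = {}                # key -> (version, value)
        self._mutex = threading.Lock()  # models the store's own per-item atomicity

    def get(self, key):
        with self._mutex:
            return self._items.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        """Write only if the item is still at expected_version (compare-and-set)."""
        with self._mutex:
            current_version, _ = self._items.get(key, (0, None))
            if current_version != expected_version:
                raise ConditionFailed(key)
            self._items[key] = (expected_version + 1, value)


def add_neighbor(store, key, neighbor, max_retries=1000):
    """Optimistic read-modify-write: the client holds no lock between read and write."""
    for _ in range(max_retries):
        version, value = store.get(key)
        updated = sorted(set(value or []) | {neighbor})
        try:
            store.put_if_version(key, version, updated)
            return
        except ConditionFailed:
            continue  # another writer won the race; re-read and retry
    raise RuntimeError(f"gave up on {key} after {max_retries} attempts")


def worker(store, worker_id):
    for i in range(50):
        add_neighbor(store, "user:1", f"acct:{worker_id}-{i}")


store = VersionedStore()
threads = [threading.Thread(target=worker, args=(store, w)) for w in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

version, neighbors = store.get("user:1")
print(version, len(neighbors))  # 400 400: no update was lost
```

Writers that do not collide pay for one read and one write with no lock acquisition; only conflicting writers retry. That is the usual motivation for preferring this pattern over pessimistic locking when conflicts are rare.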

The migration yielded significant improvements: better query performance (especially lower P99 latency), greater system stability (no more manual reboots, faster incident response), and higher write scalability (10x the write QPS of the previous vendor solution).
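
High-fanout queries are where the parallel data fetching listed among the optimizations matters most, since one query can require many storage reads. The sketch below shows the general idea of issuing those reads concurrently instead of one after another. It is not JanusGraph's `getMultiSlices` implementation: the adjacency data, the simulated latency, and the function names are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical adjacency data and per-request latency for a simulated store.
ADJACENCY = {f"v{i}": [f"v{i}-n{j}" for j in range(3)] for i in range(20)}
SIMULATED_LATENCY_S = 0.02


def fetch_slice(key):
    """Stand-in for one storage round trip that returns a key's edge slice."""
    time.sleep(SIMULATED_LATENCY_S)
    return key, ADJACENCY.get(key, [])


def fetch_sequential(keys):
    return dict(fetch_slice(k) for k in keys)


def fetch_parallel(keys, max_workers=10):
    # pool.map preserves input order, so results line up with keys.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch_slice, keys))


keys = list(ADJACENCY)

start = time.perf_counter()
sequential_result = fetch_sequential(keys)
sequential_s = time.perf_counter() - start

start = time.perf_counter()
parallel_result = fetch_parallel(keys)
parallel_s = time.perf_counter() - start

print(f"sequential: {sequential_s:.3f}s, parallel: {parallel_s:.3f}s")
```

With sequential fetching, total latency grows with the number of keys; with concurrent fetching it grows with the number of batches, which is why the gap widens as fanout increases.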

Architecture Design

Design this yourself

Design a highly scalable and fault-tolerant identity graph system capable of processing billions of nodes and edges, handling 5 million new edges daily, and supporting complex multi-hop queries with low latency. Your design should incorporate principles of a unified knowledge graph infrastructure, leveraging a distributed graph database with pluggable storage (e.g., JanusGraph with DynamoDB), and include strategies for query optimization, transactional integrity, and enhanced observability. Focus on the architecture's evolution, key challenges, and trade-offs made for performance and stability.

Other design angles

· Design a real-time fraud detection system that utilizes a graph database to identify suspicious patterns and linked accounts. Detail the ingestion pipeline, query patterns, and necessary optimizations for fast anomaly detection.
· Architect a generic, multi-tenant knowledge graph platform suitable for various enterprise use cases (e.g., identity, inventory, data lineage). Focus on tenant isolation, schema management, and API design for different query patterns.
· Propose an architecture for migrating an existing graph workload from a third-party SaaS solution to an in-house managed open-source stack. Outline the benchmarking, shadow traffic, and client-side optimization strategies required for a smooth transition.