Airbnb Engineering·May 19, 2026

Scaling Airbnb's Identity Graph with an Internal Knowledge Graph Platform

This article describes how Airbnb built a unified, internally managed knowledge graph infrastructure to scale its critical identity graph. It covers the architectural evolution from relational and third-party solutions, the challenges faced along the way, the technical stack chosen (JanusGraph + DynamoDB), and the key optimizations for performance and stability. The migration highlights the build-versus-buy trade-off, favoring direct control over performance and operations for large-scale graph workloads.


The Challenge: Scaling a Critical Identity Graph

Airbnb's identity graph is a foundational component for Trust and Safety, mapping relationships between users to detect suspicious activities and identify linked accounts. It grew to 7 billion nodes and 11 billion edges, with 5 million new edges daily. This scale presented significant challenges: ensuring scalability for writes and complex, multi-hop queries, mitigating long-tail latency from high-fanout nodes, and maintaining system stability under heavy load.

Evolution of Graph Architecture

  1. Relational Database + KV Store: Initial approach with relational databases for entities and a KV store for JSON-encoded edge lists. This became unscalable and expensive as graph density increased.
  2. Third-party SaaS Graph Database: Adopted in 2021, offering improved horizontal scalability but introducing long-tail latency, operational instability, and limited control over performance tuning and access controls.
  3. Internally Managed Graph Infrastructure: The current iteration, a multi-tenant platform built to support low-latency, large-scale graph workloads, addressing the shortcomings of previous solutions.
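The cost of the first design can be illustrated with a minimal sketch. It assumes, for illustration only, that each user's edges live in a single JSON value that must be read, modified, and rewritten in full on every insert. The `kv_store` dictionary and `add_edge` helper are hypothetical stand-ins, not Airbnb's actual schema or API.

```python
import json

# Hypothetical in-memory stand-in for a KV store: key = user id,
# value = JSON-encoded list of neighbor ids.
kv_store: dict[str, str] = {}


def add_edge(src: str, dst: str) -> int:
    """Append dst to src's edge list and return the number of characters rewritten."""
    edges = json.loads(kv_store.get(src, "[]"))  # read the whole list
    if dst not in edges:
        edges.append(dst)                        # modify it in memory
    encoded = json.dumps(edges)
    kv_store[src] = encoded                      # write the whole list back
    return len(encoded)


# Give one node 1,000 neighbors and record how much is rewritten per insert.
sizes = [add_edge("user:1", f"user:{i}") for i in range(2, 1002)]
```

Under this assumption, every new edge rewrites a value proportional to the node's current degree, so the per-write cost recorded in `sizes` keeps growing as the node gains neighbors.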

Building the Unified Knowledge Graph Infrastructure

To overcome the limitations of fragmented graph solutions (relational 'graphs', offline graphs, DIY open-source deployments, managed PaaS), Airbnb developed a paved-path, multi-tenant internal platform. The core technology stack is JanusGraph (a distributed, open-source graph database) with DynamoDB as the storage backend and OpenSearch for indexing. This combination separates storage from graph processing, letting Airbnb rely on DynamoDB's scalability and reliability while retaining control over the graph logic layer.

Why JanusGraph with DynamoDB?

JanusGraph's pluggable storage backend was crucial. It lets Airbnb decouple the graph processing logic from the underlying persistent storage, so the team can iterate rapidly on graph features without reimplementing distributed storage operations and can evolve the storage layer independently.
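The decoupling described above can be sketched as a graph layer written against a narrow storage interface. This is a simplified illustration, not JanusGraph's actual storage API: the `StorageBackend` protocol, `InMemoryBackend`, and `Graph` class are all hypothetical names chosen for the example.

```python
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical minimal storage contract the graph layer depends on."""

    def get_slice(self, key: str) -> list[str]: ...

    def add_column(self, key: str, column: str) -> None: ...


class InMemoryBackend:
    """Toy backend; a real deployment would put a distributed store behind the same contract."""

    def __init__(self) -> None:
        self._rows: dict[str, list[str]] = {}

    def get_slice(self, key: str) -> list[str]:
        return list(self._rows.get(key, []))

    def add_column(self, key: str, column: str) -> None:
        self._rows.setdefault(key, []).append(column)


class Graph:
    """Graph logic that only knows about the StorageBackend contract."""

    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend

    def add_edge(self, src: str, dst: str) -> None:
        self._backend.add_column(src, dst)

    def two_hop(self, start: str) -> set[str]:
        """Return nodes reachable in exactly two hops, excluding the start node."""
        found: set[str] = set()
        for neighbor in self._backend.get_slice(start):
            found.update(self._backend.get_slice(neighbor))
        found.discard(start)
        return found


graph = Graph(InMemoryBackend())
graph.add_edge("a", "b")
graph.add_edge("b", "c")
graph.add_edge("b", "a")
two_hop_from_a = graph.two_hop("a")
```

Because `Graph` never touches storage details, swapping `InMemoryBackend` for another implementation of the same two methods would leave the traversal code unchanged.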

Key Optimizations and Migration Gains

  • Optimized Transactions: Custom transaction strategy leveraging DynamoDB's conditional writes to reduce JanusGraph's default heavy locking overhead.
  • Parallel Query Execution: Improved `getMultiSlices` interface for parallel data fetching, crucial for high-fanout queries.
  • Observability: Integrated Airbnb's distributed tracing into a custom JanusGraph fork to enhance monitoring and debugging.
  • Client-Side Query Optimization: Rewriting Gremlin queries (e.g., removing `Path` steps, optimizing side-effect steps) to align with JanusGraph's query planner and avoid non-batched operations.
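Two of the optimizations above, conditional writes in place of lock-based transactions and parallel slice fetching, can be sketched together. This is a minimal in-memory illustration under stated assumptions, not Airbnb's implementation or the real JanusGraph or DynamoDB API: `ConditionalStore`, `put_if_version`, `add_edge`, and `get_multi_slices` are hypothetical names, and the internal lock only simulates the atomicity a storage service would provide for a single conditional write.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConditionalStore:
    """Hypothetical store where each item carries a version number."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # simulates server-side atomic check-and-set
        self._items: dict[str, tuple[int, list[str]]] = {}

    def get(self, key: str) -> tuple[int, list[str]]:
        with self._lock:
            version, value = self._items.get(key, (0, []))
            return version, list(value)

    def put_if_version(self, key: str, expected: int, value: list[str]) -> bool:
        """Write only if the stored version still equals `expected`."""
        with self._lock:
            current, _ = self._items.get(key, (0, []))
            if current != expected:
                return False  # another writer got there first
            self._items[key] = (current + 1, list(value))
            return True


def add_edge(store: ConditionalStore, src: str, dst: str, max_retries: int = 5) -> bool:
    """Optimistic write: no lock is held between the read and the write."""
    for _ in range(max_retries):
        version, edges = store.get(src)
        if dst in edges:
            return True
        if store.put_if_version(src, version, edges + [dst]):
            return True
    return False


def get_multi_slices(store: ConditionalStore, keys: list[str], max_workers: int = 8) -> dict[str, list[str]]:
    """Fetch the edge lists for many keys concurrently instead of one by one."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda key: store.get(key)[1], keys)
        return dict(zip(keys, results))


store = ConditionalStore()
add_edge(store, "a", "b")
add_edge(store, "a", "c")
add_edge(store, "b", "c")

stale_version, stale_edges = store.get("a")   # a writer reads the item...
add_edge(store, "a", "d")                     # ...another writer updates it...
stale_write_ok = store.put_if_version("a", stale_version, stale_edges + ["x"])  # ...so this is rejected

slices = get_multi_slices(store, ["a", "b", "missing"])
```

In this sketch a conflicting write is detected at write time and retried, so writers never block each other while reading, and the fan-out fetch issues its lookups concurrently, which matters most when a node has many neighbors to expand.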

The migration yielded significant improvements: superior query performance (especially P99 latency reduction), enhanced system stability (no more manual reboots, faster incident response), and robust scalability (10x write QPS compared to the previous vendor solution).

Tags: graph database, knowledge graph, JanusGraph, DynamoDB, scalability, performance optimization, system migration, distributed systems
