This article details how Airbnb evolved its offline data warehouse architecture to support the introduction of new product lines (Experiences, Services) alongside its traditional Homes business. It explores the critical trade-off between separate and monolithic data models, outlining a hybrid framework guided by foundational principles and modeling guidelines to balance consistency with flexibility. The solution addresses challenges in data standardization and managing technical debt within a rapidly expanding data ecosystem.
Read original on Airbnb EngineeringAirbnb faced a significant challenge in evolving its decade-old offline data warehouse to accommodate new product lines like Experiences and Services, beyond its core Homes offering. The primary concern was how to expand data architecture without introducing chaos into vital analytics services, risking data silos, inconsistent analytics, and increased technical debt. This scenario highlights a common problem in rapidly growing organizations: how to maintain data integrity and analytical capability while diversifying product offerings.
The central architectural decision revolved around structuring offline data for a multi-product world. Two extreme approaches were considered:
Balancing Act
Neither separate nor monolithic models are universally superior. The optimal choice depends heavily on the specific business domain. A balanced approach often involves a framework that combines centralized principles with decentralized modeling guidelines, allowing teams flexibility within a consistent structure.
Airbnb adopted a hybrid framework by establishing three foundational principles for consistency and scalability, coupled with guidelines to empower teams to make domain-specific modeling decisions:
Modeling guidelines helped teams choose between separate and monolithic approaches based on factors like shared vs. unique product attributes, future evolution, upstream alignment, downstream consumers, code maintainability, data volume & performance, and compatibility.
The most decisive factor was whether product lines shared mostly common data attributes or had significant unique attributes.
| Model Type | Use Cases | Examples from Airbnb |
|---|
Separate Data Models
Used for product-specific logic where attributes are highly distinct: * Listings: New "Service offerings" with many-to-one relationships to a parent service, unlike Homes/Experiences. * Availability: Flexible "business hours" for Services, distinct from calendar dates (Homes) or specific start times (Experiences). * Location: Radius-based "Service areas" for hosts who travel, different from fixed locations. * Guests: Distinct user journeys and high-volume interaction data for each product require separate models for performance and accuracy.
Monolithic Data Models
Used for cross-cutting concepts that span multiple products: * Messaging: A single message thread can involve multiple product types, requiring a unified view. * Payments: Transactional data is largely product-agnostic, handling refunds and payouts uniformly. * Customer Support: Support requests often involve multiple products, necessitating a holistic view of user history.