Airbnb Engineering · March 24, 2026

Architecting Resilient Forecasting Models: Separating Volume from Composition

This article from Airbnb Engineering details how the team re-architected its forecasting models to withstand extreme structural shifts, such as those caused by the COVID-19 pandemic. The core architectural decision was to decompose the forecast into two independent components: gross booking volume and lead-time composition. This separation enabled better diagnosis of model failures and allowed for the development of specialized models, like B-DARMA, to handle compositional time series data, leading to more robust and adaptable predictions.


Forecasting models are critical for business operations, but they often fail under sudden, unprecedented structural shifts in the data, as many organizations experienced during the COVID-19 pandemic. Airbnb's experience highlights a common challenge: when the underlying relationships within the data change fundamentally, traditional models designed for stable patterns break down. This necessitates a re-evaluation of the forecasting system's architecture to build in resilience.

The Challenge: Entangled Signals and Structural Shifts

Prior to COVID-19, Airbnb's models directly forecasted trip-date metrics, implicitly learning the relationship between booking date and trip date as an integrated process. This approach worked when this relationship (lead-time composition) was stable. However, the pandemic introduced two simultaneously shifting variables: the overall volume of bookings and the temporal distribution of when those bookings would materialize as trips. This entanglement made it impossible for models to distinguish between changes in volume and structural shifts in booking behavior, leading to inaccurate forecasts and difficulty in diagnosing the root cause of errors.

Architectural Solution: Decomposing the Forecast

Architectural Insight

The key architectural decision was to decouple the forecasting problem into two distinct components: determining "what" is being booked (gross booking volume) and "when" it will be realized (lead-time composition). This separation allows for specialized modeling and better diagnostics.

  • Gross Booking Volume: This component focuses on standard time-series forecasting for metrics like daily bookings. Changes here are often correlated with external events like lockdowns or reopenings, making them more interpretable.
  • Lead-Time Composition: This component models the distribution of how bookings recorded today translate into trips across future time windows. This is a compositional time series, where proportions must sum to one. This was identified as the more volatile and challenging aspect.

By separating these concerns, the team could focus modeling efforts on the component experiencing the most instability (lead-time composition). This decomposition provides a more granular view into model performance and allows for targeted improvements.
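The decomposition described above can be sketched in a few lines. This is an illustrative example, not Airbnb's implementation: the lead-time buckets and booking counts are invented, and the key point is only that volume and composition multiply back to the original trip-date matrix.

```python
import numpy as np

# Hypothetical example: rows = booking dates, columns = lead-time buckets
# (e.g. trips starting 0-7, 8-30, 31-90, 91+ days after booking).
bookings = np.array([
    [120.0, 340.0, 280.0, 60.0],
    [ 90.0, 310.0, 260.0, 40.0],
    [150.0, 400.0, 300.0, 50.0],
])

# Component 1: gross booking volume -- an ordinary univariate time series.
volume = bookings.sum(axis=1)

# Component 2: lead-time composition -- each row normalized to sum to one,
# a compositional time series living on the simplex.
composition = bookings / volume[:, None]

# Recombining the two components reproduces the original trip-date data,
# so each component can be forecast (and debugged) independently.
reconstructed = composition * volume[:, None]
assert np.allclose(reconstructed, bookings)
assert np.allclose(composition.sum(axis=1), 1.0)
```

Because the two components are forecast separately, an error in trip-date predictions can be attributed to the volume model, the composition model, or both.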

Modeling Compositional Data with B-DARMA

Standard time series methods are not well-suited for compositional data, as they don't inherently respect the simplex constraint (proportions must be non-negative and sum to one). Airbnb developed B-DARMA (Bayesian Dirichlet Auto-Regressive Moving Average), a specialized modeling framework:

  • Dirichlet Distribution: Used to enforce valid compositions, ensuring proportions are non-negative and sum to one.
  • Auto-Regressive and Moving Average Components: Capture temporal dynamics in how compositions evolve.
  • Bayesian Framework: Provides full predictive distributions with calibrated uncertainty and allows for incorporating domain knowledge through priors.

This specialized model addresses the unique challenges of forecasting proportional data, providing more robust and interpretable results compared to naive approaches that might yield incoherent or negative proportions.
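To make the autoregressive-on-the-simplex idea concrete, here is a minimal non-Bayesian sketch. It is not B-DARMA itself: the AR coefficients, noise scale, and use of the additive log-ratio (ALR) link are assumptions for illustration. A full B-DARMA model would place a Dirichlet likelihood around the linked mean and infer parameters with Bayesian machinery.

```python
import numpy as np

rng = np.random.default_rng(0)

def inv_alr(x):
    """Inverse additive log-ratio: map R^(K-1) back onto the simplex."""
    e = np.exp(np.concatenate([x, np.zeros(x.shape[:-1] + (1,))], axis=-1))
    return e / e.sum(axis=-1, keepdims=True)

# Simulate a 3-part compositional series whose ALR coordinates follow an AR(1).
T, K = 200, 3
A = np.array([[0.8, 0.0], [0.0, 0.6]])   # AR coefficients (assumed)
mu = np.array([0.5, -0.2])               # long-run mean in ALR space (assumed)
x = np.zeros((T, K - 1))
for t in range(1, T):
    x[t] = mu + A @ (x[t - 1] - mu) + rng.normal(0.0, 0.05, K - 1)
comps = inv_alr(x)  # every row is a valid composition: positive, sums to one

# One-step forecast: propagate the AR(1) mean and map back to the simplex.
x_next = mu + A @ (x[-1] - mu)
forecast = inv_alr(x_next)
assert np.isclose(forecast.sum(), 1.0) and (forecast > 0).all()
```

Note how the link function guarantees coherent output: unlike a naive per-bucket forecast, the predicted proportions are non-negative and sum to one by construction.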

Beyond the Crisis: Persistent Shifts and Monitoring

A crucial finding was that even after gross booking volumes recovered, the lead-time compositions did not revert to pre-pandemic norms; they settled into a new, shifted distribution. This underscored the need for models to handle persistent structural breaks rather than treating them as temporary anomalies. Monitoring distributional divergence (e.g., using L1 distance) became a vital diagnostic tool, not just for detecting exogenous shocks but also for identifying model misspecification before it impacts top-line error metrics. This allows teams to distinguish between "model is right, inputs are unusual" and "model's assumptions no longer hold."
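The L1-distance monitoring mentioned above is simple to sketch. The compositions and alert threshold below are hypothetical; in practice the threshold would be tuned against historical week-to-week variation.

```python
import numpy as np

def l1_divergence(p, q):
    """L1 distance between two compositions: 0 = identical, 2 = fully disjoint."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.abs(p - q).sum()

# Hypothetical baseline (pre-shift) vs. currently observed lead-time mix.
baseline = np.array([0.15, 0.40, 0.35, 0.10])
observed = np.array([0.35, 0.40, 0.20, 0.05])

ALERT_THRESHOLD = 0.25  # assumed value; calibrate on normal variation
divergence = l1_divergence(baseline, observed)
if divergence > ALERT_THRESHOLD:
    print(f"composition shift detected: L1 = {divergence:.2f}")
```

A sustained divergence above the threshold signals either an exogenous shock or a stale model assumption, prompting the "inputs are unusual" versus "assumptions no longer hold" triage described above.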

Tags: forecasting, machine learning, time series, data architecture, resilience, modeling, data science, microservices
