This article details Airbnb's architectural approach to building a destination recommendation model that leverages user behavior sequences and contextual signals to inspire users in the early stages of trip planning. It highlights the use of a Transformer-based model adapted from language modeling to handle diverse user states (active vs. dormant) and rich geolocation knowledge through multi-task learning, ultimately driving engagement and bookings.
Airbnb developed a destination recommendation framework to assist users in the exploration phase of trip planning. This system aims to reduce decision friction and improve engagement by suggesting relevant travel destinations, even when user intent is ambiguous. The architecture is inspired by language modeling, adapting Transformer models to process sequences of user actions and predict destination intent. Key challenges included integrating diverse signals, balancing active vs. dormant user behaviors, and incorporating complex geolocation data effectively.
The model treats each user action (bookings, views, searches) as a "token" within a sequence, analogous to words in a sentence. Transformer models capture both short-term (recent views and searches) and long-term (booking history) user interests. Each action token is represented as the sum of several embeddings (city, region, and recency, i.e., days between the action and today), augmented with contextual information such as the current time to account for seasonality. This holistic approach allows the model to summarize user preferences and predict future destination intent.
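The summed-embedding token construction can be sketched as follows. This is a minimal illustration, not Airbnb's implementation: the embedding tables, dimensions, recency bucketing, and choice of month as the seasonality context are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32

# Toy embedding tables keyed by id (in a real model these are learned).
city_emb = {c: rng.normal(size=DIM) for c in ["paris", "lyon"]}
region_emb = {r: rng.normal(size=DIM) for r in ["france"]}
recency_emb = {b: rng.normal(size=DIM) for b in range(4)}   # bucketed "days ago"
month_emb = {m: rng.normal(size=DIM) for m in range(12)}    # seasonality context

def action_token(city, region, days_ago, month):
    """Sum the component embeddings into a single action-token vector."""
    bucket = min(days_ago // 30, 3)  # coarse recency bucket (assumed scheme)
    return (city_emb[city] + region_emb[region]
            + recency_emb[bucket] + month_emb[month])

token = action_token("paris", "france", days_ago=12, month=6)
print(token.shape)  # (32,)
```

Summing (rather than concatenating) keeps the token dimension fixed regardless of how many signal types are attached to an action.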
System Design Insight: Leveraging Sequence Models
When designing recommendation systems, consider how sequence models like Transformers can capture temporal dependencies and evolving user preferences. Representing user interactions as sequences of 'tokens' (events, actions) allows for rich contextual understanding, crucial for predicting future intent in dynamic environments like e-commerce or travel.
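To make the idea concrete, here is a minimal single-head self-attention pass over a sequence of action-token vectors, showing how a Transformer-style layer contextualizes each action and yields a pooled user-preference vector. The weights, dimensions, and mean pooling are illustrative assumptions, not the production architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
SEQ, DIM = 5, 16

tokens = rng.normal(size=(SEQ, DIM))          # one user's action sequence
Wq, Wk, Wv = (rng.normal(size=(DIM, DIM)) for _ in range(3))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(DIM))  # (SEQ, SEQ) attention weights
    return scores @ v                         # contextualized token vectors

contextual = self_attention(tokens)
user_summary = contextual.mean(axis=0)        # pooled user-preference vector
print(user_summary.shape)  # (16,)
```

Because attention weights are computed between every pair of actions, a recent search can attend to a booking from months earlier, which is how the model blends short-term and long-term interests.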
A critical aspect of the system design is distinguishing between active and dormant users, whose behaviors differ sharply. Active users have recent activity and clearer intent, while dormant users have no recent activity and sit in a much earlier, exploratory planning stage. To accommodate both, the training-data creation process generates examples tailored to each user type: up-to-date booking, view, and search data for active users (mimicking a late, near-booking stage) and only historical booking data for dormant users (mimicking early planning).
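The two training-example recipes can be sketched with a small helper. The event schema, field names, and the 90-day activity window are assumptions made for illustration; the source does not specify the threshold.

```python
from datetime import date, timedelta

ACTIVE_WINDOW = timedelta(days=90)  # assumed recency threshold

def build_example(events, today):
    """events: list of (event_date, action_type, destination) tuples."""
    recent = [e for e in events if today - e[0] <= ACTIVE_WINDOW]
    if recent:
        # Active user: keep all recent signal types (bookings/views/searches).
        return {"user_state": "active",
                "sequence": [(a, d) for _, a, d in recent]}
    # Dormant user: fall back to historical booking data only.
    bookings = [(a, d) for _, a, d in events if a == "booking"]
    return {"user_state": "dormant", "sequence": bookings}

today = date(2024, 6, 1)
events = [(date(2023, 1, 10), "booking", "rome"),
          (date(2023, 2, 2), "search", "lisbon")]
print(build_example(events, today)["user_state"])  # dormant
```

Generating both example types from the same logs lets one model serve both populations instead of maintaining separate active- and dormant-user models.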
To leverage Airbnb's rich geolocation hierarchy (e.g., cities within regions), the model incorporates multi-task learning. Multiple prediction heads are added to the final layer of the model, enabling it to predict both region-level and city-level destinations simultaneously. By jointly learning these tasks and enforcing consistency between predictions, the model develops richer and more nuanced geographical representations, allowing for suggestions at various granularity levels.
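A sketch of the multi-task output layer: a city head and a region head share the final hidden state, and a consistency term ties them together by marginalizing city probabilities up the geo hierarchy and comparing against the region head (here via KL divergence). The city-to-region mapping, weights, and the specific consistency loss are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
DIM, CITIES, REGIONS = 16, 4, 2
city_to_region = np.array([0, 0, 1, 1])  # e.g. cities 0-1 belong to region 0

W_city = rng.normal(size=(DIM, CITIES))
W_region = rng.normal(size=(DIM, REGIONS))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

hidden = rng.normal(size=DIM)              # shared final-layer representation
p_city = softmax(hidden @ W_city)          # city-level prediction head
p_region = softmax(hidden @ W_region)      # region-level prediction head

# Region distribution implied by the city head (marginalize over hierarchy).
implied_region = np.array([p_city[city_to_region == r].sum()
                           for r in range(REGIONS)])

# A consistency loss (KL divergence here) would pull the two heads together.
consistency = np.sum(implied_region * np.log(implied_region / p_region))
print(round(float(consistency), 6))
```

Jointly training both heads against their own labels plus a consistency term is one common way to share structure across levels of a hierarchy, so the model can serve either a coarse region suggestion or a specific city.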
The deployed model powers features like autosuggest in the search bar and abandoned search email notifications, demonstrating its impact on user engagement and booking conversions by guiding users towards relevant destination possibilities.