This article details Pinterest's approach to integrating real-time online context into their sequential ad recommender models to enhance relevance. It highlights the architectural changes, a novel training method using synthetic data, and a hybrid online/offline inference serving flow. The solution significantly improved ad relevance and conversion metrics by dynamically incorporating users' immediate intent.
Read the original on Pinterest Engineering.

Traditional sequential recommender models, while effective in leveraging historical user behavior, often lack the ability to incorporate real-time, online context. This limitation is critical for surfaces where immediate user intent is paramount, such as 'Related Pins' or 'Search' on Pinterest. Without understanding what a user is currently viewing or searching, recommendations can fall short in relevance, leading to poor user experience and lower engagement. The article describes how their initial Transformer-based model, relying solely on offline historical data, struggled to perform on these highly contextual surfaces, necessitating an architectural evolution.
To address the context gap, Pinterest evolved its two-tower model by integrating a context layer directly into the query tower. This architectural change allows the model to concatenate the output of the historical Transformer encoder with real-time context features. The combined representation is then fed into a Multi-Layer Perceptron (MLP) to generate a dynamic user embedding. For 'Related Pins', context features are derived from the currently viewed Pin, supplemented by user demographic embeddings for further personalization.
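The query-tower change described above can be sketched in a few lines. This is a minimal illustration, not Pinterest's implementation: the dimensions, weight shapes, and the two-layer ReLU MLP are hypothetical stand-ins, and the L2 normalization is a common retrieval convention rather than a detail from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
D_HIST, D_CTX, D_EMB = 64, 32, 16

# Hypothetical weights for the MLP head of the query tower.
W1 = rng.standard_normal((D_HIST + D_CTX, 48)) * 0.1
b1 = np.zeros(48)
W2 = rng.standard_normal((48, D_EMB)) * 0.1
b2 = np.zeros(D_EMB)

def query_tower(hist_encoding: np.ndarray, context_feats: np.ndarray) -> np.ndarray:
    """Concatenate the historical Transformer encoder output with
    real-time context features, then run the MLP to produce a
    dynamic user embedding."""
    x = np.concatenate([hist_encoding, context_feats], axis=-1)
    h = np.maximum(x @ W1 + b1, 0.0)        # ReLU hidden layer
    emb = h @ W2 + b2
    return emb / np.linalg.norm(emb)        # L2-normalize for retrieval

hist = rng.standard_normal(D_HIST)  # output of the historical encoder
ctx = rng.standard_normal(D_CTX)    # e.g. the currently viewed Pin's features
user_emb = query_tower(hist, ctx)
print(user_emb.shape)  # (16,)
```

The key point is simply the concatenation step: the same historical encoding yields a different user embedding depending on the request-time context fed alongside it.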
A critical system design aspect is the hybrid user embedding inference approach. Since context features are only available at request time (online), the system splits the computation: the expensive Transformer encoding of the historical sequence is computed offline in batch, while the cheap final step, concatenating the context features and running the MLP, executes online per request.
System Design Takeaway: Hybrid Architectures
This hybrid approach exemplifies a common pattern in large-scale machine learning systems where combining offline batch processing (for stability and efficiency) with online real-time computation (for freshness and responsiveness) can yield significant performance gains and address specific data availability challenges. It allows for leveraging deep historical insights while remaining agile enough to react to immediate user signals.
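The hybrid split might look like the following sketch. Everything here is a simplified assumption: the feature-store dictionary, the random stand-in for the Transformer encoder, and the single-layer online head are illustrative, not the production design.

```python
import numpy as np

rng = np.random.default_rng(1)
D_HIST, D_CTX, D_EMB = 64, 32, 16

# --- Offline batch job (runs periodically) -----------------------------
def offline_encode_history(user_sequences: dict) -> dict:
    """Stand-in for the heavy Transformer pass over each user's
    historical sequence; results are written to a feature store
    keyed by user id."""
    return {uid: rng.standard_normal(D_HIST) for uid in user_sequences}

feature_store = offline_encode_history({
    "user_1": ["pin_a", "pin_b"],
    "user_2": ["pin_c"],
})

# --- Online request path (runs per request) ----------------------------
# Hypothetical MLP head weights, shared with training.
W = rng.standard_normal((D_HIST + D_CTX, D_EMB)) * 0.1

def serve_embedding(user_id: str, context_feats: np.ndarray) -> np.ndarray:
    """Cheap online step: fetch the precomputed history encoding,
    concatenate fresh context, and apply the light head."""
    hist = feature_store[user_id]              # precomputed lookup
    x = np.concatenate([hist, context_feats])
    return np.maximum(x @ W, 0.0)              # lightweight online compute

emb = serve_embedding("user_1", rng.standard_normal(D_CTX))
```

The offline job amortizes the expensive sequence encoding across requests, while the online path stays fast enough to react to the signal that only exists at request time.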
A notable challenge was enabling the model to learn from real-time context during offline training, as this data isn't available until serving. Pinterest solved this by using synthetic augmented data. Pseudo-context derived from positive conversion events is injected into the input sequence during training, encouraging the model to retrieve items semantically related to the session's context. A high dropout rate in the context layer during training ensures the model doesn't over-rely on synthetic context and still leverages historical sequences.
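The training-time trick above can be sketched as follows. This is a hedged illustration of the idea, not Pinterest's code: using the converted item's embedding as the pseudo-context and dropping the whole context vector with a fixed probability are assumptions; the actual dropout rate and the exact form of the pseudo-context are not specified in the article.

```python
import numpy as np

rng = np.random.default_rng(2)
D_CTX = 32
CONTEXT_DROPOUT = 0.8  # hypothetical "high dropout rate" on the context layer

def make_training_context(positive_item_emb: np.ndarray) -> np.ndarray:
    """Derive synthetic pseudo-context from a positive conversion
    event, then apply heavy dropout so the model learns to use the
    context when present but doesn't over-rely on it."""
    pseudo_ctx = positive_item_emb  # assumption: context = converted item's embedding
    if rng.random() < CONTEXT_DROPOUT:
        return np.zeros_like(pseudo_ctx)  # context dropped: fall back to history
    return pseudo_ctx
```

Because the context is zeroed out most of the time, gradients still flow through the historical-sequence pathway, which is what preserves performance when real context is sparse or absent at serving time.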