Pinterest implemented an advanced behavioral sequence modeling system to enhance ad candidate generation. This system uses transformer-based models to predict user interactions with advertisers and specific products, significantly improving ad relevance and performance. The architecture involves two-tower models, offline batch processing, and online serving via feature stores and ANN graphs.
Pinterest's approach to ad candidate generation highlights a common challenge in large-scale advertising systems: delivering relevant ads in a dynamic environment where user interests evolve rapidly. Traditional targeting methods often fall short in capturing the nuances of user behavior. To overcome this, Pinterest leverages behavioral sequence modeling, focusing on a user's historical offsite behavior to predict future conversions. This strategy moves beyond static profiles to dynamic, real-time intent prediction, a critical aspect of modern personalized systems.
The core of Pinterest's solution is a two-tower model architecture. This design pattern is prevalent in recommendation and ad systems, separating user and item (or advertiser) embeddings. The user tower processes event sequences using a bidirectional transformer, capturing temporal dependencies in user behavior. The advertiser/item tower is typically an MLP operating on static or learned representations. This separation allows for efficient computation of similarity scores (e.g., cosine similarity) to retrieve relevant candidates from large corpora. Training uses in-batch negatives and a sampled softmax loss with log-Q bias correction to handle popularity bias and improve learning efficiency.
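The training objective described above can be sketched in a few lines. This is a minimal NumPy illustration of an in-batch sampled softmax with log-Q correction, not Pinterest's actual implementation; the function name, shapes, and the uniform sampling probabilities in the toy batch are assumptions for demonstration.

```python
import numpy as np

def in_batch_softmax_loss(user_emb, item_emb, item_log_q):
    """Sampled-softmax loss over in-batch negatives with log-Q correction.

    user_emb:   (B, D) L2-normalized user-tower outputs
    item_emb:   (B, D) L2-normalized advertiser/item-tower outputs
    item_log_q: (B,)   log sampling probability of each in-batch item
    """
    # Cosine similarity between every user and every in-batch item;
    # the diagonal holds the positive pairs, off-diagonals act as negatives.
    logits = user_emb @ item_emb.T
    # Log-Q correction: subtract each item's log sampling probability so
    # popular items, which show up disproportionately often as in-batch
    # negatives, are not over-penalized.
    logits = logits - item_log_q[None, :]
    # Numerically stable softmax cross-entropy with the diagonal as target.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy batch: 4 users, 8-dim embeddings, uniform sampling probabilities.
rng = np.random.default_rng(0)
def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

users = normalize(rng.standard_normal((4, 8)))
items = normalize(rng.standard_normal((4, 8)))
log_q = np.full(4, np.log(1.0 / 4))
loss = in_batch_softmax_loss(users, items, log_q)
```

In a real system the `item_log_q` estimates would come from streaming frequency counters over the item stream, and the towers would be trained jointly by backpropagating through this loss.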
The serving architecture demonstrates a common pattern for integrating complex ML models into production. An offline batch workflow pre-computes the top-K relevant advertisers/items for each user, which are then published to an online feature store. During an ad request, the Ads Serving system retrieves these pre-computed candidates. For item-level prediction, approximate nearest neighbor (ANN) graphs are used to efficiently retrieve the top-K items from billions of candidates, a necessity given the scale. This design decouples heavy model inference from real-time serving, ensuring low latency.
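The offline/online split can be sketched as follows. This is a simplified illustration under assumed names: the dictionary stands in for the online feature store, brute-force scoring stands in for the ANN index, and `serve_ad_request` is a hypothetical serving entry point.

```python
import numpy as np

def offline_top_k(user_embs, adv_embs, k):
    """Offline batch job: score every advertiser per user, keep top-K.

    In production this scoring would run against an ANN index rather
    than exhaustively, but the contract is the same: user embedding in,
    K candidate ids out.
    """
    scores = user_embs @ adv_embs.T                 # (num_users, num_advs)
    return np.argsort(-scores, axis=1)[:, :3][:, :k]

rng = np.random.default_rng(0)
U = rng.standard_normal((5, 16))    # 5 user embeddings
A = rng.standard_normal((50, 16))   # 50 advertiser embeddings

# "Feature store": user_id -> pre-computed top-K advertiser ids,
# published by the batch workflow.
feature_store = {u: list(map(int, top))
                 for u, top in enumerate(offline_top_k(U, A, 3))}

def serve_ad_request(user_id):
    # Online path is a cheap key lookup; no model inference at request time.
    return feature_store.get(user_id, [])
```

The key property is that the expensive transformer inference happens entirely in the batch job, so request-time latency is bounded by a feature-store read (plus an ANN lookup for item-level retrieval).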
Addressing Popularity Bias
Initial models tended to predict popular items, reducing personalization. Pinterest addressed this by carefully tuning log-Q bias correction parameters in the loss function and introducing a diversity metric to balance performance and novelty in recommendations.
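The article does not specify which diversity metric Pinterest uses; one common choice is the normalized entropy of item exposure across recommended lists, sketched below. The function name and normalization are assumptions for illustration.

```python
import math
from collections import Counter

def recommendation_entropy(rec_lists):
    """Normalized Shannon entropy of item exposure across users' rec lists.

    Returns a value in [0, 1]: 1.0 means every recommended item is shown
    equally often (maximally diverse), while values near 0 mean a handful
    of popular items dominate the recommendations.
    """
    counts = Counter(item for recs in rec_lists for item in recs)
    total = sum(counts.values())
    ent = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Normalize by the maximum possible entropy, log(num distinct items).
    return ent / math.log(len(counts)) if len(counts) > 1 else 0.0
```

Tracking a metric like this alongside engagement makes the trade-off explicit: over-aggressive log-Q correction can hurt relevance, while too little correction collapses the candidate set onto popular items and drives this score toward zero.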
Dealing with sparse features and managing sequence length were critical engineering challenges. High-cardinality ID features from sparse offsite data were aggregated to coarser-level features to improve model learning. Experiments showed diminishing returns for sequence lengths beyond 100, highlighting a trade-off between model expressiveness, data sparsity, and infrastructure costs for serving longer sequences. Future work aims to combine onsite and offsite data, incorporate real-time context, and augment the advertiser pool dynamically, indicating a continuous evolution towards more sophisticated, real-time, and context-aware candidate generation systems.
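The ID-aggregation idea can be sketched as a fallback from rare product IDs to coarser category IDs. This is a minimal illustration, assuming a hypothetical `product_to_category` mapping and frequency threshold; the article does not describe Pinterest's exact aggregation scheme.

```python
from collections import Counter

def coarsen_sequence(events, product_to_category, global_counts, min_count=5):
    """Replace rare product IDs in a behavioral sequence with category IDs.

    events:              one user's sequence of product-ID strings
    product_to_category: hypothetical product -> category mapping
    global_counts:       corpus-wide occurrence counts per product ID
    min_count:           products seen fewer times than this fall back
                         to their category (or a shared "<other>" token)
    """
    return [p if global_counts[p] >= min_count
            else product_to_category.get(p, "<other>")
            for p in events]

# Example: "p1" is frequent and kept as-is; "p2" is rare and coarsened.
counts = Counter(["p1"] * 10 + ["p2"] * 2)
coarse = coarsen_sequence(["p1", "p2"], {"p2": "cat_shoes"}, counts)
# coarse == ["p1", "cat_shoes"]
```

Coarsening like this shrinks the embedding vocabulary and gives rare IDs enough training signal to learn useful representations, at the cost of losing product-level resolution for the long tail.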