Pinterest implemented an advanced behavioral sequence modeling system to enhance ad candidate generation. This system uses transformer-based models to predict user interactions with advertisers and specific products, significantly improving ad relevance and performance. The architecture involves two-tower models, offline batch processing, and online serving via feature stores and ANN graphs.
Pinterest's approach to ad candidate generation highlights a common challenge in large-scale advertising systems: delivering relevant ads in a dynamic environment where user interests evolve rapidly. Traditional targeting methods often fall short in capturing the nuances of user behavior. To overcome this, Pinterest leverages behavioral sequence modeling, focusing on a user's historical offsite behavior to predict future conversions. This strategy moves beyond static profiles to dynamic, real-time intent prediction, a critical aspect of modern personalized systems.
The core of Pinterest's solution is a two-tower model architecture. This design pattern is prevalent in recommendation and ad systems, separating user and item (or advertiser) embeddings. The user tower processes event sequences using a bidirectional transformer, capturing temporal dependencies in user behavior. The advertiser/item tower is typically an MLP operating on static or learned representations. This separation allows for efficient computation of similarity scores (e.g., cosine similarity) to retrieve relevant candidates from large corpora. Training uses in-batch negatives and a sampled softmax loss with log-Q bias correction to handle popularity bias and improve learning efficiency.
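The training objective described above can be sketched in a few lines. This is a minimal NumPy illustration of an in-batch sampled softmax with log-Q correction, not Pinterest's actual implementation; the function name, shapes, and the uniform sampling probabilities in the toy batch are assumptions for demonstration.

```python
import numpy as np

def in_batch_softmax_loss(user_emb, item_emb, item_log_q):
    """Sampled-softmax loss over in-batch negatives with log-Q correction.

    user_emb:   (B, D) L2-normalized user-tower outputs
    item_emb:   (B, D) L2-normalized advertiser/item-tower outputs
    item_log_q: (B,)   log sampling probability of each in-batch item
    """
    # Cosine similarity between every user and every in-batch item;
    # the diagonal holds the positive pairs, off-diagonals act as negatives.
    logits = user_emb @ item_emb.T
    # Log-Q correction: subtract each item's log sampling probability so
    # popular items, which show up disproportionately often as in-batch
    # negatives, are not over-penalized.
    logits = logits - item_log_q[None, :]
    # Numerically stable softmax cross-entropy with the diagonal as target.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy batch: 4 users, 8-dim embeddings, uniform sampling probabilities.
rng = np.random.default_rng(0)
def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

users = normalize(rng.standard_normal((4, 8)))
items = normalize(rng.standard_normal((4, 8)))
log_q = np.full(4, np.log(1.0 / 4))
loss = in_batch_softmax_loss(users, items, log_q)
```

In a real system the `item_log_q` estimates would come from streaming frequency counters over the item stream, and the towers would be trained jointly by backpropagating through this loss.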
The serving architecture demonstrates a common pattern for integrating complex ML models into production. An offline batch workflow pre-computes the top-K relevant advertisers/items for each user, which are then published to an online feature store. During an ad request, the Ads Serving system retrieves these pre-computed candidates. For item-level prediction, approximate nearest neighbor (ANN) graphs are used to efficiently retrieve the top-K items from billions of candidates, a necessity given the scale. This design decouples heavy model inference from real-time serving, ensuring low latency.
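The offline/online split can be sketched as follows. This is a simplified illustration under assumed names: the dictionary stands in for the online feature store, brute-force scoring stands in for the ANN index, and `serve_ad_request` is a hypothetical serving entry point.

```python
import numpy as np

def offline_top_k(user_embs, adv_embs, k):
    """Offline batch job: score every advertiser per user, keep top-K.

    In production this scoring would run against an ANN index rather
    than exhaustively, but the contract is the same: user embedding in,
    K candidate ids out.
    """
    scores = user_embs @ adv_embs.T                 # (num_users, num_advs)
    return np.argsort(-scores, axis=1)[:, :3][:, :k]

rng = np.random.default_rng(0)
U = rng.standard_normal((5, 16))    # 5 user embeddings
A = rng.standard_normal((50, 16))   # 50 advertiser embeddings

# "Feature store": user_id -> pre-computed top-K advertiser ids,
# published by the batch workflow.
feature_store = {u: list(map(int, top))
                 for u, top in enumerate(offline_top_k(U, A, 3))}

def serve_ad_request(user_id):
    # Online path is a cheap key lookup; no model inference at request time.
    return feature_store.get(user_id, [])
```

The key property is that the expensive transformer inference happens entirely in the batch job, so request-time latency is bounded by a feature-store read (plus an ANN lookup for item-level retrieval).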
Addressing Popularity Bias
Initial models tended to predict popular items, reducing personalization. Pinterest addressed this by carefully tuning log-Q bias correction parameters in the loss function and introducing a diversity metric to balance performance and novelty in recommendations.
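The article does not specify which diversity metric Pinterest uses; one common choice is the normalized entropy of item exposure across recommended lists, sketched below. The function name and normalization are assumptions for illustration.

```python
import math
from collections import Counter

def recommendation_entropy(rec_lists):
    """Normalized Shannon entropy of item exposure across users' rec lists.

    Returns a value in [0, 1]: 1.0 means every recommended item is shown
    equally often (maximally diverse), while values near 0 mean a handful
    of popular items dominate the recommendations.
    """
    counts = Counter(item for recs in rec_lists for item in recs)
    total = sum(counts.values())
    ent = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Normalize by the maximum possible entropy, log(num distinct items).
    return ent / math.log(len(counts)) if len(counts) > 1 else 0.0
```

Tracking a metric like this alongside engagement makes the trade-off explicit: over-aggressive log-Q correction can hurt relevance, while too little correction collapses the candidate set onto popular items and drives this score toward zero.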
Dealing with sparse features and managing sequence length were critical engineering challenges. High-cardinality ID features from sparse offsite data were aggregated to coarser-level features to improve model learning. Experiments showed diminishing returns for sequence lengths beyond 100, highlighting a trade-off between model expressiveness, data sparsity, and infrastructure costs for serving longer sequences. Future work aims to combine onsite and offsite data, incorporate real-time context, and augment the advertiser pool dynamically, indicating a continuous evolution towards more sophisticated, real-time, and context-aware candidate generation systems.
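The ID-aggregation idea can be sketched as a fallback from rare product IDs to coarser category IDs. This is a minimal illustration, assuming a hypothetical `product_to_category` mapping and frequency threshold; the article does not describe Pinterest's exact aggregation scheme.

```python
from collections import Counter

def coarsen_sequence(events, product_to_category, global_counts, min_count=5):
    """Replace rare product IDs in a behavioral sequence with category IDs.

    events:              one user's sequence of product-ID strings
    product_to_category: hypothetical product -> category mapping
    global_counts:       corpus-wide occurrence counts per product ID
    min_count:           products seen fewer times than this fall back
                         to their category (or a shared "<other>" token)
    """
    return [p if global_counts[p] >= min_count
            else product_to_category.get(p, "<other>")
            for p in events]

# Example: "p1" is frequent and kept as-is; "p2" is rare and coarsened.
counts = Counter(["p1"] * 10 + ["p2"] * 2)
coarse = coarsen_sequence(["p1", "p2"], {"p2": "cat_shoes"}, counts)
# coarse == ["p1", "cat_shoes"]
```

Coarsening like this shrinks the embedding vocabulary and gives rare IDs enough training signal to learn useful representations, at the cost of losing product-level resolution for the long tail.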