Pinterest Engineering · March 3, 2026

Unifying ML Engagement Models for Ads at Pinterest

Pinterest shares its journey in unifying multiple machine learning models for ads engagement prediction across various surfaces like Home Feed, Search, and Related Pins. The project addressed inefficiencies caused by fragmented model architectures and training pipelines, leading to a consolidated framework that improved iteration velocity, reduced costs, and enhanced model performance through shared learning and surface-specific optimizations.


The article details Pinterest's architectural evolution from three independent, surface-specific ads engagement prediction models to a single, unified framework. Initially, each surface (Home Feed, Search, and Related Pins) had its own model, resulting in significant operational and modeling inefficiencies: duplicated effort, a high maintenance burden, and redundant training costs. The unification aimed to create a more maintainable, efficient, and performant system.

Unification Strategy and Principles

The unification was approached as a major architectural change, guided by three core principles to mitigate risks:

  • Start simple: Establish a pragmatic baseline by merging the strongest existing components across surfaces.
  • Iterate incrementally: Introduce surface-aware modeling (e.g., multi-task heads, surface-specific exports) only after the baseline demonstrates clear value.
  • Maintain operational safety: Design for safe rollout, monitoring, and fast rollback at every step.

Architectural Refinements for Efficiency and Flexibility

Key architectural decisions included merging features and existing modules into a single baseline model. To address increased training and serving costs associated with a larger unified model, Pinterest implemented several efficiency optimizations:

  • Projection Layers: Using DCNv2 to project Transformer outputs into a smaller representation before downstream layers, reducing serving latency while preserving signal.
  • Fused Kernel Embedding & TF32: Using fused embedding kernels and TF32 precision to reduce inference latency and speed up training.
  • Request-Level Broadcasting: Reducing redundant embedding table lookups by fetching user embeddings once per unique user in a batch and broadcasting them, thereby lowering compute cost. This required careful management of the number of unique users per batch to maintain reliability.
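The request-level broadcasting idea can be sketched in a few lines: fetch each unique user's embedding once, then map it back to every candidate ad in the batch that belongs to that user. This is a minimal illustration, not Pinterest's serving code; the function and table names are invented for the example.

```python
import numpy as np

def broadcast_user_embeddings(user_ids, embedding_table):
    """Fetch each unique user's embedding once, then broadcast it back
    to every row of the batch that belongs to that user.

    user_ids: 1-D array of user ids, one per candidate ad in the batch.
    embedding_table: dict mapping user id -> embedding vector.
    (Both names are illustrative, not Pinterest's actual API.)
    """
    unique_ids, inverse = np.unique(user_ids, return_inverse=True)
    # One table lookup per unique user instead of one per batch row.
    unique_embs = np.stack([embedding_table[u] for u in unique_ids])
    # Broadcast: row i of the batch gets the embedding of its user.
    return unique_embs[inverse]

# Example: a batch of 5 ad candidates for only 2 distinct users,
# so 2 lookups serve all 5 rows.
table = {7: np.array([0.1, 0.2]), 42: np.array([0.9, 0.8])}
batch_ids = np.array([7, 42, 7, 7, 42])
embs = broadcast_user_embeddings(batch_ids, table)
```

The savings grow with the number of candidate ads scored per request, which is why the article notes that the number of unique users per batch must be managed to keep the batches reliable.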

To maintain flexibility for surface-specific nuances within the unified architecture, they introduced surface-specific calibration layers (e.g., view-type-specific calibration for Home Feed and Search traffic) and a multi-task learning design with surface-specific checkpoint exports. This let each surface adopt the most appropriate architecture while still benefiting from shared representation learning.
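The multi-task design described above can be sketched as a shared trunk with one lightweight head per surface, where exporting a checkpoint for a surface ships only the trunk plus that surface's head. This is a toy numpy sketch under assumed shapes and names (`W_shared`, `heads`, `export_for_surface` are all illustrative), not the production model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared trunk weights learned across all surfaces (shapes are illustrative).
W_shared = rng.normal(size=(16, 8))

# One lightweight task head per surface; each could differ in architecture.
heads = {
    "home_feed":    rng.normal(size=(8, 1)),
    "search":       rng.normal(size=(8, 1)),
    "related_pins": rng.normal(size=(8, 1)),
}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(features, surface):
    """Shared representation first, then the surface's own head."""
    shared = np.maximum(features @ W_shared, 0.0)  # shared trunk (ReLU layer)
    logit = shared @ heads[surface]                # surface-specific head
    return sigmoid(logit)                          # engagement probability

def export_for_surface(surface):
    """Surface-specific checkpoint export: ship the shared trunk plus
    this surface's head, not the whole multi-task model."""
    return {"trunk": W_shared, "head": heads[surface]}

x = rng.normal(size=(4, 16))                       # a batch of 4 examples
p = predict(x, "search")
```

The trunk is trained on traffic from all surfaces, which is where the shared representation learning comes from, while the per-surface heads and calibration layers absorb surface-specific differences.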


System Design Takeaway

When consolidating complex, fragmented systems, an incremental approach is crucial. Start with a simplified baseline, validate its value, and then gradually introduce complexity and optimizations. Balancing shared learning with surface-specific needs often involves architectural patterns like multi-task learning and specialized components that operate within a unified framework, alongside rigorous cost and performance optimizations.

machine learning · ads · model serving · distributed systems · architecture evolution · cost optimization · multi-task learning · pinterest
