Pinterest Engineering · March 3, 2026

Unifying ML Engagement Models for Ads at Pinterest

Pinterest shares its journey in unifying multiple machine learning models for ads engagement prediction across various surfaces like Home Feed, Search, and Related Pins. The project addressed inefficiencies caused by fragmented model architectures and training pipelines, leading to a consolidated framework that improved iteration velocity, reduced costs, and enhanced model performance through shared learning and surface-specific optimizations.


The article details Pinterest's architectural evolution from three independent, surface-specific ads engagement prediction models to a single, unified framework. Initially, each surface (Home Feed, Search, and Related Pins) had its own model, resulting in significant operational and modeling inefficiencies: duplicated effort, a high maintenance burden, and redundant training costs. The unification aimed to create a more maintainable, efficient, and performant system.

Unification Strategy and Principles

The unification was approached as a major architectural change, guided by three core principles to mitigate risks:

  • Start simple: Establish a pragmatic baseline by merging the strongest existing components across surfaces.
  • Iterate incrementally: Introduce surface-aware modeling (e.g., multi-task heads, surface-specific exports) only after the baseline demonstrates clear value.
  • Maintain operational safety: Design for safe rollout, monitoring, and fast rollback at every step.

Architectural Refinements for Efficiency and Flexibility

Key architectural decisions included merging features and existing modules into a single baseline model. To address increased training and serving costs associated with a larger unified model, Pinterest implemented several efficiency optimizations:

  • Projection Layers: Using DCNv2 to project Transformer outputs into a smaller representation before downstream layers, reducing serving latency while preserving signal.
  • Fused Kernel Embedding & TF32: Using fused embedding kernels and TF32 precision to reduce inference latency and speed up training.
  • Request-Level Broadcasting: Reducing redundant embedding table lookups by fetching user embeddings once per unique user in a batch and broadcasting them, thereby lowering compute cost. This required careful management of the number of unique users per batch to maintain reliability.
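The request-level broadcasting idea can be sketched in a few lines: fetch each unique user's embedding once, then map it back to every candidate ad in the batch that belongs to that user. This is a minimal illustration, not Pinterest's serving code; the function and table names are invented for the example.

```python
import numpy as np

def broadcast_user_embeddings(user_ids, embedding_table):
    """Fetch each unique user's embedding once, then broadcast it back
    to every row of the batch that belongs to that user.

    user_ids: 1-D array of user ids, one per candidate ad in the batch.
    embedding_table: dict mapping user id -> embedding vector.
    (Both names are illustrative, not Pinterest's actual API.)
    """
    unique_ids, inverse = np.unique(user_ids, return_inverse=True)
    # One table lookup per unique user instead of one per batch row.
    unique_embs = np.stack([embedding_table[u] for u in unique_ids])
    # Broadcast: row i of the batch gets the embedding of its user.
    return unique_embs[inverse]

# Example: a batch of 5 ad candidates for only 2 distinct users,
# so 2 lookups serve all 5 rows.
table = {7: np.array([0.1, 0.2]), 42: np.array([0.9, 0.8])}
batch_ids = np.array([7, 42, 7, 7, 42])
embs = broadcast_user_embeddings(batch_ids, table)
```

The savings grow with the number of candidate ads scored per request, which is why the article notes that the number of unique users per batch must be managed to keep the batches reliable.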

To maintain flexibility for surface-specific nuances within the unified architecture, they introduced surface-specific calibration layers (e.g., view-type-specific calibration for Home Feed and Search traffic) and a multi-task learning design with surface-specific checkpoint exports. This let each surface adopt the most appropriate architecture while still benefiting from shared representation learning.
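The multi-task design described above can be sketched as a shared trunk with one lightweight head per surface, where exporting a checkpoint for a surface ships only the trunk plus that surface's head. This is a toy numpy sketch under assumed shapes and names (`W_shared`, `heads`, `export_for_surface` are all illustrative), not the production model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared trunk weights learned across all surfaces (shapes are illustrative).
W_shared = rng.normal(size=(16, 8))

# One lightweight task head per surface; each could differ in architecture.
heads = {
    "home_feed":    rng.normal(size=(8, 1)),
    "search":       rng.normal(size=(8, 1)),
    "related_pins": rng.normal(size=(8, 1)),
}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(features, surface):
    """Shared representation first, then the surface's own head."""
    shared = np.maximum(features @ W_shared, 0.0)  # shared trunk (ReLU layer)
    logit = shared @ heads[surface]                # surface-specific head
    return sigmoid(logit)                          # engagement probability

def export_for_surface(surface):
    """Surface-specific checkpoint export: ship the shared trunk plus
    this surface's head, not the whole multi-task model."""
    return {"trunk": W_shared, "head": heads[surface]}

x = rng.normal(size=(4, 16))                       # a batch of 4 examples
p = predict(x, "search")
```

The trunk is trained on traffic from all surfaces, which is where the shared representation learning comes from, while the per-surface heads and calibration layers absorb surface-specific differences.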


System Design Takeaway

When consolidating complex, fragmented systems, an incremental approach is crucial. Start with a simplified baseline, validate its value, and then gradually introduce complexity and optimizations. Balancing shared learning with surface-specific needs often involves architectural patterns like multi-task learning and specialized components that operate within a unified framework, alongside rigorous cost and performance optimizations.

machine learning · ads · model serving · distributed systems · architecture evolution · cost optimization · multi-task learning · pinterest
