Pinterest Engineering · May 21, 2026

Pinterest's User-Sequence Platform for ML Recommendations

Pinterest redesigned its user-sequence platform to generate, enrich, and serve user-sequence data for machine learning models more efficiently. The redesign targets the freshness, completeness, consistency, and cost-efficiency of the data used in ranking, retrieval, and recommendation systems. Its core idea is a "one definition, many runtimes" approach, built on a shared execution engine and a Lambda architecture that covers both real-time and batch processing.


The article details Pinterest's architectural overhaul of its user-sequence data platform, which powers machine learning models across the product. User sequences are ordered lists of a user's recent, relevant events, enriched with additional signals such as embeddings and contextual features. They feed models that capture temporal behavior, such as Transformers, used in personalized recommendations, search, and ads.
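To make the data shape concrete, a user sequence could be modeled roughly as follows. This is a minimal sketch, not Pinterest's schema: the class names (`SequenceEvent`, `UserSequence`), the field names, and the default `max_length` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SequenceEvent:
    """One user event plus enrichment signals (field names are illustrative)."""
    timestamp_ms: int
    action_type: str                                        # e.g. "click", "save"
    item_id: str
    embedding: List[float] = field(default_factory=list)    # enrichment: item embedding
    context: Dict[str, str] = field(default_factory=dict)   # enrichment: contextual features


@dataclass
class UserSequence:
    """Ordered list of a user's recent events, newest first."""
    user_id: str
    events: List[SequenceEvent] = field(default_factory=list)

    def add(self, event: SequenceEvent, max_length: int = 100) -> None:
        # Keep events sorted newest-first and cap the sequence length.
        self.events.append(event)
        self.events.sort(key=lambda e: e.timestamp_ms, reverse=True)
        del self.events[max_length:]


# Events may arrive out of order; the sequence stays ordered by timestamp.
seq = UserSequence(user_id="u1")
seq.add(SequenceEvent(1000, "click", "pin_a"))
seq.add(SequenceEvent(3000, "save", "pin_c", embedding=[0.1, 0.2]))
seq.add(SequenceEvent(2000, "click", "pin_b"))
ordered_items = [e.item_id for e in seq.events]
```

Here `ordered_items` lists the item IDs newest first, regardless of insertion order.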

Key Challenges in User-Sequence Data Management

Building a robust user-sequence platform at scale presents several challenges, particularly in a multi-tenant environment supporting numerous teams and models:

Freshness: How quickly new events and enrichments are reflected in sequences for real-time inference.
Completeness: Ensuring late-arriving events, corrections, and backfills are eventually incorporated.
Consistent Enrichment: Maintaining uniform enrichment logic and data alignment between streaming and batch processes, preventing train-serve skew.
Stable Schemas: Providing predictable and versioned schemas for downstream consumers.
Cost-Efficiency: Managing the storage and processing costs associated with large volumes of sequence data.
Operability and Debugging: Making the complex multi-step process easier to monitor and troubleshoot.

Core Architectural Principles and Solutions

One Definition, Many Runtimes

Under this principle, event filtering, enrichment, and sequence assembly are defined once. The same definition is then applied across every runtime: real-time indexing, batch indexing and backfill, and online serving. This prevents drift between the data used for training and the data used for serving.
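One way to picture the principle is a single definition object that both a batch runtime and a streaming runtime consume. The sketch below is an assumption-laden illustration: `SequenceDefinition`, `process`, `trim`, `run_batch`, and `run_streaming` are hypothetical names, and the event fields are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional

Event = Dict[str, Any]


@dataclass(frozen=True)
class SequenceDefinition:
    """Single source of truth for filtering, enrichment, and assembly (hypothetical)."""
    name: str
    keep: Callable[[Event], bool]       # event filter
    enrich: Callable[[Event], Event]    # enrichment step
    max_length: int                     # assembly: number of events to keep


def process(defn: SequenceDefinition, event: Event) -> Optional[Event]:
    # Shared per-event logic: filter, then enrich.
    return defn.enrich(event) if defn.keep(event) else None


def trim(defn: SequenceDefinition, events: List[Event]) -> List[Event]:
    # Shared assembly logic: newest first, capped at max_length.
    return sorted(events, key=lambda e: e["ts"], reverse=True)[: defn.max_length]


def run_batch(defn: SequenceDefinition, history: Iterable[Event]) -> List[Event]:
    # Batch runtime: process the full history in one pass.
    processed = (process(defn, e) for e in history)
    return trim(defn, [e for e in processed if e is not None])


def run_streaming(defn: SequenceDefinition, stream: Iterable[Event]) -> List[Event]:
    # Streaming runtime: fold events in one at a time, reusing the same logic.
    state: List[Event] = []
    for event in stream:
        enriched = process(defn, event)
        if enriched is not None:
            state = trim(defn, state + [enriched])
    return state


ENGAGEMENT = SequenceDefinition(
    name="engagement_v1",
    keep=lambda e: e["action"] in {"click", "save"},
    enrich=lambda e: {**e, "is_save": e["action"] == "save"},
    max_length=2,
)

events = [
    {"ts": 1, "action": "click", "item": "a"},
    {"ts": 2, "action": "impression", "item": "b"},
    {"ts": 3, "action": "save", "item": "c"},
    {"ts": 4, "action": "click", "item": "d"},
]
batch_result = run_batch(ENGAGEMENT, events)
streaming_result = run_streaming(ENGAGEMENT, events)
```

Because both runtimes call the same `process` and `trim` functions, they produce the same sequence for the same input (given distinct timestamps), which is the property that guards against train-serve skew.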

The platform employs a Lambda architecture to reconcile the conflicting demands of data freshness and completeness. This involves distinct paths for real-time updates and batch processing, with a clear merge policy for eventual consistency. Key architectural decisions include:
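The summary does not spell out the merge policy, so the following is only one plausible sketch. It assumes that batch output is authoritative up to a batch watermark timestamp and that real-time records fill in everything newer. The function name `merge_sequences`, the `event_id`, `ts`, and `source` fields, and the watermark concept itself are assumptions made for illustration.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def merge_sequences(batch: List[Record], realtime: List[Record],
                    batch_watermark_ts: int, max_length: int) -> List[Record]:
    """Assumed policy: trust batch up to its watermark, real-time after it."""
    merged: Dict[str, Record] = {}
    # Real-time records cover the fresh tail that batch has not processed yet.
    for rec in realtime:
        if rec["ts"] > batch_watermark_ts:
            merged[rec["event_id"]] = rec
    # Batch records cover history, including late-arriving events and corrections.
    for rec in batch:
        if rec["ts"] <= batch_watermark_ts:
            merged[rec["event_id"]] = rec
    ordered = sorted(merged.values(), key=lambda r: r["ts"], reverse=True)
    return ordered[:max_length]


batch_view = [
    {"event_id": "e1", "ts": 10, "source": "batch"},
    {"event_id": "e2", "ts": 20, "source": "batch"},   # late event the stream missed
]
realtime_view = [
    {"event_id": "e1", "ts": 10, "source": "realtime"},
    {"event_id": "e3", "ts": 30, "source": "realtime"},  # newer than the watermark
]
merged_view = merge_sequences(batch_view, realtime_view,
                              batch_watermark_ts=25, max_length=10)
merged_ids = [r["event_id"] for r in merged_view]
```

In this example the merged view contains the fresh event `e3` from the real-time path and the complete history (`e2`, `e1`) from the batch path, which is how the two paths trade off freshness against completeness.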

Configuration-as-Code: Sequence and enrichment definitions are managed as code (Python), enabling faster onboarding, improved reviewability, and clear separation of concerns.
Shared Execution Engine: A central engine processes raw events into enriched records based on configuration. It handles data sources, filtering, featurization, and writing results. Pluggable executors within this engine encapsulate business-specific logic, minimizing code duplication between streaming and batch jobs.
Columnar, Time-Partitioned Storage: Sequence data is stored in a columnar format to allow models to read only necessary fields, optimizing storage and read performance. Time partitioning facilitates efficient writes and targeted scans.
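The storage idea in the last item can be sketched with a toy in-memory store. The summary does not name a file format or partition granularity, so daily partitions, the class `ColumnarPartitionedStore`, and its methods are all hypothetical; a production system would more likely rely on an on-disk columnar format.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Dict, List, Sequence


class ColumnarPartitionedStore:
    """Toy store: one dict of column lists per daily partition (illustrative only)."""

    def __init__(self, columns: Sequence[str]) -> None:
        self.columns = list(columns)
        self.partitions: Dict[str, Dict[str, List[Any]]] = defaultdict(
            lambda: {c: [] for c in self.columns}
        )

    @staticmethod
    def partition_key(ts_seconds: int) -> str:
        # Daily partitions keyed by UTC date (an assumed granularity).
        return datetime.fromtimestamp(ts_seconds, tz=timezone.utc).strftime("%Y-%m-%d")

    def append(self, row: Dict[str, Any]) -> None:
        # A write touches only the partition for the row's day.
        part = self.partitions[self.partition_key(row["ts"])]
        for col in self.columns:
            part[col].append(row.get(col))

    def scan(self, columns: Sequence[str], start_day: str,
             end_day: str) -> Dict[str, List[Any]]:
        # Read only the requested columns from partitions inside the day range.
        out: Dict[str, List[Any]] = {c: [] for c in columns}
        for day in sorted(self.partitions):
            if start_day <= day <= end_day:
                for c in columns:
                    out[c].extend(self.partitions[day][c])
        return out


store = ColumnarPartitionedStore(["ts", "user_id", "item_id", "embedding"])
day1 = int(datetime(2025, 1, 1, 12, tzinfo=timezone.utc).timestamp())
day2 = int(datetime(2025, 1, 2, 12, tzinfo=timezone.utc).timestamp())
store.append({"ts": day1, "user_id": "u1", "item_id": "a", "embedding": [0.1]})
store.append({"ts": day2, "user_id": "u1", "item_id": "b", "embedding": [0.2]})

# A model that needs only item IDs for one day skips the embedding column
# and the other day's partition entirely.
result = store.scan(["item_id"], "2025-01-02", "2025-01-02")
```

The scan reads a single column from a single partition, which is the access pattern that column projection and time partitioning are meant to make cheap.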

System Components

Ingestion: Supports both streaming (Kafka) for real-time events and batch (data warehouses) for historical data.
Enrichment and Execution Layer: The shared engine, which applies the configured filters, joins, and transforms to raw events.
Real-time Indexer: A streaming job for low-latency updates to a time-versioned online store.
Batch Indexer and Backfill Pipeline: Scheduled jobs for processing historical data and generating intermediate datasets.
Columnar, Time-Partitioned Storage: Holds the enriched sequence data.
Online Serving API: Exposes a clean API for fetching user sequences, performing request-time enrichments, and applying trimming logic for online inference.
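The serving step at the end of this list (fetch, request-time enrichment, trimming) might look like the sketch below. The `SequenceServer` class, its `get_sequence` signature, the `max_age_s` parameter, and the `add_age` enricher are hypothetical and stand in for whatever interface the real API exposes.

```python
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]
Enricher = Callable[[Event, Dict[str, Any]], Event]


class SequenceServer:
    """Toy serving layer over an in-memory store keyed by user ID (illustrative)."""

    def __init__(self, store: Dict[str, List[Event]],
                 enrichers: Optional[List[Enricher]] = None) -> None:
        self.store = store
        self.enrichers = enrichers or []

    def get_sequence(self, user_id: str, request_ctx: Dict[str, Any],
                     max_length: int, max_age_s: Optional[int] = None) -> List[Event]:
        events = self.store.get(user_id, [])
        # Trimming: drop events older than max_age_s, then keep the newest max_length.
        if max_age_s is not None:
            now = request_ctx["now"]
            events = [e for e in events if now - e["ts"] <= max_age_s]
        events = sorted(events, key=lambda e: e["ts"], reverse=True)[:max_length]
        # Request-time enrichment: applied to copies so stored events stay untouched.
        result = []
        for event in events:
            enriched = dict(event)
            for enrich in self.enrichers:
                enriched = enrich(enriched, request_ctx)
            result.append(enriched)
        return result


def add_age(event: Event, ctx: Dict[str, Any]) -> Event:
    # Example request-time feature: event age relative to the request time.
    return {**event, "age_s": ctx["now"] - event["ts"]}


store = {"u1": [
    {"ts": 100, "item": "a"},
    {"ts": 900, "item": "b"},
    {"ts": 950, "item": "c"},
]}
server = SequenceServer(store, enrichers=[add_age])
served = server.get_sequence("u1", {"now": 1000}, max_length=2, max_age_s=500)
```

Trimming before enrichment keeps request-time work proportional to the number of events actually returned to the model.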

Together, these choices let Pinterest deliver fresh, complete, consistent, and cost-efficient user-sequence data to its ML-driven recommendation and ranking systems.

