Pinterest engineered an AI-assisted system to measure the prevalence of policy-violating content in real time. The system uses multimodal LLMs for scalable content labeling and weighted reservoir sampling to produce statistically unbiased estimates, addressing the limitations of relying solely on user reports. The architecture enables proactive risk detection, data-driven policy adjustments, and efficient resource allocation for Trust & Safety.
Read original on Pinterest Engineering
Historically, Trust & Safety teams relied heavily on user reports to identify policy-violating content. However, this approach suffers from significant blind spots: under-reported harms (e.g., self-harm due to stigma), malicious actors not reporting content, lack of statistical power for rare categories, and high costs/latency associated with human review at scale. Pinterest needed a system to measure "prevalence" – the percentage of all views on a given day that went to violative content – to provide a more accurate, real-time understanding of platform safety.
Key Design Principle: Decoupling Measurement from Enforcement
The system strategically uses production risk scores during sampling to focus labeling budget on high-risk, high-exposure content. Crucially, the estimator then re-weights these samples using inverse-probability weighting (Hansen–Hurwitz or Horvitz–Thompson ratios) to remove the 'lensing' introduced by the risk scores. This ensures the prevalence statistic accurately reflects impressions and is unbiased, even if enforcement model thresholds or calibrations drift. This decoupling is vital for maintaining measurement integrity and comparability over time.
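The re-weighting idea can be illustrated with a minimal Python sketch. It is not Pinterest's implementation: for simplicity it swaps weighted reservoir sampling for Poisson sampling, where each item is included independently with a known probability, and it uses a Horvitz–Thompson-style ratio (Hájek) estimator. The field names (`impressions`, `risk`, `violating`), the `risk_floor` parameter, and both function names are hypothetical.

```python
import random

def poisson_sample(items, budget, rng, risk_floor=0.05):
    """Draw a risk-focused sample with known inclusion probabilities.

    Each item is included independently with probability proportional to
    impressions * risk score, capped at 1. The floor (an assumption of this
    sketch) keeps every probability above zero, which the estimator needs.
    """
    sizes = [it["impressions"] * max(it["risk"], risk_floor) for it in items]
    total = sum(sizes)
    sample = []
    for it, size in zip(items, sizes):
        pi = min(1.0, budget * size / total)  # inclusion probability
        if rng.random() < pi:
            sample.append((it, pi))
    return sample

def estimate_prevalence(sample):
    """Impression-weighted prevalence via inverse-probability weighting."""
    num = sum(it["impressions"] * it["violating"] / pi for it, pi in sample)
    den = sum(it["impressions"] / pi for it, pi in sample)
    return num / den if den else float("nan")

if __name__ == "__main__":
    rng = random.Random(0)
    population = []
    for _ in range(10_000):
        risk = rng.random()
        population.append({
            "impressions": rng.randint(1, 1000),
            "risk": risk,
            # Toy ground truth: riskier items violate more often. In a real
            # pipeline the label would come from the LLM, for sampled items only.
            "violating": int(rng.random() < 0.05 * risk),
        })
    true_prev = (sum(p["impressions"] * p["violating"] for p in population)
                 / sum(p["impressions"] for p in population))
    sample = poisson_sample(population, budget=500, rng=rng)
    print(f"true={true_prev:.4f} "
          f"estimate={estimate_prevalence(sample):.4f} n={len(sample)}")
```

Because each sampled item is divided by its own inclusion probability, the risk score only decides where the labeling budget goes. It does not shift the estimate's target, so changing the risk model changes the variance of the estimate rather than its expected value.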
The team addressed several challenges. Rare categories produce wide confidence intervals (CIs), which are handled by adapting sampling parameters, stratifying, or pooling to weekly data. Policy and prompt drift is managed through versioning and backfills. LLM decision quality is kept stable through continuous monitoring, human validation of subsamples, and gold sets labeled by subject-matter experts (SMEs). Cost is optimized by tracking token usage and exploring multi-step LLM labeling. These considerations highlight the trade-offs between precision, speed, and resource utilization in a large-scale ML-driven system.
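To make the rare-category point concrete, here is a hedged sketch of how a confidence interval for a weighted prevalence estimate might be computed, and why pooling several days of samples tends to narrow it. The percentile bootstrap is an assumption of this sketch; the source does not say how Pinterest computes its intervals. The row layout `(impressions, is_violating, inclusion_probability)` and the function names are hypothetical.

```python
import random

def weighted_prevalence(rows):
    """Ratio estimator over rows of (impressions, is_violating, inclusion_prob)."""
    num = sum(x * y / p for x, y, p in rows)
    den = sum(x / p for x, y, p in rows)
    return num / den

def bootstrap_ci(rows, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample labeled rows with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        weighted_prevalence(rng.choices(rows, k=len(rows)))
        for _ in range(n_boot)
    )
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

if __name__ == "__main__":
    rng = random.Random(42)

    def fake_day(n=300):
        # Toy labeled sample for a rare category (about 1% positives).
        return [(rng.randint(1, 500), int(rng.random() < 0.01),
                 rng.uniform(0.05, 1.0)) for _ in range(n)]

    days = [fake_day() for _ in range(7)]
    pooled = [row for day in days for row in day]
    print("one day :", bootstrap_ci(days[0]))
    print("one week:", bootstrap_ci(pooled))
```

With only a handful of positive labels in a single day, the daily interval is usually wide. Pooling a week of samples trades temporal resolution for more positives and, typically, a tighter interval.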
The AI-assisted prevalence system gives Pinterest proactive risk detection, a 15x faster labeling turnaround, and significantly lower operational costs. This enables quicker root cause analysis, data-driven policy iteration, strategic decision-making (benchmarking, goal setting, resource allocation), and precise A/B testing of enforcement strategies. Future work includes expanding pivoting capabilities, further cost optimization (e.g., fine-tuning LLMs, multi-step labeling), and human-in-the-loop denoising and debiasing to refine LLM accuracy.