This article details how Pinterest evolved its Text-to-SQL system into an Analytics Agent designed to serve 2,500 users across a warehouse of over 100,000 tables. It focuses on two techniques: unified context-intent embeddings for semantic retrieval, and structural and statistical patterns combined with governance-aware ranking. Together these are used to generate trustworthy, validated SQL from natural language despite the complexity of a large-scale data warehouse.
Read original on Pinterest Engineering
Pinterest tackled the challenge of building a reliable Text-to-SQL system for a massive data warehouse (over 100,000 tables) by moving beyond simple keyword matching. Their solution, an "Analytics Agent," integrates advanced AI and data governance principles to assist analysts in discovering tables, reusing queries, and generating validated SQL. This system represents a significant architectural shift from basic RAG-based approaches to a more sophisticated, intent-driven knowledge retrieval and query generation mechanism.
Before developing the AI agent, Pinterest invested heavily in data governance. They reduced their data warehouse footprint from 400K to 100K tables through a rigorous tiering program (Tier 1 for production-quality, Tier 2 for team-owned, Tier 3 for temporary/legacy). This process, documented in PinCat (based on DataHub), ensured tables had clear ownership, documentation, and quality standards, making the data warehouse manageable and suitable for AI-driven processes. This highlights that robust data governance is a prerequisite for effective AI/ML in large-scale data environments.
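One way to picture the tiering is as metadata attached to each table record. The following is a minimal sketch only: the `TableMeta` fields and the `eligible_for_agent` policy are hypothetical illustrations, not PinCat's or DataHub's actual schema, and the source does not state which tiers the agent is allowed to use.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """Tier labels as described in the text; a lower value means higher trust."""
    PRODUCTION = 1  # Tier 1: production-quality
    TEAM_OWNED = 2  # Tier 2: team-owned
    TEMPORARY = 3   # Tier 3: temporary/legacy


@dataclass(frozen=True)
class TableMeta:
    # Hypothetical catalog record; field names are illustrative only.
    name: str
    tier: Tier
    owner: Optional[str]
    documented: bool


def eligible_for_agent(table: TableMeta, max_tier: Tier = Tier.TEAM_OWNED) -> bool:
    """Assumed policy: expose only owned, documented tables at max_tier or better."""
    return (
        table.tier <= max_tier
        and table.owner is not None
        and table.documented
    )
```

Under this assumed policy, a documented Tier 1 table with an owner passes, while a Tier 3 scratch table or an undocumented table does not.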
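The core of Pinterest's Analytics Agent relies on two complementary dimensions that encode analytical knowledge from historical SQL queries: intent embeddings, which capture what a query is trying to answer, and structural patterns, which capture how such queries are actually written.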
System Design Takeaway
This architecture demonstrates how combining deep semantic understanding (intent embeddings) with pragmatic, empirically validated structural knowledge (query patterns) and strong data governance leads to a more reliable and scalable Text-to-SQL system. The separation of "what" (intent) from "how" (patterns) is a key design principle.
These two dimensions work together: intent embeddings enable flexible semantic search, while structural patterns provide concrete, validated SQL building blocks. When an analyst asks a question, the system uses intent to retrieve relevant historical queries and then leverages the associated structural patterns and governance signals to construct a highly reliable and performant SQL query.
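The retrieve-then-rank flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not Pinterest's implementation: the `HistoricalQuery` record, the `TIER_WEIGHT` values, the usage boost, and the scoring formula are all hypothetical, and the toy vectors stand in for embeddings that a real system would obtain from an embedding model.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class HistoricalQuery:
    # Hypothetical record for one indexed historical query.
    sql: str
    intent_embedding: Sequence[float]  # vector describing what the query answers
    table_tier: int                    # 1, 2, or 3 (governance signal)
    run_count: int                     # stand-in for a statistical usage signal


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Assumed weights: higher-trust tiers count for more. Not from the source.
TIER_WEIGHT = {1: 1.0, 2: 0.8, 3: 0.5}


def rank(
    question_embedding: Sequence[float],
    candidates: List[HistoricalQuery],
    top_k: int = 3,
) -> List[HistoricalQuery]:
    """Order candidates by semantic match, scaled by governance and usage."""

    def score(q: HistoricalQuery) -> float:
        semantic = cosine(question_embedding, q.intent_embedding)
        usage_boost = 1.0 + 0.1 * math.log1p(q.run_count)
        return semantic * TIER_WEIGHT[q.table_tier] * usage_boost

    return sorted(candidates, key=score, reverse=True)[:top_k]
```

With these assumed weights, a near match on a Tier 1 table outranks an exact match on a Tier 3 table, which is the kind of trade-off a governance-aware ranker is meant to make. The SQL of the top results would then serve as the structural building blocks for generation.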