ByteByteGo·May 27, 2026

Airtable's Semantic Search Architecture for AI Features

Airtable engineered a scalable and performant semantic search system to power its AI features, focusing on handling diverse customer database sizes and multi-tenancy. The architecture leverages Milvus for vector storage and search, with critical design decisions made around data partitioning, index selection, and managing hot/cold data to meet strict latency, throughput, and privacy requirements.



Airtable's semantic search system underpins AI features such as natural language querying and linked record recommendations, and the shape of its customer data made it unusually hard to build. The key design priorities were a 99th-percentile query latency of 500ms, high write throughput for constantly changing data, horizontal scalability across millions of independent customer 'bases', and self-hosting for data privacy.

Data Properties and Initial Challenges

The core problem stemmed from three data properties: enormous variation in base sizes (from a few rows to hundreds of thousands), strict isolation requirements between customer bases, and the observation that 75% of bases are idle most of the time. Because embeddings are roughly ten times larger than their source data, they needed a dedicated vector storage and search system. Airtable chose Milvus, a vector database, because it can be self-hosted, supports multi-tenancy via partitions, and scales its ingestion, indexing, and query components separately.

Partitioning Strategy for Multi-Tenancy

The primary architectural decision was how to partition customer data. Airtable opted for one Milvus partition per base. This provided strong physical isolation, simplified data deletion, and avoided the latency overhead of the post-query filtering that shared partitions require. However, it created a performance bottleneck: as a single collection approached 100,000 partitions, Milvus's internal bookkeeping overhead caused significant slowdowns (e.g., 250ms partition creation latency).
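
The difference between the two layouts can be illustrated with a toy in-memory model. This is only a sketch of the idea, not Milvus code: the class names, the dot-product scoring, and the "rows scanned" counter are all assumptions made for illustration.

```python
def dot(a, b):
    """Similarity score for the toy example (plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


class PerBasePartitions:
    """Hypothetical layout: one partition (a list of rows) per base."""

    def __init__(self):
        self.partitions = {}  # base_id -> [(record_id, vector), ...]

    def insert(self, base_id, record_id, vector):
        self.partitions.setdefault(base_id, []).append((record_id, vector))

    def search(self, base_id, query, k):
        # Only the requesting base's rows are ever examined.
        rows = self.partitions.get(base_id, [])
        ranked = sorted(rows, key=lambda row: dot(row[1], query), reverse=True)
        return [record_id for record_id, _ in ranked[:k]], len(rows)

    def drop_base(self, base_id):
        # Deleting a tenant is a single partition drop.
        self.partitions.pop(base_id, None)


class SharedPartition:
    """Hypothetical layout: all bases share one partition, filtered afterwards."""

    def __init__(self):
        self.rows = []  # [(base_id, record_id, vector), ...]

    def insert(self, base_id, record_id, vector):
        self.rows.append((base_id, record_id, vector))

    def search(self, base_id, query, k):
        # Every tenant's rows are ranked, then other tenants are filtered out.
        ranked = sorted(self.rows, key=lambda row: dot(row[2], query), reverse=True)
        hits = [rid for owner, rid, _ in ranked if owner == base_id][:k]
        return hits, len(self.rows)
```

Both layouts return the same hits for a given base, but the shared layout examines every tenant's rows to get there, and deleting a base means finding and removing individual rows rather than dropping one partition.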

Hierarchical Capping

To overcome the partition limit, Airtable implemented a hierarchical capping strategy. Each Milvus cluster now contains up to 400 collections, and each collection holds at most 1,000 partitions. This limits any single cluster to 400,000 bases. As the customer base grows, new clusters are provisioned, trading operational complexity for predictable performance. This pattern of introducing another level of grouping to overcome flat namespace limitations is common across distributed systems.
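
The capping arithmetic can be sketched as a mapping from a base's sequence number to a (cluster, collection, partition) slot. The two limits come from the text above; the sequential fill order and the `place` function are assumptions, since the source does not say how Airtable actually assigns bases to slots.

```python
from dataclasses import dataclass

COLLECTIONS_PER_CLUSTER = 400      # cap stated in the text
PARTITIONS_PER_COLLECTION = 1_000  # cap stated in the text
BASES_PER_CLUSTER = COLLECTIONS_PER_CLUSTER * PARTITIONS_PER_COLLECTION


@dataclass(frozen=True)
class Placement:
    cluster: int
    collection: int
    partition: int


def place(base_seq: int) -> Placement:
    """Map the n-th base (0-indexed) to a slot, filling slots sequentially.

    Sequential filling is a hypothetical policy chosen for illustration.
    """
    if base_seq < 0:
        raise ValueError("base_seq must be non-negative")
    cluster, within_cluster = divmod(base_seq, BASES_PER_CLUSTER)
    collection, partition = divmod(within_cluster, PARTITIONS_PER_COLLECTION)
    return Placement(cluster, collection, partition)
```

Under this scheme, base number 400,000 is the first one that no longer fits in cluster 0 and lands in the first slot of cluster 1, which is the point at which a new cluster would need to be provisioned.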

Vector Index Selection and Trade-offs

Selecting the right vector index involved a crucial trade-off between memory, latency, and recall. Airtable benchmarked three types:

HNSW (Hierarchical Navigable Small World): Offers fast lookup, high recall (99-100%), and predictable performance, but is memory-intensive as the entire graph must reside in RAM.
IVF-SQ8: Clusters vectors and compresses them to reduce memory footprint, but introduces approximation error, lowering recall.
DiskANN: Stores most of the index on disk, scaling to enormous datasets per node, but incurs higher query latency due to disk I/O.

Given the 500ms latency target and the need for high recall for AI feature quality, Airtable chose HNSW. This decision pushed the cost to memory, which was addressed by a separate strategy for managing hot and cold data.
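
To see why HNSW pushes the cost to memory, a rough per-vector estimate helps. The sketch below uses a common rule of thumb (float32 vector plus about 2*M four-byte neighbor links at the bottom graph layer) and compares it with 8-bit scalar quantization at one byte per dimension. The dimension, M value, and vector count in the usage note are illustrative assumptions, not Airtable's figures, and real indexes carry additional overhead.

```python
def hnsw_bytes_per_vector(dim: int, m: int) -> int:
    """Rule-of-thumb HNSW footprint: raw float32 vector plus ~2*M 4-byte links."""
    return dim * 4 + m * 2 * 4


def sq8_bytes_per_vector(dim: int) -> int:
    """8-bit scalar quantization: one byte per dimension (centroids ignored)."""
    return dim


def total_gib(bytes_per_vector: int, num_vectors: int) -> float:
    """Total footprint in GiB for a given number of vectors."""
    return bytes_per_vector * num_vectors / 2**30
```

For example, with an assumed 1024-dimensional embedding and M = 16, `hnsw_bytes_per_vector(1024, 16)` gives 4,224 bytes per vector against 1,024 bytes for SQ8, roughly a fourfold difference that must be held in RAM for every loaded vector.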

Optimizing Memory with Hot/Cold Data Management

To mitigate HNSW's memory demands, Airtable capitalized on customer usage patterns: only about 25% of bases are actively accessed in a given week, while 75% remain idle. Milvus's ability to offload partitions from memory to storage and reload them quickly (within seconds) was critical. This allowed Airtable to keep only 'hot' partitions in memory, pushing 'cold' ones to storage. When an idle base is accessed, its partition is reloaded, resulting in a brief, acceptable warm-up period for the user. This strategy significantly reduced memory costs without sacrificing the chosen index's performance benefits.
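
One way to picture the hot/cold bookkeeping is a least-recently-used tracker that loads a partition on first access and releases the coldest one when memory is full. This is a minimal sketch under assumptions: the source does not describe Airtable's eviction policy, the fixed `capacity` is a simplification, and the `load` and `release` callbacks are hypothetical stand-ins for whatever calls load and release a partition in the vector database.

```python
from collections import OrderedDict


class HotPartitionCache:
    """Tracks which bases have their partition in memory (LRU eviction)."""

    def __init__(self, capacity, load, release):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._load = load          # hypothetical: bring a partition into memory
        self._release = release    # hypothetical: offload a partition to storage
        self._hot = OrderedDict()  # base_id -> None, least recently used first

    def touch(self, base_id) -> bool:
        """Ensure the base is hot. Returns True if a cold reload was needed."""
        if base_id in self._hot:
            self._hot.move_to_end(base_id)  # mark as most recently used
            return False
        while len(self._hot) >= self.capacity:
            victim, _ = self._hot.popitem(last=False)  # evict the coldest base
            self._release(victim)
        self._load(base_id)
        self._hot[base_id] = None
        return True

    def hot_bases(self):
        """Base ids currently in memory, least recently used first."""
        return list(self._hot)
```

A query path would call `touch(base_id)` before searching; a `True` return corresponds to the brief warm-up the user sees when an idle base is reloaded.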
