Dropbox Dash uses a context engine to unify and search enterprise content across third-party applications. The system combines index-based retrieval, detailed content understanding, and knowledge graphs to enrich data and improve search relevance. Key architectural decisions include choosing pre-processing at ingestion over federated retrieval, using LLMs to judge relevance, and optimizing prompts with DSPy.
Dropbox Dash addresses the challenge of fragmented enterprise content by building a centralized context engine. This engine ingests data from numerous third-party applications via custom connectors, normalizing and enriching it for unified search and AI-driven queries. The core architectural decision revolves around index-based retrieval, prioritizing pre-processing at ingestion time over on-the-fly federated retrieval for improved performance, data enrichment, and access to company-wide content.
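The ingestion step above can be sketched as a connector mapping a source-specific payload onto a common document schema. The `Document` fields and the calendar payload here are illustrative assumptions, not Dropbox's actual types:

```python
from dataclasses import dataclass, field

# Minimal sketch of ingestion-time normalization. The connector name and
# the Document schema are hypothetical, chosen only to show the shape of
# the pattern: each connector maps its source's payload onto one schema.

@dataclass
class Document:
    source: str          # originating third-party app
    doc_id: str          # stable ID within that source
    title: str
    body: str
    metadata: dict = field(default_factory=dict)

def normalize_calendar_event(raw: dict) -> Document:
    """Map one hypothetical calendar-API payload onto the common schema."""
    return Document(
        source="calendar",
        doc_id=raw["id"],
        title=raw.get("summary", ""),
        body=raw.get("description", ""),
        metadata={"attendees": raw.get("attendees", [])},
    )

event = {"id": "evt-1", "summary": "Q3 planning", "attendees": ["ana", "raj"]}
doc = normalize_calendar_event(event)
```

Once every connector emits the same `Document` shape, enrichment and indexing can run source-agnostically downstream.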
After initial content understanding, Dropbox models relationships between pieces of information using knowledge graphs. This cross-app intelligence is vital for providing richer context: for instance, connecting a meeting invite to its documents, attendees, and project-management tasks. A significant insight is the creation of "knowledge bundles" (summaries of graph neighborhoods that are then indexed) rather than relying solely on a traditional graph database, which would pose latency and query-pattern challenges. These bundles flow through the same index pipeline as other content, generating lexical and semantic embeddings.
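The bundle idea can be illustrated with a toy example: collapse a small neighborhood of graph edges into one plain-text summary that the normal index pipeline can then embed like any other document. The edge format and entity names here are assumptions for illustration, not Dropbox's internal representation:

```python
# Illustrative "knowledge bundle": render (subject, relation, object) edges
# around a meeting as one indexable text summary. Entity names and the
# triple format are hypothetical.

def build_bundle(meeting: str, edges: list[tuple[str, str, str]]) -> str:
    """Flatten a small graph neighborhood into a single summary document."""
    lines = [f"Meeting: {meeting}"]
    for subj, rel, obj in edges:
        lines.append(f"{subj} {rel} {obj}")
    return "\n".join(lines)

edges = [
    ("Q3 planning", "has attendee", "ana@example.com"),
    ("Q3 planning", "references doc", "roadmap.paper"),
    ("Q3 planning", "linked task", "JIRA-123"),
]
bundle = build_bundle("Q3 planning", edges)
# The bundle then goes through the same pipeline as other content,
# e.g. index.add(doc_id="bundle:q3-planning", text=bundle)
```

Because the bundle is just text, a query like "docs from the Q3 planning meeting" can match it lexically or semantically without a graph traversal at query time.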
Index-Based vs. Federated Retrieval
When designing a unified search or context engine, a critical architectural decision is between federated (on-the-fly processing) and index-based (pre-processed at ingestion) retrieval. Index-based retrieval offers faster query times, enriched data, and broader access to content but requires significant upfront engineering effort and robust ingestion pipelines to manage freshness and cost. Federated retrieval is simpler to start but sacrifices performance, comprehensive content access, and sophisticated ranking due to reliance on external API performance and token limitations.
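The trade-off can be made concrete with a toy inverted index built at ingestion time; a federated engine would instead fan the query out to each app's search API at request time and pay that latency on every query. This is a minimal sketch, not Dropbox's implementation:

```python
from collections import defaultdict

# Toy index-based retrieval: all processing happens once, at write time.
# Query time is then a local lookup with no external API calls, which is
# what makes ranking and enrichment across all content feasible.

class IngestionIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # token -> set of doc ids
        self.docs = {}

    def ingest(self, doc_id: str, text: str):
        """Pre-process at ingestion; enrichment would also happen here."""
        self.docs[doc_id] = text
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def search(self, query: str) -> set:
        """Cheap AND-query over the pre-built postings lists."""
        sets = [self.postings[t] for t in query.lower().split()]
        return set.intersection(*sets) if sets else set()

idx = IngestionIndex()
idx.ingest("d1", "Q3 planning roadmap")
idx.ingest("d2", "roadmap review notes")
idx.search("roadmap")     # -> {"d1", "d2"}
idx.search("Q3 roadmap")  # -> {"d1"}
```

The cost the article notes is visible even in this sketch: the index must be kept fresh as source content changes, which is what demands the robust ingestion pipelines.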
To mitigate LLM context-window limits and slow Model Context Protocol (MCP) tool calls, Dropbox implemented several optimizations. They introduced "super tools" that consolidate multiple retrieval tools into one, significantly reducing token usage. Knowledge graphs also help by providing concise, relevant information. Furthermore, tool results are stored locally, outside the LLM context window, and for complex queries a classifier selects sub-agents with narrower toolsets. DSPy is employed for prompt optimization to improve LLM effectiveness.
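The "super tool" idea can be sketched as follows: instead of exposing one tool schema per source (each schema costing prompt tokens on every request), the model sees a single tool whose `source` argument selects the backend. The tool names and handlers below are hypothetical, chosen only to show the consolidation pattern:

```python
# Sketch of a "super tool": one tool entry in the LLM prompt dispatches to
# several per-source retrievers. Backend names and handlers are illustrative.

def search_calendar(q): return [f"calendar hit for {q!r}"]
def search_docs(q):     return [f"docs hit for {q!r}"]
def search_tasks(q):    return [f"tasks hit for {q!r}"]

BACKENDS = {
    "calendar": search_calendar,
    "docs": search_docs,
    "tasks": search_tasks,
}

def super_search(source: str, query: str) -> list[str]:
    """Single tool schema instead of three; the model picks the source."""
    return BACKENDS[source](query)

# Per the article, results like these can be stored locally and referenced
# by ID rather than pasted back into the context window.
results = super_search("docs", "Q3 roadmap")
```

One tool description amortizes across all sources, so the per-request token cost of the tool catalog stays flat as connectors are added.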