This article discusses the architectural shift required for data infrastructure to support the increasing demands of AI agents, which generate significantly more queries than human users. It advocates for an "Open Data Infrastructure" approach to avoid the high costs and inefficiencies of closed data stacks, emphasizing flexible compute engines and consolidated data context.
Read original on The New Stack.

AI agents dramatically increase the volume of queries on data warehouses, often by tenfold or even a hundredfold compared to traditional human-driven analytics. This surge exposes fundamental inefficiencies in existing data architectures, particularly those built as closed ecosystems where all queries route through a single, often expensive, compute path. This "Lamborghini to mow the lawn" scenario highlights the need for more intelligent routing and resource utilization in data infrastructure.
The AI Cost Squeeze
Closed data stacks impose a "triple whammy" on AI workloads: high costs, because every query runs on expensive compute; poor AI answers, because the data context is fragmented; and wasted compute, because that weak context is fed back into the models.
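The cost side of this squeeze can be made concrete with a back-of-the-envelope model. The sketch below contrasts a closed stack, where every query hits one premium engine, with routed compute; all prices and the query mix are illustrative assumptions, not figures from the article.

```python
# Hedged sketch: toy cost model for closed vs. routed compute.
# Prices and query volumes below are illustrative assumptions.

def monthly_cost(queries: dict[str, int], price_per_query: dict[str, float]) -> float:
    """Sum the cost of each query class at its engine's per-query price."""
    return sum(count * price_per_query[kind] for kind, count in queries.items())

# Assumed mix: agents issue mostly cheap lookups, few heavy analytical queries.
queries = {"simple_lookup": 900_000, "complex_analytics": 100_000}

# Closed stack: everything runs on the premium engine at one price.
closed = monthly_cost(queries, {"simple_lookup": 0.01, "complex_analytics": 0.01})

# Open stack: lookups routed to a lightweight engine at a fraction of the price.
open_ = monthly_cost(queries, {"simple_lookup": 0.001, "complex_analytics": 0.01})

print(f"closed: ${closed:,.0f}/mo, open: ${open_:,.0f}/mo")
```

Under these assumed numbers, routing the 90% of simple queries to a cheaper engine cuts the bill by roughly 80%; the exact ratio depends entirely on the workload mix and engine pricing.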
The article advocates for an Open Data Infrastructure characterized by:

- Diverse Compute Engines: AI agents route each query to the most cost-effective and appropriate compute engine based on query complexity and cost profile, in contrast to monolithic, closed systems.
- Consolidated Data Context: all relevant data and its context are unified and readily accessible; fragmented data leads to higher query costs and degraded AI performance.
- Semantic Discipline: a well-defined and managed semantic layer provides clear context for AI agents, preventing misinterpretations and inefficient querying.
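The "semantic discipline" point can be sketched as a minimal metric registry that hands agents governed definitions instead of raw column names. The metric, fields, and SQL below are illustrative assumptions, not part of any real semantic-layer product.

```python
# Hedged sketch: a minimal semantic-layer registry. Metric names, fields,
# and SQL fragments are illustrative assumptions.

from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str         # stable identifier agents query by
    sql: str          # canonical computation, defined once
    description: str  # plain-language context handed to the agent
    grain: str        # aggregation level, to prevent double counting

REGISTRY = {
    "monthly_active_users": Metric(
        name="monthly_active_users",
        sql="COUNT(DISTINCT user_id)",
        description="Distinct users with at least one event in the month.",
        grain="month",
    ),
}

def resolve(metric_name: str) -> Metric:
    """Agents resolve a governed definition instead of guessing at columns."""
    try:
        return REGISTRY[metric_name]
    except KeyError:
        raise ValueError(f"Unknown metric: {metric_name!r}; refusing to guess.")
```

The design choice worth noting is the hard failure on unknown metrics: an agent that cannot resolve a definition should stop rather than improvise a query against raw tables.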
Architecturally, this implies a move towards decoupled compute and storage, with an orchestration layer capable of understanding query characteristics and dynamically assigning them to different processing engines (e.g., a lightweight engine for simple lookups, a powerful one for complex analytical tasks). This requires robust data governance and interoperability between various data services and tools.
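The orchestration layer described above can be sketched as a tiny router that classifies a query and assigns it to a suitably sized engine. The engine names and the keyword-based complexity heuristic are illustrative assumptions; a real system would use planner statistics or cost estimates.

```python
# Hedged sketch: route queries to engines by a crude complexity heuristic.
# Engine names and keywords are illustrative assumptions, not a real API.

def classify(sql: str) -> str:
    """Treat joins, aggregations, and window functions as heavy analytics."""
    heavy = ("JOIN", "GROUP BY", "OVER (")
    return "complex" if any(kw in sql.upper() for kw in heavy) else "simple"

ENGINES = {
    "simple": "lightweight-engine",  # cheap engine for point lookups
    "complex": "warehouse-engine",   # powerful engine for analytical scans
}

def route(sql: str) -> str:
    """Return the name of the engine a query should run on."""
    return ENGINES[classify(sql)]

print(route("SELECT name FROM users WHERE id = 7"))           # lightweight-engine
print(route("SELECT region, SUM(x) FROM t GROUP BY region"))  # warehouse-engine
```

Decoupled storage makes this routing possible in the first place: both engines read the same tables, so the choice of engine is purely a cost and latency decision, not a data-movement one.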