Grab's Analytics Data Warehouse (ADW) team developed a multi-agent AI system to automate engineering support, aiming to reduce repetitive operational tasks and improve resolution efficiency. The system tackles internal requests like SQL debugging and data warehouse troubleshooting, freeing engineers for higher-value development and platform improvement. Key architectural decisions included separating investigation and enhancement workflows, consolidating tools, and integrating safety and context management.
Read original on InfoQ Architecture
Grab's Analytics Data Warehouse (ADW) team implemented a multi-agent AI system to automate engineering support for its large-scale data platform. Repetitive support tasks were consuming significant operational effort on a platform that serves over 1,000 internal users and manages more than 15,000 tables. The goal is to shift engineering focus from reactive firefighting to proactive system building and platform improvement.
The system uses a multi-agent architecture orchestrated by a LangGraph-based workflow engine and FastAPI services. This setup coordinates routing, tool execution, and state management across specialized agents. Incoming engineering requests are first classified, then routed to agents responsible for specific tasks such as context retrieval, code search, or solution generation. Each agent has a narrowly constrained responsibility, which makes its outputs clearer and more predictable.
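The classify-then-route flow described above can be sketched in plain Python. This is an illustrative sketch only, not Grab's implementation and not the LangGraph API: the `RequestState` shape, the keyword rules in `classify`, the three agent functions, and the `ROUTES` table are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RequestState:
    """Shared state handed from step to step (hypothetical shape)."""
    text: str
    category: str = "unclassified"
    notes: List[str] = field(default_factory=list)


def classify(state: RequestState) -> RequestState:
    # Toy keyword rules stand in for whatever classifier the real system uses.
    lowered = state.text.lower()
    if "error" in lowered or "fails" in lowered:
        state.category = "code_search"
    elif "where" in lowered or "which table" in lowered:
        state.category = "context_retrieval"
    else:
        state.category = "solution_generation"
    return state


def context_retrieval_agent(state: RequestState) -> RequestState:
    state.notes.append("context_retrieval: looked up table documentation")
    return state


def code_search_agent(state: RequestState) -> RequestState:
    state.notes.append("code_search: searched pipeline code for the failure")
    return state


def solution_generation_agent(state: RequestState) -> RequestState:
    state.notes.append("solution_generation: drafted a suggested fix")
    return state


# Each category maps to exactly one agent, so responsibilities stay narrow.
ROUTES: Dict[str, Callable[[RequestState], RequestState]] = {
    "context_retrieval": context_retrieval_agent,
    "code_search": code_search_agent,
    "solution_generation": solution_generation_agent,
}


def handle(text: str) -> RequestState:
    """Classify a request, then hand it to the single agent for its category."""
    state = classify(RequestState(text=text))
    return ROUTES[state.category](state)


if __name__ == "__main__":
    result = handle("My SQL query fails with a syntax error")
    print(result.category, result.notes)
```

Keeping the routing table as explicit data, rather than burying the choice inside each agent, is one simple way to make it visible which agent owns which kind of request.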
Architectural Principle: Specialization of Agents
Separating the investigation and enhancement paths was a crucial architectural decision: it reduced complexity in agent reasoning and improved reliability in production workflows. This reflects a common pattern in system design: breaking complex problems down into more manageable, specialized components.
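One way to picture the path separation is as two independent step lists that share no steps. The sketch below is an assumption-heavy illustration: the source does not say what each path contains, so the step names (`gather_evidence`, `diagnose`, `draft_change`, `request_review`) and the `Path` enum are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, List

# A step records what it did in a shared trace (hypothetical convention).
Step = Callable[[List[str]], None]


class Path(Enum):
    INVESTIGATION = "investigation"
    ENHANCEMENT = "enhancement"


def gather_evidence(trace: List[str]) -> None:
    trace.append("gather_evidence")


def diagnose(trace: List[str]) -> None:
    trace.append("diagnose")


def draft_change(trace: List[str]) -> None:
    trace.append("draft_change")


def request_review(trace: List[str]) -> None:
    trace.append("request_review")


# Each path owns its own ordered steps; no step appears in both lists.
PIPELINES: Dict[Path, List[Step]] = {
    Path.INVESTIGATION: [gather_evidence, diagnose],
    Path.ENHANCEMENT: [draft_change, request_review],
}


def run(path: Path) -> List[str]:
    """Run only the steps belonging to the chosen path and return the trace."""
    trace: List[str] = []
    for step in PIPELINES[path]:
        step(trace)
    return trace


if __name__ == "__main__":
    print(run(Path.INVESTIGATION))
    print(run(Path.ENHANCEMENT))
```

Because a request commits to one path up front, each path only has to reason about its own steps, which is the kind of complexity reduction the paragraph above describes.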
Beyond workflow separation, the team made several other critical design choices and addressed challenges: