The New Stack·February 28, 2026

Foundation Models and System Architecture for Physical AI and Robotics

This article explores the crucial role of foundation models in the advancement of physical AI and robotics, emphasizing that software architecture, not just hardware, is the real breakthrough. It delves into various model classes like Large Behavior Models (LBMs), Vision Language Action (VLA) models, and open-world models, and discusses the architectural challenges and trade-offs in deploying these complex AI systems at the edge versus the cloud for real-world applications.


The evolution of robotics and physical AI is increasingly driven by sophisticated software architectures and advanced foundation models rather than solely hardware innovations. This shift necessitates specialized AI factories for training large, multimodal foundation models, supported by end-to-end software for data processing, orchestration, safety, and model lifecycle management. The core challenge lies in building autonomous systems that can interact with the unpredictable physical world, demanding robust and adaptable software brains.

Key Foundation Model Architectures for Physical AI

Physical AI leverages several classes of models to navigate complex real-world scenarios, each with distinct architectural considerations:

  • Large Behavior Models (LBMs): These models learn from vast human demonstrations for whole-body coordination, obstacle avoidance, and delicate manual work. They use action-chunking for rapid, responsive movement prediction.
  • Vision Language Action (VLA) Models: Designed for sensory reasoning, VLAs process sensor input and language commands into actionable goals. They often run on-device at the edge and need to remain open enough for developers to fine-tune them.
  • Open World Models: Crucial for planning and simulation, these models learn environmental dynamics by ingesting multimodal sensor data (cameras, lidar, radar). They are vital for applications like autonomous vehicles but can be computationally expensive, necessitating optimizations like skip-training or latent space prediction.
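The action-chunking idea behind LBMs can be illustrated with a short sketch: instead of querying the model at every control step, the policy predicts a chunk of H future actions per inference call and the controller executes them open-loop before re-planning. The `policy` function here is a hypothetical stand-in for a learned model, not an API from any specific library.

```python
# Minimal sketch of action chunking, assuming a hypothetical learned
# policy that maps one observation to a chunk of H future actions.
import numpy as np

H = 8  # actions predicted per inference call (chunk size)

def policy(obs: np.ndarray) -> np.ndarray:
    """Stand-in for a learned LBM: returns H actions of dimension 2."""
    # Dummy mapping for illustration; a real model would be a network.
    return np.tile(obs[:2], (H, 1))

def control_loop(observations):
    """Execute each chunk open-loop, re-querying the model every H steps."""
    executed = []
    chunk, idx = None, H  # start "exhausted" to force an initial inference
    for obs in observations:
        if idx >= H:                 # chunk used up: run inference again
            chunk, idx = policy(obs), 0
        executed.append(chunk[idx])  # play back the pre-computed action
        idx += 1
    return executed

acts = control_loop([np.ones(4)] * 10)  # 10 control steps, 2 inference calls
```

Because the expensive model runs only once every H steps, the per-step control loop stays fast and responsive, which is the core trade-off action chunking exploits.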

Architectural Considerations: Edge vs. Cloud

A critical system design decision for physical AI is the distribution of compute between edge devices and cloud infrastructure. Edge processing is paramount for collision-critical decisions and real-time motor control, where internet latency is unacceptable. Cloud infrastructure, conversely, is ideal for pre-training massive foundation models due to its scalability and computational power.

💡 Optimizing for Latency and Safety

When designing physical AI systems, prioritize edge deployment for time-sensitive, safety-critical inference tasks (e.g., motor control, collision avoidance) and leverage cloud infrastructure for computationally intensive model training and less time-sensitive data processing.
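A placement policy like the one above can be sketched in a few lines. This is an illustrative toy, not a real orchestration API: the `Task` fields and the ~50 ms cloud round-trip figure are assumptions chosen to make the rule concrete.

```python
# Hedged sketch: place workloads by latency budget and safety criticality.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    latency_budget_ms: float   # deadline for a usable result
    safety_critical: bool

# Assumed typical cloud round-trip time; real values vary by network.
CLOUD_RTT_MS = 50.0

def place(task: Task) -> str:
    """Pin safety-critical or tight-deadline work to the edge; else cloud."""
    if task.safety_critical or task.latency_budget_ms < CLOUD_RTT_MS:
        return "edge"
    return "cloud"

tasks = [
    Task("motor-control", 10.0, True),
    Task("collision-avoidance", 20.0, True),
    Task("fleet-model-training", 3_600_000.0, False),
]
placements = {t.name: place(t) for t in tasks}
```

The design choice is deliberately conservative: safety-critical tasks go to the edge even when their deadlines could tolerate a network hop, since connectivity can degrade unpredictably.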

Challenges and Future Directions

Scaling physical AI demands a full-stack accelerated computing platform connecting data center supercomputing with real-time AI inference at the edge. Industry-specific variations in latency, mechanics, tasks, and operating environments shape architectural priorities. The trade-off between specialized, edge-deployable models and generalist models capable of broader perception, reasoning, planning, and acting across environments remains a key area of development. Addressing the 'long tail' of unforeseen issues in learned systems and ensuring failure tolerance, especially in high-stakes domains like automotive and healthcare, are significant challenges for system architects.
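One common pattern for tolerating long-tail failures in learned systems is to wrap the learned policy in a rule-based safety envelope. The sketch below is an assumed minimal example of that pattern, not a technique the article prescribes: if inference fails, the system degrades to a safe stop, and any action outside hard limits is clamped.

```python
# Sketch of a rule-based safety envelope around a learned policy
# (names and limits are illustrative assumptions).
def safe_step(policy, obs, max_speed: float = 1.0) -> float:
    """Run the learned policy, clamp its output to a safe envelope,
    and fall back to a full stop if inference fails."""
    try:
        action = policy(obs)
    except Exception:
        return 0.0  # degraded mode: command zero velocity
    # Clamp to the hard-coded safe range regardless of model output.
    return max(-max_speed, min(max_speed, action))

clamped = safe_step(lambda o: 5.0, obs=None)    # over-limit action is capped
stopped = safe_step(lambda o: 1 / 0, obs=None)  # inference failure -> stop
```

Keeping the envelope simple and auditable is the point: the learned component can fail in unforeseen ways, while the guard's behavior is fully enumerable.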

robotics, physical AI, foundation models, edge computing, cloud computing, machine learning, system architecture, autonomy
