The New Stack·February 28, 2026

Foundation Models and System Architecture for Physical AI and Robotics

This article explores the crucial role of foundation models in the advancement of physical AI and robotics, emphasizing that software architecture, not just hardware, is the real breakthrough. It delves into various model classes like Large Behavior Models (LBMs), Vision Language Action (VLA) models, and open-world models, and discusses the architectural challenges and trade-offs in deploying these complex AI systems at the edge versus the cloud for real-world applications.


The evolution of robotics and physical AI is increasingly driven by sophisticated software architectures and advanced foundation models rather than solely hardware innovations. This shift necessitates specialized AI factories for training large, multimodal foundation models, supported by end-to-end software for data processing, orchestration, safety, and model lifecycle management. The core challenge lies in building autonomous systems that can interact with the unpredictable physical world, demanding robust and adaptable software brains.

Key Foundation Model Architectures for Physical AI

Physical AI leverages several classes of models to navigate complex real-world scenarios, each with distinct architectural considerations:

  • Large Behavior Models (LBMs): These models learn from vast human demonstrations for whole-body coordination, obstacle avoidance, and delicate manual work. They use action-chunking for rapid, responsive movement prediction.
  • Vision Language Action (VLA) Models: Designed for sensory reasoning, VLAs process sensor input and language commands into actionable goals. They often run on-device at the edge and need to remain open enough for developers to fine-tune them.
  • Open World Models: Crucial for planning and simulation, these models learn environmental dynamics by ingesting multimodal sensor data (cameras, lidar, radar). They are vital for applications like autonomous vehicles but can be computationally expensive, necessitating optimizations like skip-training or latent space prediction.
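The action-chunking idea behind LBMs can be illustrated with a short sketch: instead of querying the model at every control step, the policy predicts a chunk of H future actions per inference call and the controller executes them open-loop before re-planning. The `policy` function here is a hypothetical stand-in for a learned model, not an API from any specific library.

```python
# Minimal sketch of action chunking, assuming a hypothetical learned
# policy that maps one observation to a chunk of H future actions.
import numpy as np

H = 8  # actions predicted per inference call (chunk size)

def policy(obs: np.ndarray) -> np.ndarray:
    """Stand-in for a learned LBM: returns H actions of dimension 2."""
    # Dummy mapping for illustration; a real model would be a network.
    return np.tile(obs[:2], (H, 1))

def control_loop(observations):
    """Execute each chunk open-loop, re-querying the model every H steps."""
    executed = []
    chunk, idx = None, H  # start "exhausted" to force an initial inference
    for obs in observations:
        if idx >= H:                 # chunk used up: run inference again
            chunk, idx = policy(obs), 0
        executed.append(chunk[idx])  # play back the pre-computed action
        idx += 1
    return executed

acts = control_loop([np.ones(4)] * 10)  # 10 control steps, 2 inference calls
```

Because the expensive model runs only once every H steps, the per-step control loop stays fast and responsive, which is the core trade-off action chunking exploits.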

Architectural Considerations: Edge vs. Cloud

A critical system design decision for physical AI is the distribution of compute between edge devices and cloud infrastructure. Edge processing is paramount for collision-critical decisions and real-time motor control, where internet latency is unacceptable. Cloud infrastructure, conversely, is ideal for pre-training massive foundation models due to its scalability and computational power.

💡 Optimizing for Latency and Safety

When designing physical AI systems, prioritize edge deployment for time-sensitive, safety-critical inference tasks (e.g., motor control, collision avoidance) and leverage cloud infrastructure for computationally intensive model training and less time-sensitive data processing.
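A placement policy like the one above can be sketched in a few lines. This is an illustrative toy, not a real orchestration API: the `Task` fields and the ~50 ms cloud round-trip figure are assumptions chosen to make the rule concrete.

```python
# Hedged sketch: place workloads by latency budget and safety criticality.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    latency_budget_ms: float   # deadline for a usable result
    safety_critical: bool

# Assumed typical cloud round-trip time; real values vary by network.
CLOUD_RTT_MS = 50.0

def place(task: Task) -> str:
    """Pin safety-critical or tight-deadline work to the edge; else cloud."""
    if task.safety_critical or task.latency_budget_ms < CLOUD_RTT_MS:
        return "edge"
    return "cloud"

tasks = [
    Task("motor-control", 10.0, True),
    Task("collision-avoidance", 20.0, True),
    Task("fleet-model-training", 3_600_000.0, False),
]
placements = {t.name: place(t) for t in tasks}
```

The design choice is deliberately conservative: safety-critical tasks go to the edge even when their deadlines could tolerate a network hop, since connectivity can degrade unpredictably.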

Challenges and Future Directions

Scaling physical AI demands a full-stack accelerated computing platform connecting data center supercomputing with real-time AI inference at the edge. Industry-specific variations in latency, mechanics, tasks, and operating environments shape architectural priorities. The trade-off between specialized, edge-deployable models and generalist models capable of broader perception, reasoning, planning, and acting across environments remains a key area of development. Addressing the 'long tail' of unforeseen issues in learned systems and ensuring failure tolerance, especially in high-stakes domains like automotive and healthcare, are significant challenges for system architects.
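One common pattern for tolerating long-tail failures in learned systems is to wrap the learned policy in a rule-based safety envelope. The sketch below is an assumed minimal example of that pattern, not a technique the article prescribes: if inference fails, the system degrades to a safe stop, and any action outside hard limits is clamped.

```python
# Sketch of a rule-based safety envelope around a learned policy
# (names and limits are illustrative assumptions).
def safe_step(policy, obs, max_speed: float = 1.0) -> float:
    """Run the learned policy, clamp its output to a safe envelope,
    and fall back to a full stop if inference fails."""
    try:
        action = policy(obs)
    except Exception:
        return 0.0  # degraded mode: command zero velocity
    # Clamp to the hard-coded safe range regardless of model output.
    return max(-max_speed, min(max_speed, action))

clamped = safe_step(lambda o: 5.0, obs=None)    # over-limit action is capped
stopped = safe_step(lambda o: 1 / 0, obs=None)  # inference failure -> stop
```

Keeping the envelope simple and auditable is the point: the learned component can fail in unforeseen ways, while the guard's behavior is fully enumerable.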

robotics, physical AI, foundation models, edge computing, cloud computing, machine learning, system architecture, autonomy
