Azure Architecture Blog · May 20, 2026

Achieving System-Level Performance in Azure IaaS Workloads

This article discusses a system-level approach to performance in Azure IaaS, emphasizing that optimal performance results from compute, storage, and networking working together rather than from optimizing individual resources in isolation. It highlights how Azure engineers the platform to deliver consistent, scalable performance for diverse workloads, including AI, cloud-native, and business-critical systems, by coordinating infrastructure capabilities.

Traditional approaches to cloud performance often amount to provisioning more resources, such as larger VMs or faster disks. Modern workloads, however, exhibit dynamic bottlenecks: a system might be constrained by storage at one moment and by network bandwidth the next. This calls for a shift from resource-level optimization to a system-level approach, in which performance is an outcome of how compute, storage, and networking interact and are coordinated.
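The shifting-bottleneck idea can be made concrete with a small model: if every byte must pass through compute, storage, and network, end-to-end throughput is capped by the slowest stage. The sketch below is illustrative only; the `Interval` class, the `bottleneck` helper, and the per-interval capacities (in MB/s) are hypothetical assumptions, not Azure APIs or measured figures.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical capacities for one time window, all in MB/s."""
    label: str
    compute_mb_s: float  # rate at which the VM can process data
    storage_mb_s: float  # rate at which disks can deliver data
    network_mb_s: float  # rate at which the network can move data


def bottleneck(interval: Interval) -> tuple[str, float]:
    """Return the binding resource and the effective end-to-end throughput.

    In a simple pipeline where every byte passes through all three stages,
    the end-to-end rate cannot exceed the slowest stage.
    """
    stages = {
        "compute": interval.compute_mb_s,
        "storage": interval.storage_mb_s,
        "network": interval.network_mb_s,
    }
    name = min(stages, key=stages.get)
    return name, stages[name]


if __name__ == "__main__":
    # Illustrative numbers only: the binding constraint moves between phases.
    timeline = [
        Interval("data load", compute_mb_s=2000, storage_mb_s=600, network_mb_s=1500),
        Interval("shuffle", compute_mb_s=2000, storage_mb_s=1800, network_mb_s=900),
    ]
    for iv in timeline:
        name, rate = bottleneck(iv)
        print(f"{iv.label}: limited by {name} at {rate:.0f} MB/s")
```

In the second window, adding storage capacity would not raise the effective rate; only more network bandwidth would. That is the coordination argument in miniature.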

Rethinking Performance in the Cloud

Cloud performance today extends beyond peak speed to encompass consistency, scalability, and responsiveness under real-world conditions. Key dimensions include low tail latency (P99/P99.9), which shapes user experience; high throughput; scalability, the ability to maintain performance as demand increases; and consistency, meaning predictable performance under load. Equally important is "time-to-performance": how quickly infrastructure can be provisioned, scaled, or recovered, which determines how responsive a system is to change.
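Tail latency is easy to miss when looking only at an average. As a minimal sketch (not an Azure API), the Python below computes nearest-rank percentiles over a hypothetical set of request latencies; the `percentile` helper and the sample values are illustrative assumptions, and monitoring tools may use interpolating percentile definitions that give slightly different numbers.

```python
import math
from fractions import Fraction


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Fraction(str(p)) avoids binary floating-point error when p * n lands exactly on an integer rank.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in ms: 985 fast requests, 13 slow ones, 2 very slow ones.
    latencies = [2.0] * 985 + [40.0] * 13 + [250.0] * 2
    mean = sum(latencies) / len(latencies)
    print(f"mean={mean:.2f} ms")
    for p in (50, 99, 99.9):
        print(f"P{p}={percentile(latencies, p)} ms")
```

Here the mean (2.99 ms) looks healthy while P99.9 reaches 250 ms, which is why tail percentiles are tracked separately from averages.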

Holistic Performance View

When designing systems, treat performance as a multi-dimensional problem rather than a single raw-speed metric. Focus on consistency, scalability, latency (especially tail latency), and the time it takes for your infrastructure to adapt to changes.

Optimizing for Diverse Workloads

AI Workloads: Require massive parallel compute, high-throughput data access, and low-latency communication. Azure uses platform acceleration (e.g., Azure Boost, which offloads storage and networking I/O processing to dedicated hardware) and high-throughput storage (Blob Storage, Azure Data Lake Storage) optimized for parallel access. High-bandwidth, low-latency networking (e.g., ExpressRoute private connectivity for bringing data into Azure) keeps data moving quickly to distributed nodes.
Cloud-Native Applications (e.g., Kubernetes): Demand dynamic scaling. Azure Container Storage enables Kubernetes workloads to use local NVMe disks for sub-millisecond latency. Advanced Container Networking Services (e.g., eBPF host routing in Cilium) improve datapath efficiency for microservices communication.
Business-Critical Systems: Prioritize predictability and reliability. Azure provides consistent compute through purpose-built VMs and intelligent placement with Virtual Machine Scale Sets (VMSS), tunable storage performance (Ultra Disk and Premium SSD v2, which let capacity, IOPS, and throughput be configured independently), and reliable low-latency networking (Accelerated Networking, proximity placement groups). Fast recovery is supported by Instant Access Snapshots and Azure Site Recovery.
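Independent IOPS and throughput settings matter because the two are linked by I/O size: throughput = IOPS × I/O size. The sketch below uses hypothetical workload profiles (not Azure disk limits) to show that a small-block transactional workload and a large-block scan stress different dimensions; the function name and values are illustrative assumptions.

```python
def required_throughput_mib_s(iops: int, io_size_kib: int) -> float:
    """Throughput (MiB/s) needed to serve `iops` requests of `io_size_kib` KiB each."""
    return iops * io_size_kib / 1024  # 1 MiB = 1024 KiB


if __name__ == "__main__":
    # Hypothetical profiles: (name, target IOPS, I/O size in KiB).
    profiles = [
        ("OLTP, 8 KiB random I/O", 20_000, 8),
        ("Analytics scan, 1 MiB sequential I/O", 500, 1024),
    ]
    for name, iops, size_kib in profiles:
        mib_s = required_throughput_mib_s(iops, size_kib)
        print(f"{name}: {iops} IOPS x {size_kib} KiB = {mib_s:.2f} MiB/s")
```

The OLTP profile needs 40 times the IOPS of the scan but under a third of its throughput (156.25 vs 500 MiB/s). When capacity, IOPS, and throughput are set separately, each profile can be provisioned for the dimension it actually uses instead of over-buying the other.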

The Coordinated System Approach

The core message is that performance is not achieved by optimizing isolated components; it depends on tailoring compute, storage, and networking together to a workload's specific needs. This coordination helps reduce bottlenecks, ensures that improvements in one area are reinforced by the others, and simplifies operations by letting teams focus on workload design rather than low-level infrastructure tuning. Practical guidance emphasizes balanced throughput across the stack for AI, horizontal scaling with Kubernetes-native storage for cloud-native apps, and consistency and predictability for business-critical systems.
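In Kubernetes, horizontal scaling is commonly driven by the HorizontalPodAutoscaler, whose documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when that ratio is within a tolerance (0.1 by default). The Python below is a simplified sketch of that rule with min/max clamping; it omits stabilization windows, pod readiness handling, and multi-metric selection, and the `desired_replicas` name and example values are hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA-style replica calculation for a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Skip scaling when usage is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Respect the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical: 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
```

Scaling out this way only pays off if storage and networking keep up with the new replicas, which is the article's case for Kubernetes-native storage alongside horizontal scaling.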

Architecture Design

Design this yourself

Design a highly performant and scalable data processing platform on Azure IaaS that can handle bursty AI model training jobs, real-time analytics, and business-critical transactional workloads. Focus on how you would integrate and optimize compute, storage, and networking components to achieve consistent low latency, high throughput, and dynamic scalability while ensuring cost efficiency.

Focus: integrated compute, storage, and networking for high-performance workloads

Other design angles

· Design a cloud-native microservices platform on AKS leveraging high-performance storage and networking for stateful workloads, emphasizing dynamic scaling and resource utilization.
· Design a resilient and highly performant enterprise data warehouse solution on Azure for complex analytical queries and reporting, focusing on predictable performance under varying loads.
· Design the infrastructure for a distributed AI training pipeline that requires massive data ingestion and low-latency inter-node communication, detailing the choices for compute, storage, and networking services.