DZone Microservices·March 6, 2026

Kubernetes 1.35: Enhancements for Production Workloads and Distributed Systems

This article explores key features introduced in Kubernetes 1.35, focusing on their impact on system design for production workloads. It details in-place pod vertical scaling for resource efficiency, gang scheduling to prevent deadlocks in distributed AI/ML jobs, and structured authentication for improved API server security and manageability. The hands-on analysis highlights practical implications and trade-offs for architects and developers.


Introduction to Kubernetes 1.35 Enhancements

Kubernetes 1.35 introduces several significant features that address common challenges in deploying and managing complex, distributed workloads. This release focuses on improving resource management, scheduling for tightly coupled applications, and authentication configuration. Understanding these updates is crucial for designing efficient, scalable, and secure Kubernetes-native systems.

In-Place Pod Vertical Scaling (GA): Optimizing Resource Allocation

Previously, scaling pod resources in Kubernetes required a pod restart, leading to downtime and loss of in-memory state. Kubernetes 1.35's in-place vertical scaling, now Generally Available (GA), allows dynamic adjustment of CPU and memory *without* restarting the pod. This feature is particularly beneficial for applications with variable load patterns or high startup costs, such as Java applications or ML inference services.
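As a minimal sketch of how this looks in a manifest (pod name and image are hypothetical placeholders), a container can declare a `resizePolicy` stating that resource changes should not require a restart:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-server            # hypothetical pod name
spec:
  containers:
    - name: app
      image: registry.example.com/inference:latest   # placeholder image
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      # resizePolicy controls whether changing a resource requires a
      # container restart; NotRequired allows the change to be applied
      # in place, without restarting the container.
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
```

A resize is then applied through the pod's `resize` subresource (recent kubectl versions expose this via `kubectl patch pod inference-server --subresource resize ...`), updating `requests`/`limits` while the container keeps running.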

💡

System Design Impact of In-Place Scaling

Architects can leverage in-place scaling to design more cost-efficient and responsive systems. It enables finer-grained resource management, reducing over-provisioning and ensuring applications can adapt to load changes with zero downtime. However, careful consideration of Kubernetes QoS classes is necessary, as resizing attempts cannot change a pod's QoS class.
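To illustrate the QoS constraint, consider a Guaranteed-QoS container (this fragment is a sketch, not tied to a specific workload): its QoS class is derived from requests equaling limits, and a resize must preserve that relationship.

```yaml
# A Guaranteed-QoS container: requests equal limits for every resource.
# Because a resize cannot change the pod's QoS class, any in-place
# resize of this container must keep requests equal to limits.
resources:
  requests:
    cpu: "2"
    memory: "4Gi"
  limits:
    cpu: "2"
    memory: "4Gi"
```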

Gang Scheduling (Alpha): Solving Distributed Workload Deadlocks

Modern AI/ML and big data workloads often require multiple pods to be scheduled simultaneously to function correctly. Traditional Kubernetes scheduling can lead to "partial scheduling," where some pods run while others wait, wasting resources and causing deadlocks. Gang scheduling, introduced as an Alpha feature in K8s 1.35 (with mature alternatives like scheduler-plugins), addresses this by ensuring an entire group of pods is scheduled atomically: all or nothing.

  • Problem without gang scheduling: 5 out of 8 required GPU pods schedule, consuming resources but preventing the job from starting.
  • Solution with gang scheduling: All 8 pods wait until sufficient resources are available, preventing resource waste and enabling smaller jobs to run in the interim.
  • Production Recommendation: While the native K8s API is Alpha, established solutions like scheduler-plugins (which can work with the default Kubernetes scheduler) provide production-ready gang scheduling capabilities for critical distributed jobs like PyTorch training or Apache Spark.
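The scheduler-plugins route from the recommendation above can be sketched with a `PodGroup` and a member pod. This is illustrative only: the job name and image are hypothetical, and the pod-group label key and scheduler name can vary between scheduler-plugins releases and installations.

```yaml
# PodGroup CRD from the scheduler-plugins coscheduling plugin.
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: pytorch-train               # hypothetical training job
spec:
  minMember: 8                      # place pods only when all 8 fit
  scheduleTimeoutSeconds: 300       # give up (and release) after 5 min
---
apiVersion: v1
kind: Pod
metadata:
  name: pytorch-train-worker-0
  labels:
    # Ties the pod to its group; label key may differ by release.
    scheduling.x-k8s.io/pod-group: pytorch-train
spec:
  schedulerName: scheduler-plugins-scheduler   # depends on your install
  containers:
    - name: worker
      image: registry.example.com/pytorch-train:latest   # placeholder
      resources:
        limits:
          nvidia.com/gpu: 1
```

With `minMember: 8`, none of the workers consume GPUs until the scheduler can place all eight at once, which is exactly the all-or-nothing behavior described above.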

Structured Authentication Configuration (GA): Enhanced Security and Manageability

Kubernetes 1.35 introduces structured YAML-based configuration for API server authentication. This replaces the cumbersome command-line flag approach, bringing significant benefits for security, maintainability, and auditability. It allows for clearer, schema-validated, and version-controlled authentication policies, simplifying the management of multiple identity providers and reducing configuration errors. This is critical for robust API security and access control in large-scale deployments.
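A minimal sketch of such a configuration file, assuming a hypothetical OIDC issuer, shows what replaces the old `--oidc-*` flags (the file is passed to the API server via `--authentication-config`):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AuthenticationConfiguration
jwt:
  - issuer:
      url: https://issuer.example.com     # hypothetical OIDC issuer
      audiences:
        - kubernetes-api
    claimMappings:
      username:
        claim: email
        prefix: "oidc:"
      groups:
        claim: groups
        prefix: "oidc:"
    # CEL validation rules: one capability flags could not express.
    claimValidationRules:
      - expression: "claims.email_verified == true"
        message: "email must be verified"
```

Because this is an ordinary YAML file, it can be schema-validated, code-reviewed, and version-controlled, and multiple `jwt` entries can configure multiple identity providers side by side.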

