This article discusses the limitations of the Kubernetes Horizontal Pod Autoscaler (HPA) for dynamic, latency-sensitive edge workloads and proposes a Custom Pod Autoscaler (CPA) as a solution. It highlights how HPA's reactive nature and rigid algorithm lead to inefficiencies at the edge, advocating a more proactive, multi-signal approach that incorporates CPU headroom, latency SLOs, and pod startup compensation to ensure stable performance and efficient resource utilization in constrained edge environments.
Kubernetes HPA, while effective in cloud environments, proves insufficient for the unique demands of edge computing. Edge applications require extremely low latency, high elasticity, and predictable performance under large, unpredictable spikes in workload. Resource constraints at the edge make efficient scaling critical, but HPA's reactive, formulaic approach often leads to over-scaling, under-scaling, or replica oscillation, degrading performance and wasting scarce resources.
To overcome HPA's inflexibility, a Custom Pod Autoscaler (CPA) is proposed, designed to be context-aware and proactive. The CPA's evaluation algorithm leverages best practices from cloud service providers and SRE teams, moving beyond rigid numeric thresholds to use a composite of three primary workload condition signals: available CPU headroom, latency measured against SLOs, and compensation for pod startup time.
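A minimal sketch of how such a composite evaluation might combine CPU headroom, a latency SLO, and startup-time compensation (the three signals named in the summary). All function names, thresholds, and the extra-replica heuristic here are illustrative assumptions, not the article's actual algorithm:

```python
# Hypothetical composite evaluation step for a custom pod autoscaler.
# Thresholds and the startup-compensation heuristic are illustrative only.
from dataclasses import dataclass


@dataclass
class Signals:
    cpu_utilization: float      # fraction of allocated CPU in use, 0.0-1.0
    p99_latency_ms: float       # observed tail latency
    pod_startup_seconds: float  # time for a new replica to become ready


def desired_replicas(current: int, s: Signals,
                     cpu_headroom: float = 0.25,
                     latency_slo_ms: float = 200.0,
                     min_replicas: int = 1,
                     max_replicas: int = 20) -> int:
    """Combine the three signals into a single replica target."""
    target = current
    # Signal 1: preserve CPU headroom for sudden spikes.
    if s.cpu_utilization > 1.0 - cpu_headroom:
        target = max(target, current + 1)
    # Signal 2: a latency SLO breach is a non-CPU overload signal.
    if s.p99_latency_ms > latency_slo_ms:
        target = max(target, current + 1)
    # Signal 3: if new pods are slow to start, scale more aggressively,
    # since the added capacity arrives late.
    if s.pod_startup_seconds > 30 and target > current:
        target += 1  # extra replica to cover the startup gap
    return max(min_replicas, min(target, max_replicas))
```

For example, a deployment at 90% CPU with a breached SLO and slow pod startup would be scaled by two replicas at once, while a deployment comfortably inside all three signals is left alone.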
System Design Considerations for Edge Autoscaling
When designing autoscaling for edge applications, prioritize domain-specific metrics, consider the full lifecycle of a pod (including startup time), implement safe scale-down policies with cooldowns to prevent oscillation, and maintain sufficient CPU headroom. Latency SLOs are a powerful non-CPU signal of impending overload.
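The scale-down cooldown mentioned above can be sketched as a small guard that blocks downscaling for a window after any scaling action, so the autoscaler does not oscillate. The class name and the 300-second default are assumptions for illustration:

```python
# Illustrative cooldown guard for safe scale-down; the window length and
# API are assumptions, not taken from the article.
import time


class ScaleDownCooldown:
    """Block scale-down for a fixed window after any scaling action."""

    def __init__(self, cooldown_seconds: float = 300.0):
        self.cooldown_seconds = cooldown_seconds
        self.last_scale_time = float("-inf")  # no scaling seen yet

    def record_scale(self, now: float = None) -> None:
        """Call whenever the autoscaler changes the replica count."""
        self.last_scale_time = time.monotonic() if now is None else now

    def may_scale_down(self, now: float = None) -> bool:
        """True only once the cooldown window has fully elapsed."""
        t = time.monotonic() if now is None else now
        return t - self.last_scale_time >= self.cooldown_seconds
```

In use, the autoscaler would check `may_scale_down()` before removing replicas and call `record_scale()` after every scaling action, in either direction, so that a spike-driven scale-up is not immediately undone.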