This article explores how to effectively scale Kubernetes workloads using custom metrics, addressing the limitations of default CPU and memory-based autoscaling. It delves into the benefits of custom metrics for reflecting true application demand and provides architectural insights into implementing Horizontal Pod Autoscaler (HPA) with external metrics sources. This approach enhances resource efficiency and application responsiveness by dynamically adjusting resources based on application-specific performance indicators.
Kubernetes' Horizontal Pod Autoscaler (HPA) typically relies on CPU and memory utilization to scale workloads. While often sufficient, these default metrics can sometimes be poor indicators of actual application demand, leading to under-provisioning or over-provisioning. For instance, an application might be CPU-idle but experiencing high latency due to an increasing number of concurrent requests or database connections, which are not directly reflected in standard resource metrics.
Custom metrics allow for more intelligent scaling decisions by directly tying autoscaling to application-specific performance indicators. These could include:

* Queue Length: For asynchronous processing workloads, the number of items in a message queue.
* Requests Per Second (RPS): For API services, the rate of incoming requests.
* Active Connections: For database proxies or stateful services.
* Business Logic Metrics: Specific metrics relevant to the application's domain, e.g., active users or pending tasks.
Choosing Relevant Custom Metrics
When selecting custom metrics, identify indicators that directly correlate with your application's workload and resource consumption. Metrics that lead rather than lag actual demand are ideal for proactive scaling.
To integrate custom metrics with HPA, Kubernetes leverages the metrics API. For metrics not exposed by the resource metrics API (which provides CPU/memory), you typically need to set up a custom metrics adapter (e.g., Prometheus adapter, Datadog's Cluster Agent). This adapter collects metrics from your monitoring system and exposes them via the `custom.metrics.k8s.io` or `external.metrics.k8s.io` APIs for HPA to consume.
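As an illustration, a Prometheus adapter discovery rule can convert a raw counter such as `http_requests_total` into a per-second rate exposed through the `custom.metrics.k8s.io` API. The following is only a sketch: the series query, label overrides, and rate window are assumptions that would need to match your own metric labels and scrape setup.

```yaml
# Prometheus adapter rule (sketch): derive http_requests_per_second
# from the http_requests_total counter for per-pod HPA consumption.
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

Once the adapter is deployed with a rule like this, `kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1` should list the derived metric, confirming that HPA can query it.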
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_total
      target:
        type: AverageValue
        averageValue: "100m"  # Kubernetes quantity: 100m = 0.1 requests per pod
  - type: External
    external:
      metric:
        name: datadog_kafka_consumer_lag
        selector:
          matchLabels:
            kafka_topic: my-topic
      target:
        type: AverageValue
        averageValue: "5"  # scale until average lag per replica drops to 5
```
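When the Datadog Cluster Agent serves as the external metrics provider, an external metric like `datadog_kafka_consumer_lag` is typically backed by a Datadog query. A `DatadogMetric` custom resource is one way to define such a query explicitly; the resource below is a hedged sketch, and the query string and tag names are assumptions that depend on your Kafka integration's actual metric names.

```yaml
# Sketch of a DatadogMetric custom resource backing an external metric.
# The query is hypothetical; adjust the metric and tag names to your environment.
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMetric
metadata:
  name: kafka-consumer-lag
spec:
  query: avg:kafka.consumer_lag{kafka_topic:my-topic}
```

Defining the query in a custom resource keeps the scaling signal versioned alongside the HPA manifest rather than implied by the metric name alone.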