MongoDB Atlas's journey into predictive auto-scaling highlights a critical challenge in cloud infrastructure: efficiently managing dynamic resource allocation. Traditional reactive auto-scaling, while widely deployed, often leads to periods of under- or over-provisioning because of the inherent latency in detecting and responding to load changes. This experiment explores a proactive approach that addresses these limitations.
The Problem with Reactive Auto-Scaling
Reactive auto-scaling algorithms typically scale resources only after an overload or underload condition has been detected. For database services like MongoDB Atlas, this means:
- Performance Degradation: Servers can be overloaded for several minutes while the system detects the spike and initiates scaling, impacting user experience.
- Cost Inefficiency: Underloaded servers remain over-provisioned until a scale-down triggers, so customers pay for capacity they are not using.
- Slow Adaptation: Scaling often proceeds in incremental steps (e.g., M40 to M50, then to M60), so a drastic change in demand requires multiple operations to reach the optimal size, prolonging the impact of sub-optimal sizing.
- Scaling Interference: Extreme overload can sometimes interfere with the scaling operation itself.
Architecture of the Predictive Auto-Scaling Prototype
The MongoDB Atlas predictive auto-scaling prototype comprises three main components designed to forecast demand and select optimal instance sizes proactively; a minimal sketch of how they fit together follows the list:
- Forecaster: Predicts future workload for each replica set. This is further split into a Long-Term Forecaster (using MSTL for daily/weekly seasonality) and a Short-Term Forecaster (for local trends in non-seasonal workloads). A key architectural decision here is to forecast "customer-driven metrics" (e.g., queries per second) which are assumed to be independent of scaling actions, avoiding circular dependencies.
- Estimator: Estimates CPU utilization for any given workload and instance size. This component uses a regression model based on boosted decision trees trained on millions of samples, mapping forecasted demand to expected resource consumption.
- Planner: Selects the cheapest instance size that can handle the forecasted demand without exceeding a target CPU utilization (e.g., 75%). This involves evaluating different tier sizes based on the Forecaster's predictions and the Estimator's CPU projections.
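To make the division of labor concrete, here is a minimal Python sketch of the three components wired together. The 75% CPU target, the MSTL-based seasonal forecast, the boosted-tree Estimator, and the cheapest-tier Planner come from the description above; the tier prices, the (QPS, tier index) feature layout, and all function names are illustrative assumptions, not MongoDB's implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from statsmodels.tsa.seasonal import MSTL  # multi-seasonal decomposition

TARGET_CPU = 0.75                                   # Planner's utilization ceiling
# (tier, $/hr) in ascending price order; the prices here are made up
TIERS = [("M40", 1.0), ("M50", 2.0), ("M60", 4.0)]

def forecast_qps(history: np.ndarray, horizon: int = 24) -> np.ndarray:
    """Long-Term Forecaster: decompose hourly QPS into trend plus daily (24h)
    and weekly (168h) seasonality, then project naively by holding the last
    trend level flat and replaying the final week of seasonality.
    Needs at least two weeks of history."""
    res = MSTL(history, periods=(24, 168)).fit()
    seasonal = np.asarray(res.seasonal).reshape(len(history), -1).sum(axis=1)
    last_week = seasonal[-168:]   # phase-aligned: the next hour maps to last_week[0]
    return np.asarray(res.trend)[-1] + np.resize(last_week, horizon)

def train_estimator(features: np.ndarray, cpu_util: np.ndarray):
    """Estimator: boosted decision trees mapping (workload, tier) features to
    observed CPU utilization, trained offline on fleet telemetry."""
    return GradientBoostingRegressor().fit(features, cpu_util)

def plan(estimator, qps_forecast: np.ndarray) -> str:
    """Planner: cheapest tier whose projected peak CPU stays under the target;
    fall back to the largest tier if none qualifies."""
    peak_qps = qps_forecast.max()
    for tier_idx, (name, _price) in enumerate(TIERS):
        if estimator.predict([[peak_qps, tier_idx]])[0] <= TARGET_CPU:
            return name
    return TIERS[-1][0]
```

Note that the Forecaster's input is QPS rather than CPU, which keeps the prediction exogenous to scaling decisions, the decoupling discussed in the callout below.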
💡 Decoupling Forecast from Impact
A crucial system design insight from this work is the choice to forecast customer-driven metrics (like QPS) rather than directly forecasting CPU utilization. Forecasting CPU directly could lead to a feedback loop where scaling actions (which reduce CPU) invalidate the forecast. By predicting metrics exogenous to scaling, the system gains a more stable and reliable prediction input.
Challenges and Learnings
The experiment revealed several key insights for designing such a system:
- Predictability Varies: Not all replica sets exhibit predictable patterns. The system needs mechanisms, such as a "self-censoring" confidence score for the Long-Term Forecaster, to identify when predictions are reliable enough to act on (see the gating sketch after this list).
- Short-Term vs. Long-Term: Often, short-term trends are more reliable signals for immediate scaling decisions than daily or weekly cycles, especially given the rapid changes in cloud workloads.
- Estimator Accuracy: Achieving high accuracy in the Estimator is challenging due to the opaque nature of customer queries and data. Error rates must be managed, and less predictable workloads might be excluded from predictive scaling.
- Conservative Rollout: The production version initially scales up proactively but relies on reactive scaling for scaling down, a cautious approach to deploying complex predictive systems where over-provisioning (extra cost) is a safer failure mode than under-provisioning (degraded performance); a sketch of this asymmetric policy follows.
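One plausible shape for that "self-censoring" confidence score, sketched here under our own assumptions: backtest the forecaster on a recent held-out window and refuse to emit a forecast when the backtest error is too high. The MAPE metric, the 20% threshold, and the function names are hypothetical; the article does not specify how the score is computed. The sketch reuses forecast_qps from the architecture example.

```python
import numpy as np

MAX_BACKTEST_MAPE = 0.20  # hypothetical cut-off for trusting the forecaster

def forecast_confidence(history: np.ndarray, forecaster, holdout: int = 24) -> float:
    """Backtest: forecast from everything except the last `holdout` hours,
    then compare against what actually happened in that window."""
    predicted = forecaster(history[:-holdout], horizon=holdout)
    actual = history[-holdout:]
    mape = np.mean(np.abs(predicted - actual) / np.maximum(actual, 1e-9))
    return 1.0 - min(mape, 1.0)  # 1.0 = perfect hindsight, 0.0 = unusable

def censored_forecast(history: np.ndarray, forecaster, horizon: int = 24):
    """Emit a forecast only when the backtest says it can be trusted;
    otherwise return None so the caller falls back to reactive scaling."""
    if forecast_confidence(history, forecaster) < 1.0 - MAX_BACKTEST_MAPE:
        return None
    return forecaster(history, horizon=horizon)
```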
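The conservative rollout can then be captured in a few lines: predictions may only trigger scale-ups, and scale-downs are left to the existing reactive path. next_action is a hypothetical helper that reuses the TIERS list from the architecture sketch.

```python
from typing import Optional

def next_action(current_tier: str, predicted_tier: Optional[str]) -> Optional[str]:
    """Return a proactive scale-up target, or None to defer to reactive scaling."""
    if predicted_tier is None:            # forecast was self-censored above
        return None
    order = [name for name, _ in TIERS]   # TIERS from the architecture sketch
    if order.index(predicted_tier) > order.index(current_tier):
        return predicted_tier             # proactive scale-up only
    return None                           # scale-down stays reactive
```

A caller would wire the pieces together by setting predicted_tier = plan(estimator, fc) when censored_forecast returns a forecast, and None otherwise.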