Datadog Blog · May 14, 2026

Scalable Time Series Forecasting with Foundation Models

This article introduces Toto 2.0, a time series foundation model designed to scale, offering improved forecasting accuracy with increased model size. It highlights the architectural shift towards foundation models in time series analysis, emphasizing their potential for diverse applications and the engineering challenges involved in training and deploying such large-scale models.

Read original on Datadog Blog

Foundation models, long established in NLP and computer vision, are now extending into time series analysis with Toto 2.0. This represents a significant architectural shift from specialized, bespoke models to a generalized, scalable approach. For system designers, it raises the question of how a single large model can serve many forecasting needs, in contrast to the previous paradigm of maintaining numerous smaller, domain-specific models.

The Scaling Hypothesis in Time Series

A key takeaway from Toto 2.0 is the successful application of the "scaling hypothesis" to time series: as parameter count grows, forecasting performance improves consistently, something prior time series models did not reliably exhibit. Architecturally, this necessitates robust infrastructure capable of handling models with billions of parameters, including distributed training frameworks, massive storage for datasets, and efficient inference serving at scale.
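
To make the scaling hypothesis concrete, scaling behavior is commonly quantified by fitting a power law L(N) ≈ a·N^(−α) to loss versus parameter count, which is a straight line in log-log space. The sketch below shows the fit mechanics only; the loss values are illustrative placeholders, not Toto 2.0's reported numbers.

```python
import numpy as np

# Illustrative (not Toto 2.0's actual) validation losses at several model sizes.
params = np.array([4e6, 50e6, 250e6, 1e9, 2.5e9])   # parameter counts
loss   = np.array([0.62, 0.48, 0.41, 0.35, 0.32])   # hypothetical eval losses

# Scaling hypothesis: L(N) ≈ a * N**(-alpha), i.e. linear in log-log space.
slope, intercept = np.polyfit(np.log(params), np.log(loss), deg=1)
alpha, a = -slope, np.exp(intercept)

print(f"fitted power law: L(N) ≈ {a:.2f} * N^(-{alpha:.3f})")
# Extrapolate to a hypothetical larger model to see the predicted gain.
print(f"predicted loss at 10B params: {a * (10e9) ** -alpha:.3f}")
```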

ℹ️ Implications for System Design

Integrating a large foundation model like Toto 2.0 requires careful consideration of computational resources (GPUs/TPUs), data pipelines for continuous training, and low-latency serving infrastructure. It shifts the complexity from managing many diverse model types to optimizing the lifecycle of a single, powerful model.

Architectural Components for Large Model Deployment

  • Distributed Training Infrastructure: Leveraging frameworks like PyTorch Distributed or JAX for training models across hundreds or thousands of accelerators (see the minimal training sketch after this list).
  • High-Throughput Data Ingestion: Designing data pipelines that can feed petabytes of diverse time series data to the training system efficiently.
  • Model Serving with Low Latency: Implementing inference services that can handle high QPS with acceptable latency, often involving techniques like model quantization, request batching, and specialized hardware acceleration (e.g., NVIDIA Triton Inference Server); a minimal batching sketch also follows this list.
  • Observability and Monitoring: Crucial for understanding model performance, drift, and resource utilization in production.
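
The first bullet can be made concrete with a generic distributed data-parallel sketch. This is the standard PyTorch DDP pattern, not Toto 2.0's actual training code: the tiny linear model and random batches are placeholders, and the script assumes it is launched with torchrun so the usual rendezvous environment variables are set.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK/LOCAL_RANK/WORLD_SIZE; init_process_group reads them.
    dist.init_process_group(backend="gloo")   # "nccl" on GPU clusters
    rank = dist.get_rank()

    model = torch.nn.Linear(64, 1)            # stand-in for a real forecaster
    ddp_model = DDP(model)                    # gradients are all-reduced across ranks
    opt = torch.optim.AdamW(ddp_model.parameters(), lr=1e-3)

    for step in range(10):
        x = torch.randn(32, 64)               # each rank would see a different shard
        y = torch.randn(32, 1)
        loss = torch.nn.functional.mse_loss(ddp_model(x), y)
        opt.zero_grad()
        loss.backward()                       # DDP overlaps the all-reduce with backward
        opt.step()
        if rank == 0 and step % 5 == 0:
            print(f"step {step}: loss={loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=4 train.py
```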
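
For the serving bullet, production systems typically delegate batching to a server such as Triton, but the core idea of dynamic batching is simple: hold requests briefly, run one batched forward pass, and fan results back out. Below is a minimal asyncio sketch of that idea under assumed limits (batch size 32, 5 ms max wait); `run_model` is a hypothetical stand-in for the real forward pass.

```python
import asyncio

MAX_BATCH = 32     # flush when this many requests have queued up
MAX_WAIT = 0.005   # ...or when the oldest request has waited 5 ms

queue: asyncio.Queue = asyncio.Queue()

def run_model(batch):
    # Stand-in for a real forward pass; one batched call amortizes its fixed cost.
    return [f"forecast({x})" for x in batch]

async def batcher():
    loop = asyncio.get_running_loop()
    while True:
        x, fut = await queue.get()            # wait for the first request
        inputs, futures = [x], [fut]
        deadline = loop.time() + MAX_WAIT
        while len(inputs) < MAX_BATCH and (left := deadline - loop.time()) > 0:
            try:
                x, fut = await asyncio.wait_for(queue.get(), left)
                inputs.append(x)
                futures.append(fut)
            except asyncio.TimeoutError:
                break
        for fut, y in zip(futures, run_model(inputs)):
            fut.set_result(y)                 # fan results back to callers

async def predict(x):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((x, fut))
    return await fut

async def main():
    asyncio.create_task(batcher())
    print(await asyncio.gather(*(predict(i) for i in range(10))))

asyncio.run(main())
```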

The use of a single "recipe" to train models across various sizes, from 4 million to 2.5 billion parameters, points to a standardized and automated machine learning operations (MLOps) pipeline. This reduces operational overhead and promotes consistency, a critical factor for reliability in large-scale AI systems. System architects must design MLOps platforms that support such a unified training and deployment workflow.
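
A sketch of what such a unified recipe can look like: one training entry point that consumes a size-parameterized config. The sizes mirror the 4M-to-2.5B range mentioned above, but the specific widths, depths, and the width-scaled learning rate are illustrative assumptions, not Toto 2.0's published hyperparameters.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    d_model: int      # hidden width
    n_layers: int     # transformer depth
    n_heads: int

# One "recipe": the same training code, data, and schedule for every size.
# Widths/depths below are illustrative, not Toto 2.0's actual configurations.
SIZES = {
    "4M":   ModelConfig(d_model=128,  n_layers=6,  n_heads=4),
    "150M": ModelConfig(d_model=768,  n_layers=12, n_heads=12),
    "2.5B": ModelConfig(d_model=2560, n_layers=32, n_heads=20),
}

def train(config: ModelConfig, lr_base: float = 3e-4) -> None:
    # A common trick in unified recipes: scale the learning rate with width
    # so one schedule transfers across sizes (e.g., muP-style scaling).
    lr = lr_base * (128 / config.d_model)
    print(f"training {config} with lr={lr:.2e}")

for name, cfg in SIZES.items():
    train(cfg)   # identical pipeline for every size; only the config differs
```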

time series · forecasting · foundation models · machine learning · scalability · MLOps · distributed training · inference
