Datadog Blog · May 14, 2026

Scalable Time Series Forecasting with Foundation Models

This article introduces Toto 2.0, a time series foundation model designed to scale, offering improved forecasting accuracy with increased model size. It highlights the architectural shift towards foundation models in time series analysis, emphasizing their potential for diverse applications and the engineering challenges involved in training and deploying such large-scale models.

Read original on Datadog Blog

Foundation models, long established in NLP and computer vision, are now extending into time series analysis with Toto 2.0. This represents a significant architectural shift from specialized, bespoke models to a generalized, scalable approach. For system designers, it raises the question of how a single large model can serve many forecasting needs, in contrast to the previous paradigm of maintaining numerous smaller, domain-specific models.

The Scaling Hypothesis in Time Series

A key takeaway from Toto 2.0 is the successful application of the "scaling hypothesis" to time series: as parameter count grows, forecasting performance improves consistently, something prior time series models did not reliably exhibit. Architecturally, this necessitates robust infrastructure capable of handling models with billions of parameters, including distributed training frameworks, massive storage for datasets, and efficient inference serving at scale.
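
To make the scaling hypothesis concrete, scaling behavior is commonly quantified by fitting a power law L(N) ≈ a·N^(−α) to loss versus parameter count, which is a straight line in log-log space. The sketch below shows the fit mechanics only; the loss values are illustrative placeholders, not Toto 2.0's reported numbers.

```python
import numpy as np

# Illustrative (not Toto 2.0's actual) validation losses at several model sizes.
params = np.array([4e6, 50e6, 250e6, 1e9, 2.5e9])   # parameter counts
loss   = np.array([0.62, 0.48, 0.41, 0.35, 0.32])   # hypothetical eval losses

# Scaling hypothesis: L(N) ≈ a * N**(-alpha), i.e. linear in log-log space.
slope, intercept = np.polyfit(np.log(params), np.log(loss), deg=1)
alpha, a = -slope, np.exp(intercept)

print(f"fitted power law: L(N) ≈ {a:.2f} * N^(-{alpha:.3f})")
# Extrapolate to a hypothetical larger model to see the predicted gain.
print(f"predicted loss at 10B params: {a * (10e9) ** -alpha:.3f}")
```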

ℹ️ Implications for System Design

Integrating a large foundation model like Toto 2.0 requires careful consideration of computational resources (GPUs/TPUs), data pipelines for continuous training, and low-latency serving infrastructure. It shifts the complexity from managing many diverse model types to optimizing the lifecycle of a single, powerful model.

Architectural Components for Large Model Deployment

  • Distributed Training Infrastructure: Leveraging frameworks like PyTorch Distributed or JAX for training models across hundreds or thousands of accelerators (see the minimal training sketch after this list).
  • High-Throughput Data Ingestion: Designing data pipelines that can feed petabytes of diverse time series data to the training system efficiently.
  • Model Serving with Low Latency: Implementing inference services that can handle high QPS with acceptable latency, often involving techniques like model quantization, request batching, and specialized hardware acceleration (e.g., NVIDIA Triton Inference Server); a minimal batching sketch also follows this list.
  • Observability and Monitoring: Crucial for understanding model performance, drift, and resource utilization in production.
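
The first bullet can be made concrete with a generic distributed data-parallel sketch. This is the standard PyTorch DDP pattern, not Toto 2.0's actual training code: the tiny linear model and random batches are placeholders, and the script assumes it is launched with torchrun so the usual rendezvous environment variables are set.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK/LOCAL_RANK/WORLD_SIZE; init_process_group reads them.
    dist.init_process_group(backend="gloo")   # "nccl" on GPU clusters
    rank = dist.get_rank()

    model = torch.nn.Linear(64, 1)            # stand-in for a real forecaster
    ddp_model = DDP(model)                    # gradients are all-reduced across ranks
    opt = torch.optim.AdamW(ddp_model.parameters(), lr=1e-3)

    for step in range(10):
        x = torch.randn(32, 64)               # each rank would see a different shard
        y = torch.randn(32, 1)
        loss = torch.nn.functional.mse_loss(ddp_model(x), y)
        opt.zero_grad()
        loss.backward()                       # DDP overlaps the all-reduce with backward
        opt.step()
        if rank == 0 and step % 5 == 0:
            print(f"step {step}: loss={loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=4 train.py
```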
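
For the serving bullet, production systems typically delegate batching to a server such as Triton, but the core idea of dynamic batching is simple: hold requests briefly, run one batched forward pass, and fan results back out. Below is a minimal asyncio sketch of that idea under assumed limits (batch size 32, 5 ms max wait); `run_model` is a hypothetical stand-in for the real forward pass.

```python
import asyncio

MAX_BATCH = 32     # flush when this many requests have queued up
MAX_WAIT = 0.005   # ...or when the oldest request has waited 5 ms

queue: asyncio.Queue = asyncio.Queue()

def run_model(batch):
    # Stand-in for a real forward pass; one batched call amortizes its fixed cost.
    return [f"forecast({x})" for x in batch]

async def batcher():
    loop = asyncio.get_running_loop()
    while True:
        x, fut = await queue.get()            # wait for the first request
        inputs, futures = [x], [fut]
        deadline = loop.time() + MAX_WAIT
        while len(inputs) < MAX_BATCH and (left := deadline - loop.time()) > 0:
            try:
                x, fut = await asyncio.wait_for(queue.get(), left)
                inputs.append(x)
                futures.append(fut)
            except asyncio.TimeoutError:
                break
        for fut, y in zip(futures, run_model(inputs)):
            fut.set_result(y)                 # fan results back to callers

async def predict(x):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((x, fut))
    return await fut

async def main():
    asyncio.create_task(batcher())
    print(await asyncio.gather(*(predict(i) for i in range(10))))

asyncio.run(main())
```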

The use of a single "recipe" to train models across various sizes, from 4 million to 2.5 billion parameters, points to a standardized and automated machine learning operations (MLOps) pipeline. This reduces operational overhead and promotes consistency, a critical factor for reliability in large-scale AI systems. System architects must design MLOps platforms that support such a unified training and deployment workflow.
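
A sketch of what such a unified recipe can look like: one training entry point that consumes a size-parameterized config. The sizes mirror the 4M-to-2.5B range mentioned above, but the specific widths, depths, and the width-scaled learning rate are illustrative assumptions, not Toto 2.0's published hyperparameters.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    d_model: int      # hidden width
    n_layers: int     # transformer depth
    n_heads: int

# One "recipe": the same training code, data, and schedule for every size.
# Widths/depths below are illustrative, not Toto 2.0's actual configurations.
SIZES = {
    "4M":   ModelConfig(d_model=128,  n_layers=6,  n_heads=4),
    "150M": ModelConfig(d_model=768,  n_layers=12, n_heads=12),
    "2.5B": ModelConfig(d_model=2560, n_layers=32, n_heads=20),
}

def train(config: ModelConfig, lr_base: float = 3e-4) -> None:
    # A common trick in unified recipes: scale the learning rate with width
    # so one schedule transfers across sizes (e.g., muP-style scaling).
    lr = lr_base * (128 / config.d_model)
    print(f"training {config} with lr={lr:.2e}")

for name, cfg in SIZES.items():
    train(cfg)   # identical pipeline for every size; only the config differs
```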

time series · forecasting · foundation models · machine learning · scalability · MLOps · distributed training · inference
