Dev.to #systemdesign·June 8, 2026

Elasticsearch TSDS for Live Time-Series Ingestion at Scale

This article explores how Elasticsearch Time Series Data Streams (TSDS) work internally during high-volume live ingestion. It covers how Elasticsearch optimizes for time-series workloads, contrasts the TSDS-native approach with date-based index patterns, and explains how Index Lifecycle Management (ILM) moves data through hot, warm, and cold phases to keep storage efficient and costs under control.


Understanding Elasticsearch TSDS for Telemetry

Elasticsearch's Time Series Data Streams (TSDS) provide a specialized architecture for handling high-volume, append-heavy telemetry data. Unlike generic indices, TSDS optimizes storage and query patterns specifically for time-series data, which typically involves continuous data arrival, forward-moving timestamps, and aggregation-heavy historical queries. This specialization allows for more efficient resource utilization and better performance for monitoring and analytics platforms.

Two Common Ingestion Approaches

  • TSDS-Native Model: Applications write into a single data stream (e.g., `collector-metrics`). Elasticsearch automatically manages backing indices, rollover, timestamp windows, and write routing, abstracting away underlying storage organization.
  • Operational Date-Based Model: Existing systems often use date-based index patterns (e.g., `collector-metrics-2026-05-21`) due to migration constraints, existing pipelines, and retention workflows. While seemingly an anti-pattern, this is often a pragmatic choice during large-scale migrations.
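
The difference between the two models shows up in how a writer picks its target. The following is a minimal sketch, not the article's implementation: the helper names `tsds_target` and `date_based_target` are hypothetical, and the date-based variant assumes daily indices named by UTC date, as in the `collector-metrics-2026-05-21` example.

```python
from datetime import datetime, timezone

# Data stream name taken from the example above.
DATA_STREAM = "collector-metrics"


def tsds_target(doc_time: datetime) -> str:
    """TSDS-native model: every document goes to the same data stream name.

    Elasticsearch picks the backing index itself, so the timestamp is unused here.
    """
    return DATA_STREAM


def date_based_target(doc_time: datetime, prefix: str = DATA_STREAM) -> str:
    """Date-based model: the writer derives the index name from the timestamp.

    Assumes timezone-aware timestamps and one index per UTC day.
    """
    utc_time = doc_time.astimezone(timezone.utc)
    return f"{prefix}-{utc_time:%Y-%m-%d}"


if __name__ == "__main__":
    ts = datetime(2026, 5, 21, 14, 30, tzinfo=timezone.utc)
    print(tsds_target(ts))
    print(date_based_target(ts))
```

In the date-based model the naming logic lives in the ingestion pipeline, which is part of why such pipelines are hard to migrate in one step.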

The Role of Index Lifecycle Management (ILM)

ILM is a crucial component in TSDS, defining how indices evolve throughout their lifetime. It orchestrates automated actions based on data age, size, or number of documents, enabling efficient data retention and cost management across different storage tiers. This is particularly vital for telemetry platforms ingesting terabytes of data daily, where retaining raw granularity across all data becomes prohibitively expensive.

ILM phases, their characteristics, and typical actions:
  • Hot Phase: Newly arriving data, optimized for fast writes and low-latency queries. Supports dashboards, alerts, and real-time monitoring.
  • Warm Phase: Older data, where downsampling often begins. Elasticsearch reorganizes segments and aggregates metrics to reduce granularity and storage footprint.
  • Cold Phase: Infrequently accessed historical data, moved to cheaper, slower storage tiers (e.g., searchable snapshots backed by object storage).
  • Delete Phase: Oldest data is automatically deleted based on retention policies.
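
The four phases above can be sketched as an ILM policy body, the JSON document sent to `PUT _ilm/policy/<name>`. It is built here as a Python dictionary so it can be inspected without a cluster. All thresholds (`1d`, `50gb`, `7d`, `30d`, `90d`) and the one-hour downsampling interval are illustrative assumptions, not values from the article.

```python
import json

# Hypothetical ILM policy body; every threshold below is an assumed example value.
ILM_POLICY = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over to a new backing index by age or primary shard size.
                    "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                }
            },
            "warm": {
                "min_age": "7d",
                "actions": {
                    # Downsample metrics into one-hour buckets to cut storage.
                    "downsample": {"fixed_interval": "1h"}
                },
            },
            "cold": {
                "min_age": "30d",
                "actions": {
                    # Lower recovery priority for rarely queried data.
                    "set_priority": {"priority": 0}
                },
            },
            "delete": {
                "min_age": "90d",
                "actions": {"delete": {}},
            },
        }
    }
}

if __name__ == "__main__":
    # The serialized form is what would be sent as the request body.
    print(json.dumps(ILM_POLICY, indent=2))
```

Each `min_age` is measured from rollover, so an index enters the warm phase seven days after it stops being the write index under these assumed values.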

TSDS Architecture Components

  • ILM Policy: Defines rollover rules, phase transitions (hot, warm, cold, delete), and downsampling schedules.
  • Index Template: Specifies settings and mappings for the backing indices created by the data stream, including dimension fields.
  • Data Stream: The logical entry point for ingestion. It routes each document to the backing index whose time window covers the document's `@timestamp`, while ILM handles rollover and later phase transitions of those indices.
  • Live Ingestion Pipeline: External components (producers, queues, worker services) that continuously generate and bulk ingest telemetry data into the TSDS.
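
To show how the first three components connect, here is a hedged sketch of an index template body for `PUT _index_template/<name>`, again as a Python dictionary. The policy name `collector-metrics-policy` and the fields `host` and `cpu_usage` are hypothetical; only the `collector-metrics` stream name comes from the text above.

```python
import json

# Hypothetical names: the policy name and the field names are assumptions.
ILM_POLICY_NAME = "collector-metrics-policy"

INDEX_TEMPLATE = {
    "index_patterns": ["collector-metrics*"],
    # An empty object marks matching targets as data streams.
    "data_stream": {},
    "priority": 500,
    "template": {
        "settings": {
            # Enables time-series mode, which makes the data stream a TSDS.
            "index.mode": "time_series",
            # Dimension fields used to route documents to shards.
            "index.routing_path": ["host"],
            # Attaches the lifecycle policy to every backing index.
            "index.lifecycle.name": ILM_POLICY_NAME,
        },
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                # Dimension: identifies which time series a document belongs to.
                "host": {"type": "keyword", "time_series_dimension": True},
                # Metric: the measured value, declared as a gauge.
                "cpu_usage": {"type": "double", "time_series_metric": "gauge"},
            }
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(INDEX_TEMPLATE, indent=2))
```

With a template like this in place, the first write to `collector-metrics` would create the data stream and its first backing index with these settings.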

Bulk Ingestion for Telemetry

Bulk ingestion is critical for telemetry workloads because they are append-heavy. Sending thousands of documents per request amortizes per-request overhead, so it is far more efficient than single-document writes and lets Elasticsearch handle routing, segment creation, and lifecycle coordination in larger, cheaper units of work.
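
As a rough illustration of batching, the sketch below splits a document stream into fixed-size chunks and serializes each chunk as a newline-delimited `_bulk` request body. The helper names `chunked` and `to_bulk_ndjson` and the sample fields are hypothetical. It assumes the body is posted to `/<data-stream>/_bulk`, so each action line can be an empty `create` (data streams accept only `create` operations for new documents).

```python
import json
from typing import Iterable, Iterator


def chunked(docs: Iterable[dict], size: int) -> Iterator[list]:
    """Yield lists of at most `size` documents, preserving order."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_bulk_ndjson(docs: Iterable[dict]) -> str:
    """Serialize documents as a bulk body: one action line, then one source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"create": {}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Hypothetical sample documents.
    samples = [
        {"@timestamp": "2026-05-21T00:00:00Z", "host": f"node-{i}", "cpu_usage": 0.5}
        for i in range(5)
    ]
    for batch in chunked(samples, 2):
        print(to_bulk_ndjson(batch), end="")
```

A production worker would use batches in the thousands and send each body with an HTTP client or the official client library. This sketch leaves out transport, retries, and handling of per-item errors in the bulk response.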

