Datadog Blog·June 26, 2026

Optimizing CDN Log Management with Observability Pipelines and Archived Search

This article discusses an architectural approach to managing high-volume CDN logs by routing them to low-cost object storage using Observability Pipelines. It highlights how these archived logs can still be searched and analyzed through tools like Datadog's Archive Search, eliminating the need for separate logging solutions and reducing operational costs. The core system design challenge addressed is balancing log retention, cost efficiency, and searchability for large-scale distributed system logs.


Managing and analyzing logs from Content Delivery Networks (CDNs) is a hard system design problem, mainly because of the volume and velocity of the data they generate. Traditional logging solutions can become prohibitively expensive when these logs are retained for long periods, yet long retention is important for security, performance monitoring, and compliance. This article explores a strategy that decouples log ingestion from immediate analysis and sends the logs to cost-effective object storage.

Architectural Overview: Observability Pipelines for Cost-Effective Log Management

The proposed architecture centers on an "Observability Pipeline" that routes log data according to how it will be used. Instead of sending all high-volume CDN logs directly to an expensive analytics platform for real-time indexing, the pipeline streams them to low-cost object storage (e.g., Amazon S3, Google Cloud Storage) for archival. This reduces ingestion and storage costs for data that does not need real-time analysis.
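The routing decision can be pictured with a small Python sketch. It is an illustration only, not the configuration format of any real pipeline product: the destination names, the `source` field, and the `Router` class are all assumptions made for this example, and the sinks are plain lists standing in for real outputs.

```python
from dataclasses import dataclass, field

# Hypothetical destination names, chosen for this example.
ARCHIVE = "object_storage_archive"
INDEXED = "analytics_platform"

# Assumption: records carry a "source" tag, and these tags mark CDN logs.
CDN_SOURCES = {"cloudfront"}


@dataclass
class Router:
    """Collects records per destination; the lists stand in for real sinks."""
    sinks: dict = field(default_factory=lambda: {ARCHIVE: [], INDEXED: []})

    def destination_for(self, record: dict) -> str:
        # High-volume CDN logs go to low-cost archival storage;
        # everything else still goes to the indexed analytics platform.
        if record.get("source") in CDN_SOURCES:
            return ARCHIVE
        return INDEXED

    def route(self, record: dict) -> str:
        dest = self.destination_for(record)
        self.sinks[dest].append(record)
        return dest


router = Router()
router.route({"source": "cloudfront", "status": 200, "path": "/img/logo.png"})
router.route({"source": "payments-api", "status": 500, "message": "timeout"})
```

After these two calls, the CloudFront record sits in the archive sink and the application error sits in the indexed sink. The point of the design is that the decision is made once, in the pipeline, before any indexing cost is incurred.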

Key Design Principle: Data Tiering

This pattern exemplifies data tiering, a fundamental system design principle where data is stored across different storage classes based on its access frequency, performance requirements, and cost. High-value, frequently accessed data resides in fast, expensive storage, while less frequently accessed or archival data moves to slower, cheaper storage. The challenge lies in maintaining accessibility for all tiers.
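As a rough sketch of how a tiering rule might look, the function below picks a tier from how recently the data was accessed. The thresholds and tier names are hypothetical and not taken from the article; a real policy would also weigh performance requirements and cost.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds, for illustration only.
HOT_WINDOW = timedelta(days=3)
WARM_WINDOW = timedelta(days=30)


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Pick a storage tier from how recently the data was accessed."""
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "hot"      # fast, expensive, indexed storage
    if age <= WARM_WINDOW:
        return "warm"     # cheaper storage, slower queries
    return "archive"      # low-cost object storage


now = datetime(2026, 6, 26, tzinfo=timezone.utc)
recent = choose_tier(now - timedelta(hours=6), now)
older = choose_tier(now - timedelta(days=10), now)
oldest = choose_tier(now - timedelta(days=90), now)
```

With these assumed thresholds, the three calls return `"hot"`, `"warm"`, and `"archive"` respectively.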

Ensuring Searchability of Archived Logs

A critical aspect of this design is ensuring that logs stored in object storage remain searchable when needed. This is achieved through an "Archive Search" capability, which allows querying the raw logs in object storage directly, without re-ingesting or re-indexing them into the primary logging solution. This eliminates the need for duplicate tools and complex data movement, streamlining operations and further reducing costs. The search mechanism typically involves indexing metadata or a subset of log fields to quickly locate relevant log segments in the object store, then performing on-demand retrieval and analysis.
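The idea of locating relevant log segments first and reading them on demand can be sketched as follows. This is not how Datadog's Archive Search is implemented; it is a minimal illustration that assumes a time-partitioned key layout (`cdn-logs/YYYY/MM/DD/HH/...`), gzip-compressed newline-delimited JSON objects, and an in-memory dictionary standing in for the object store. All names here are hypothetical.

```python
import gzip
import json
from datetime import datetime, timedelta

# In-memory stand-in for an object store: key -> gzip-compressed NDJSON bytes.
object_store: dict[str, bytes] = {}


def put_logs(key: str, records: list[dict]) -> None:
    body = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    object_store[key] = gzip.compress(body)


def hourly_prefixes(start: datetime, end: datetime) -> list[str]:
    """Key prefixes for every hour that overlaps [start, end]."""
    prefixes = []
    cursor = start.replace(minute=0, second=0, microsecond=0)
    while cursor <= end:
        prefixes.append(cursor.strftime("cdn-logs/%Y/%m/%d/%H/"))
        cursor += timedelta(hours=1)
    return prefixes


def search_archive(start: datetime, end: datetime, predicate) -> list[dict]:
    """Read only objects whose key falls in the requested hours."""
    prefixes = tuple(hourly_prefixes(start, end))
    hits = []
    for key, blob in object_store.items():
        if not key.startswith(prefixes):
            continue  # pruned by the key layout, never decompressed
        for line in gzip.decompress(blob).decode("utf-8").splitlines():
            record = json.loads(line)
            if predicate(record):
                hits.append(record)
    return hits


put_logs("cdn-logs/2026/06/26/10/part-0.json.gz",
         [{"status": 503, "path": "/a"}, {"status": 200, "path": "/b"}])
put_logs("cdn-logs/2026/06/26/14/part-0.json.gz",
         [{"status": 503, "path": "/c"}])

errors = search_archive(
    datetime(2026, 6, 26, 10, 0),
    datetime(2026, 6, 26, 11, 30),
    lambda r: r["status"] >= 500,
)
```

Here the key prefix acts as the lightweight metadata index: the 14:00 object is skipped without being read, and only the 10:00 object is decompressed and filtered. Pruning is by hour, so a predicate that needs exact time bounds would also have to check a timestamp field in each record.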

Taken together, the architecture has four layers:

  • Ingestion Layer: Collects logs from CDNs (e.g., CloudFront logs).
  • Observability Pipeline: Acts as a router and processor. It can filter, transform, and enrich logs before routing them. CDN logs are routed to object storage.
  • Object Storage: Cost-effective, highly durable storage for raw log archives (e.g., S3).
  • Archive Search/Analytics Layer: A platform that can query the data directly in object storage, providing a unified search experience.
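
The filter, transform, and enrich steps that the pipeline layer performs before routing can be sketched in Python. The field names (`path`, `status`, `edge`, `cookie`), the health-check path, and the region lookup table are all assumptions made for illustration; they do not reflect a real CDN log schema or a specific pipeline product.

```python
from typing import Optional

# Hypothetical lookup table used for enrichment.
EDGE_REGIONS = {"IAD": "us-east", "FRA": "eu-central"}


def filter_record(record: dict) -> bool:
    """Keep everything except health-check noise (assumed path)."""
    return record.get("path") != "/healthz"


def transform(record: dict) -> dict:
    """Normalize types and drop a field we do not want to archive."""
    out = dict(record)
    out["status"] = int(out["status"])
    out.pop("cookie", None)
    return out


def enrich(record: dict) -> dict:
    """Add a region derived from the first three letters of the edge code."""
    out = dict(record)
    out["region"] = EDGE_REGIONS.get(out.get("edge", "")[:3], "unknown")
    return out


def process(record: dict) -> Optional[dict]:
    """Run one record through filter, transform, and enrich, in that order."""
    if not filter_record(record):
        return None
    return enrich(transform(record))


kept = process({"path": "/index.html", "status": "200",
                "edge": "IAD89-C1", "cookie": "session=abc"})
dropped = process({"path": "/healthz", "status": "200", "edge": "FRA2-C1"})
```

In this sketch `kept` has an integer status, no `cookie` field, and a `region` of `"us-east"`, while `dropped` is `None`. Filtering first keeps the later steps from doing work on records that will never be stored.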

Tags: CDN, logging, observability, data pipelines, object storage, cost optimization, distributed systems, archive
