This article highlights common pitfalls in observability pipelines, where escalating costs are often attributed incorrectly to vendor pricing. It argues that the root cause lies in poor telemetry quality at the source, lack of ownership, and inadequate governance. Effective solutions involve shifting focus from pipeline-level optimizations to improving instrumentation quality, establishing clear ownership, and implementing automated governance practices to ensure telemetry is purposeful and cost-efficient.
Read the original on The New Stack.

Many organizations face escalating observability costs, often attempting to mitigate them with pipeline-level solutions like sampling or routing data to cheaper storage tiers. However, this article argues that these methods only address symptoms, not the underlying problem. The true cost drivers are poor telemetry quality at the source, lack of clear ownership for data signals, and a significant "governance gap" in how telemetry is collected and managed across services.
A major issue identified is the lack of attribution for telemetry data. Services often emit metrics without a `service.name` attribute, making it impossible to link data to specific teams, products, or cost centers. Without an owner, data volume grows unchecked and accountability is hard to establish. Furthermore, lax governance can allow sensitive data (e.g., credentials, PII) to leak into observability pipelines, posing security risks that pipeline-level filtering cannot prevent once the data has been serialized.
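The governance gap can be made concrete with a small validation step run before export. The following is a minimal sketch, not from the article: it assumes telemetry records are plain dicts with an `attributes` map, and the required-attribute and sensitive-key lists are purely illustrative.

```python
# Hypothetical governance check for one telemetry record before export.
REQUIRED_ATTRIBUTES = {"service.name", "team", "cost.center"}
SENSITIVE_KEY_HINTS = ("password", "secret", "token", "ssn", "credit_card")

def validate_telemetry(record: dict) -> list[str]:
    """Return a list of governance violations for one telemetry record."""
    violations = []
    attrs = record.get("attributes", {})
    # Attribution: every signal must be traceable to an owner.
    for required in sorted(REQUIRED_ATTRIBUTES):
        if required not in attrs:
            violations.append(f"missing required attribute: {required}")
    # Security: sensitive-looking keys must never be emitted at all.
    for key in attrs:
        if any(hint in key.lower() for hint in SENSITIVE_KEY_HINTS):
            violations.append(f"sensitive attribute must not be emitted: {key}")
    return violations
```

A check like this can run in CI against sampled telemetry or as a pre-export hook, so unattributed or sensitive data is rejected before it ever reaches a backend.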
Treating Symptoms vs. Causes
Switching observability backends or applying sampling will not fix fundamental issues like missing service attribution, sensitive data leaks, or excessive log duplication. These problems originate in the application's instrumentation itself.
The article advocates for addressing quality at the source. This means every metric, span, and log should exist for a stated reason, follow established conventions, carry necessary metadata for attribution, and *never* include sensitive information. Auto-instrumentation, while convenient, often generates a large volume of meaningless data (e.g., excessive internal framework calls, repetitive health checks, duplicate logs) that overwhelms the system and contributes to unnecessary costs. Designing observability into the code, rather than letting it "happen to" the code, is crucial.
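As one illustration of designing noise out at the source, here is a hedged sketch (not from the article) using Python's standard `logging` module: a `logging.Filter` drops repetitive health-check access logs inside the process, before they are serialized or shipped anywhere. The path list and logger name are hypothetical.

```python
import logging

# Illustrative endpoints whose access logs add volume but no insight.
HEALTH_CHECK_PATHS = ("/healthz", "/readyz", "/ping")

class NoiseFilter(logging.Filter):
    """Drop health-check access logs before they leave the process."""
    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        # Returning False discards the record entirely.
        return not any(path in message for path in HEALTH_CHECK_PATHS)

logger = logging.getLogger("app.access")
logger.addFilter(NoiseFilter())
```

Because the filter runs in the application itself, the noise is never generated as billable telemetry in the first place, unlike pipeline-level sampling, which pays to transport the data before discarding it.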
By shifting the focus from simply reducing costs to improving the quality and purposefulness of telemetry, organizations can achieve significant cost savings as a side effect. High observability bills are often a symptom of unexamined and ungoverned data generation, rather than an inherent cost of monitoring.