
Distributed tracing: OpenTelemetry in practice — the good and the bad

Julia Nguyen
We've spent the last six months rolling out OpenTelemetry across roughly 25 core microservices, and honestly it's been a mixed bag.

On the one hand, distributed tracing is invaluable for debugging complex interactions across our system. Pinpointing latency hotspots and understanding failure paths that span multiple services has become significantly easier.

On the other hand, the overhead is non-trivial. We're seeing an average 5-10% increase in latency for requests that generate extensive traces. The data volume is enormous too, even with a 1% sampling rate in production, and we're spending a lot of time tuning collector configurations and storage.

It feels like a necessary evil, but I'm curious how other teams are managing the operational challenges and cost of OTel at scale. Are there specific strategies or tools you've found effective at mitigating the downsides while keeping the debugging benefits?
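For anyone less familiar with what a 1% sampling rate means mechanically, here is a minimal sketch of head-based, trace-ID ratio sampling in plain Python. It is an illustration only, not our actual setup and not the OpenTelemetry SDK's code: the names `make_ratio_sampler`, `should_sample`, and `new_trace_id` are hypothetical, and deciding on the low 64 bits of the trace ID is an assumption made for this example. The point is that the keep/drop decision is a pure function of the trace ID, so every service that sees the same trace reaches the same verdict without coordinating.

```python
import random

# Assumption for this sketch: decide on the low 64 bits of a 128-bit trace ID.
TRACE_ID_BITS = 64


def make_ratio_sampler(ratio: float):
    """Return a function that decides whether to keep a trace, given its ID."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    # IDs whose low bits fall below this bound are kept.
    bound = round(ratio * (1 << TRACE_ID_BITS))
    mask = (1 << TRACE_ID_BITS) - 1

    def should_sample(trace_id: int) -> bool:
        # Deterministic: the same trace ID always gets the same decision.
        return (trace_id & mask) < bound

    return should_sample


def new_trace_id(rng: random.Random) -> int:
    """Generate a random 128-bit trace ID (hypothetical helper)."""
    return rng.getrandbits(128)


if __name__ == "__main__":
    rng = random.Random(42)
    sample = make_ratio_sampler(0.01)  # the 1% rate mentioned above
    ids = [new_trace_id(rng) for _ in range(100_000)]
    kept = sum(sample(t) for t in ids)
    print(f"kept {kept} of {len(ids)} traces")
```

Because the decision is made up front, before anyone knows whether the request will be slow or fail, a fixed ratio like this drops interesting and boring traces alike. That is part of why volume can still feel high relative to the debugging value you get back.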
6 comments
