Service mesh (Istio/Linkerd): is the operational overhead justified?

We've been looking at adopting a service mesh, specifically Istio or Linkerd, for a while now. We're running 30+ microservices in our k8s clusters and things are getting unwieldy: retry/timeout logic is inconsistent across teams, there's almost no circuit breaking, and mTLS is a pain to manage manually across every service. A service mesh promises to solve a lot of these pain points out of the box.

The big concern on our end is the operational overhead. Adding a sidecar to every pod means more memory, more CPU, and definitely more latency, even if it's just a few milliseconds. It feels like we're trading one set of problems for another, albeit more standardized ones.

What's been your experience with justifying this overhead? Are the benefits of standardized traffic management and observability really worth the extra resource consumption and the added complexity of managing the mesh itself?

5 comments

Comments