Service Mesh (Istio & Envoy)
Transparent infrastructure for microservices: data plane (Envoy sidecars), control plane (Istio), traffic management, mutual TLS, and observability.
The Microservices Network Problem
In a microservices architecture with dozens of services, every service needs to implement the same boilerplate: retries on transient failures, timeouts, circuit breaking to prevent cascading failures, mutual TLS for service authentication, and trace emission. Doing this in every service, often in several languages, is expensive and inconsistent. A service mesh extracts this cross-cutting network logic into the infrastructure layer.
Architecture: Data Plane and Control Plane
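The data plane consists of Envoy proxy instances running as sidecars in every application Pod. An init container (or the Istio CNI plugin) installs iptables rules that redirect all inbound and outbound traffic through Envoy, so the application keeps addressing its peers by their usual service names and needs no awareness of the proxy. The control plane (`Istiod`) pushes configuration to every Envoy instance over the xDS protocol and issues the certificates used for mTLS. Telemetry is emitted by the proxies themselves and collected from them by the metrics and tracing backends; Istiod does not aggregate it.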
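Sidecar injection is commonly switched on per namespace with a label that Istio's admission webhook watches. The manifest below is a minimal sketch, assuming a hypothetical namespace named `shop` and a default (non-revisioned) Istio installation; revision-based installs use a different label.

```yaml
# Hypothetical namespace; the name "shop" is an assumption for illustration.
apiVersion: v1
kind: Namespace
metadata:
  name: shop
  labels:
    # Tells Istio's mutating webhook to add the Envoy sidecar
    # to every Pod created in this namespace.
    istio-injection: enabled
```

Pods that already exist are not modified; they pick up the sidecar only when they are recreated.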
Envoy Proxy Capabilities
- Retries with backoff: automatically retry failed requests, with configurable attempt limits and jittered backoff
- Timeouts: per-route and per-cluster deadline enforcement
- Circuit breaking: cap connections, pending requests, and concurrent retries per upstream cluster, failing fast once a limit is reached
- Load balancing: round-robin, least-request, random, consistent hashing
- Outlier detection: temporarily eject hosts that keep failing (for example, consecutive 5xx responses) from the load-balancing pool
- Rate limiting: local (in-proxy token bucket) or global via an external rate limit service
- Distributed tracing: generates spans and reports them to Jaeger/Zipkin; propagates B3 / W3C trace context headers, which the application must forward from inbound to outbound requests for a trace to stay connected
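In an Istio mesh these Envoy features are usually configured through Istio resources rather than raw Envoy config. As a minimal sketch of the retry and timeout items above, assuming a hypothetical `ratings` service and illustrative values:

```yaml
# Hypothetical service name and values; tune them for your own workloads.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
    # Overall deadline for the request, including all retry attempts.
    timeout: 10s
    retries:
      attempts: 3          # at most 3 retries after the first try
      perTryTimeout: 2s    # deadline for each individual attempt
      retryOn: 5xx,connect-failure
```

Retries multiply load on a struggling upstream, so they are best limited to idempotent requests and paired with an overall timeout as shown.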
Istio Traffic Management
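Istio's `VirtualService` and `DestinationRule` resources give fine-grained traffic control without changing the Kubernetes Service or the application code. A `VirtualService` decides where requests are routed; a `DestinationRule` defines the named subsets and the policies applied once a destination is chosen.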
```yaml
# Canary: send 10% of traffic to v2, 90% to v1
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```
Mutual TLS (mTLS)
Istiod acts as an internal Certificate Authority (CA). When a Pod starts, the sidecar's agent requests a certificate, and Istiod signs a short-lived X.509 certificate carrying the workload's SPIFFE identity, derived from its namespace and service account (e.g., `spiffe://cluster.local/ns/default/sa/reviews`). Envoy uses this certificate to authenticate and encrypt all inter-service connections, and certificates are rotated automatically before they expire. STRICT mTLS mode rejects any plaintext traffic; PERMISSIVE mode accepts both plaintext and mTLS (useful during migration).
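The mTLS mode is set with a `PeerAuthentication` resource. The following is a minimal sketch of a mesh-wide STRICT policy, assuming Istio's root namespace is the default `istio-system`; a policy placed in any other namespace applies only to that namespace.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  # Assumes the default root namespace; a policy here applies mesh-wide.
  namespace: istio-system
spec:
  mtls:
    # Use PERMISSIVE instead while workloads without sidecars still exist.
    mode: STRICT
```

A common migration path is to start in PERMISSIVE mode, confirm that all traffic is already arriving over mTLS, and then switch to STRICT.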
mTLS enables zero-trust networking
With Istio mTLS in STRICT mode, every service-to-service call is authenticated. You can write `AuthorizationPolicy` rules that deny all traffic except explicitly allowed service pairs, implementing zero-trust at the network level without any application code.
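A deny-by-default setup can be sketched with two policies: one that allows nothing in the namespace, and one that opens a single service pair. The names below are assumptions for illustration: a `default` namespace, a `productpage` service account as the caller, and Pods labeled `app: reviews` as the target.

```yaml
# An ALLOW policy with an empty spec matches no requests,
# so all traffic to workloads in this namespace is denied.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: default
spec: {}
---
# Explicitly allow one caller identity to reach one workload.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: reviews-allow-productpage
  namespace: default
spec:
  selector:
    matchLabels:
      app: reviews          # hypothetical label on the target Pods
  action: ALLOW
  rules:
  - from:
    - source:
        # Identity taken from the caller's mTLS certificate.
        principals: ["cluster.local/ns/default/sa/productpage"]
    to:
    - operation:
        methods: ["GET"]
```

The `principals` match depends on the caller presenting an mTLS certificate, which is why these rules are normally combined with STRICT mode.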
Service Mesh Trade-offs
| Benefit | Cost |
|---|---|
| Zero-code retries, timeouts, circuit breaking | Added latency (~1–3ms per hop) |
| Automatic mTLS between all services | More CPU and memory per pod |
| Unified observability (traces, metrics) | Complex control plane to operate |
| Fine-grained traffic control (canary, A/B) | Steep learning curve for operators |
Interview Tip
If asked 'how would you implement circuit breaking between microservices?', mention that in a service mesh like Istio you configure `connectionPool` limits and `outlierDetection` in a `DestinationRule`, without any code changes. Without a mesh, you'd use a library like Resilience4j (Java) or Polly (.NET) inside the application. Know both approaches and when each is appropriate.
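For the mesh-based answer, a circuit-breaking configuration might look like the sketch below. It assumes a hypothetical `ratings` service, and the limits are illustrative rather than recommended values.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: ratings
spec:
  host: ratings
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100            # cap on connections to the service
      http:
        http1MaxPendingRequests: 50    # requests queued beyond this fail fast
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5          # eject a host after 5 straight 5xx errors
      interval: 10s                    # how often hosts are evaluated
      baseEjectionTime: 30s            # minimum time an ejected host stays out
      maxEjectionPercent: 50           # never eject more than half the pool
```

The connection pool limits protect the upstream from overload, while outlier detection removes individual bad hosts; a library such as Resilience4j instead tracks failures inside the calling process.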