Timeout Pattern
Prevent indefinite waiting: connection timeouts, read timeouts, cascading timeout budgets, and combining timeouts with circuit breakers.
Why Timeouts Are Non-Negotiable
Without a timeout, a thread waiting for a slow response is a zombie: it consumes a thread-pool slot, a connection, and memory forever (or until the process crashes). In practice, a few dozen zombie threads can fill a thread pool and make the entire service unresponsive. Every network call, database query, and inter-process communication must have an explicit timeout.
Default Timeouts Are Usually Too Long
Many HTTP client libraries default to 60–120 s or even to no timeout at all (Python's `requests`, for example, waits indefinitely unless you pass `timeout=`). Always set explicit timeouts: a connection timeout (how long to wait for a TCP connection to be established) and a read/socket timeout (how long to wait for data after connecting).
Types of Timeouts
| Timeout Type | What It Limits | Typical Value |
|---|---|---|
| Connection timeout | Time to establish a TCP connection | 1–5 s |
| Read / socket timeout | Time to receive the next byte after connection | 5–30 s depending on operation |
| Request timeout | End-to-end time for the full request/response cycle | Derived from SLO |
| Idle connection timeout | How long a pooled connection can sit unused | 30–90 s (< server-side idle timeout) |
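The request timeout in the table above can be sketched language-agnostically as a race between the work and a timer. A minimal sketch in JavaScript (the `withTimeout` helper name is illustrative, not a standard API; note that this abandons the underlying work rather than cancelling it):

```javascript
// Minimal request-timeout helper: reject if `promise` doesn't settle within `ms`.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer to avoid leaks.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage is e.g. `await withTimeout(fetch(url), 3000, "GET /data")`. Dedicated connection and read timeouts need support from the HTTP client itself, since they apply to individual phases inside a single request.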
Timeout Budgets and Deadline Propagation
In a microservices call chain, each hop consumes part of the overall latency budget. If Service A must respond within 500 ms and it calls Service B and Service C sequentially, Service B's timeout should be ~150 ms and Service C's ~150 ms, leaving ~200 ms for local processing and overhead. Allocating latency this way is called a timeout budget; forwarding the remaining budget to each downstream call is deadline propagation.
Google's gRPC propagates deadlines automatically via the `grpc-timeout` header. Each service passes a reduced deadline downstream. If the overall deadline has already expired by the time a downstream call starts, the call is cancelled immediately rather than starting a doomed request.
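The same idea can be sketched without gRPC: carry an absolute deadline (a timestamp) through the call chain, convert it to a remaining budget at each hop, and refuse to start calls whose budget is already spent. The helper below is an illustrative sketch, not a gRPC API:

```javascript
// Deadline propagation sketch: each hop derives its per-call timeout from the
// absolute deadline instead of using a fixed, independent value.
function remainingBudgetMs(deadline, now = Date.now()) {
  return deadline - now;
}

async function callDownstream(deadline, doCall) {
  const budget = remainingBudgetMs(deadline);
  if (budget <= 0) {
    // The overall deadline already expired: fail fast instead of starting
    // a request whose answer nobody can use.
    throw new Error("deadline exceeded before call started");
  }
  // Pass the remaining budget (and the deadline itself) downstream so the
  // next service can repeat the same check.
  return doCall(budget, deadline);
}
```

The key design choice is propagating an absolute deadline rather than a relative timeout: a relative value would be silently "reset" at each hop and the chain could far exceed the caller's budget.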
Setting Timeout Values
Set timeouts based on measured latency percentiles, not guesswork. A common guideline is a small multiple of the p99 latency (e.g., 2–3×): if your p99 response time is 200 ms, a 500 ms timeout provides headroom for occasional spikes without being so long that failures become expensive. Review and adjust as the system evolves — timeouts calibrated for a healthy service may be too long for a degraded one.
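Deriving the value from data can be as simple as taking a nearest-rank percentile over recent latency samples. A minimal sketch (the 2.5× multiplier is an illustrative choice, not a standard):

```javascript
// Nearest-rank percentile over recorded latency samples (in ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Illustrative policy: timeout = multiplier × p99. A 200 ms p99 with a 2.5×
// multiplier yields the 500 ms timeout from the example above.
function suggestTimeoutMs(samples, multiplier = 2.5) {
  return Math.round(percentile(samples, 99) * multiplier);
}
```

In production you would compute this over a sliding window (or pull p99 from your metrics system) and re-evaluate periodically rather than hard-coding the result.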
Combining Timeouts with Circuit Breakers
Timeouts and circuit breakers are complementary. A timeout handles a single slow call: it gives up after waiting too long and returns an error. If timeouts become frequent, the circuit breaker detects the pattern and opens, preventing future calls from waiting at all. Configure your circuit breaker's slow-call threshold to match your timeout value so slow calls count as failures in the breaker's statistics.
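A minimal sketch of the combination, assuming a simple count-based breaker (real libraries such as resilience4j or opossum add sliding windows, half-open probes, and slow-call-rate thresholds):

```javascript
// Count-based circuit breaker sketch: timeout errors count as failures, and
// once the failure threshold is reached the breaker opens and fails fast.
class CircuitBreaker {
  constructor({ failureThreshold = 3, openMs = 10000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.openMs = openMs;
    this.failures = 0;
    this.openedAt = null;
  }

  get isOpen() {
    if (this.openedAt === null) return false;
    if (Date.now() - this.openedAt >= this.openMs) {
      // Open period elapsed: reset and allow a trial call (half-open).
      this.openedAt = null;
      this.failures = 0;
      return false;
    }
    return true;
  }

  async call(fn) {
    if (this.isOpen) throw new Error("circuit open: failing fast");
    try {
      const result = await fn(); // fn should enforce its own timeout
      this.failures = 0; // success resets the failure count
      return result;
    } catch (err) {
      if (++this.failures >= this.failureThreshold) this.openedAt = Date.now();
      throw err; // timeouts propagate and count toward opening the breaker
    }
  }
}
```

The breaker wraps a call that already has a timeout (e.g., `breaker.call(() => withTimeout(fetch(url), 3000))`), so a hung dependency first costs one timeout per call and then, once the breaker opens, costs nothing at all.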
Interview Tip
Interviewers often ask 'what happens if you don't set a timeout?' — the answer is zombie threads, connection pool exhaustion, and cascading failures. Also mention that after a timeout the server-side work may still be running. Use cancellation tokens (gRPC deadline, HTTP/2 RST_STREAM) to signal the server to stop processing abandoned requests.
Code Example: Configuring Timeouts in Node.js
import axios from "axios";

const httpClient = axios.create({
  baseURL: "https://api.example.com",
  timeout: 3000, // 3 s for the whole request; axios has no separate connection timeout
});

// For fine-grained control use undici or the native fetch API with an
// AbortController (on modern runtimes, AbortSignal.timeout(3000) is an
// equivalent one-liner):
async function fetchData() {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 3000);
  try {
    const res = await fetch("https://api.example.com/data", {
      signal: controller.signal,
    });
    return await res.json();
  } catch (err) {
    if (err instanceof DOMException && err.name === "AbortError") {
      throw new Error("Request timed out after 3000 ms");
    }
    throw err;
  } finally {
    clearTimeout(timeoutId); // don't leave a stray timer if the request finished in time
  }
}