Timeout Pattern
Prevent indefinite waiting: connection timeouts, read timeouts, cascading timeout budgets, and combining timeouts with circuit breakers.
Why Timeouts Are Non-Negotiable
Without a timeout, a thread waiting for a slow response is a zombie: it consumes a thread-pool slot, a connection, and memory forever (or until the process crashes). In practice, a few dozen zombie threads can fill a thread pool and make the entire service unresponsive. Every network call, database query, and inter-process communication must have an explicit timeout.
Default Timeouts Are Usually Too Long
Many HTTP client libraries default to 60–120 s or even to no timeout at all (Python's `requests`, for example, waits indefinitely unless you pass `timeout=`). Always set explicit timeouts: a connection timeout (how long to wait for a TCP connection to be established) and a read/socket timeout (how long to wait for data after connecting).
Types of Timeouts
| Timeout Type | What It Limits | Typical Value |
|---|---|---|
| Connection timeout | Time to establish a TCP connection | 1–5 s |
| Read / socket timeout | Time to receive the next byte after connection | 5–30 s depending on operation |
| Request timeout | End-to-end time for the full request/response cycle | Derived from SLO |
| Idle connection timeout | How long a pooled connection can sit unused | 30–90 s (< server-side idle timeout) |
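The request timeout in the table above can be sketched language-agnostically as a race between the work and a timer. A minimal sketch in JavaScript (the `withTimeout` helper name is illustrative, not a standard API; note that this abandons the underlying work rather than cancelling it):

```javascript
// Minimal request-timeout helper: reject if `promise` doesn't settle within `ms`.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer to avoid leaks.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage is e.g. `await withTimeout(fetch(url), 3000, "GET /data")`. Dedicated connection and read timeouts need support from the HTTP client itself, since they apply to individual phases inside a single request.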
Timeout Budgets and Deadline Propagation
In a microservices call chain, each hop consumes part of the overall latency budget. If Service A must respond within 500 ms and it calls Service B and Service C sequentially, Service B's timeout should be ~150 ms and Service C's ~150 ms, leaving ~200 ms for local processing and overhead. Allocating latency this way is called a timeout budget; forwarding the remaining budget to each downstream call is deadline propagation.
Google's gRPC propagates deadlines automatically via the `grpc-timeout` header. Each service passes a reduced deadline downstream. If the overall deadline has already expired by the time a downstream call starts, the call is cancelled immediately rather than starting a doomed request.
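The same idea can be sketched without gRPC: carry an absolute deadline (a timestamp) through the call chain, convert it to a remaining budget at each hop, and refuse to start calls whose budget is already spent. The helper below is an illustrative sketch, not a gRPC API:

```javascript
// Deadline propagation sketch: each hop derives its per-call timeout from the
// absolute deadline instead of using a fixed, independent value.
function remainingBudgetMs(deadline, now = Date.now()) {
  return deadline - now;
}

async function callDownstream(deadline, doCall) {
  const budget = remainingBudgetMs(deadline);
  if (budget <= 0) {
    // The overall deadline already expired: fail fast instead of starting
    // a request whose answer nobody can use.
    throw new Error("deadline exceeded before call started");
  }
  // Pass the remaining budget (and the deadline itself) downstream so the
  // next service can repeat the same check.
  return doCall(budget, deadline);
}
```

The key design choice is propagating an absolute deadline rather than a relative timeout: a relative value would be silently "reset" at each hop and the chain could far exceed the caller's budget.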
Setting Timeout Values
Set timeouts based on measured latency percentiles, not guesswork. A common guideline is a small multiple of the p99 latency (e.g., 2–3×): if your p99 response time is 200 ms, a 500 ms timeout provides headroom for occasional spikes without being so long that failures become expensive. Review and adjust as the system evolves — timeouts calibrated for a healthy service may be too long for a degraded one.
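Deriving the value from data can be as simple as taking a nearest-rank percentile over recent latency samples. A minimal sketch (the 2.5× multiplier is an illustrative choice, not a standard):

```javascript
// Nearest-rank percentile over recorded latency samples (in ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Illustrative policy: timeout = multiplier × p99. A 200 ms p99 with a 2.5×
// multiplier yields the 500 ms timeout from the example above.
function suggestTimeoutMs(samples, multiplier = 2.5) {
  return Math.round(percentile(samples, 99) * multiplier);
}
```

In production you would compute this over a sliding window (or pull p99 from your metrics system) and re-evaluate periodically rather than hard-coding the result.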
Combining Timeouts with Circuit Breakers
Timeouts and circuit breakers are complementary. A timeout handles a single slow call: it gives up after waiting too long and returns an error. If timeouts become frequent, the circuit breaker detects the pattern and opens, preventing future calls from waiting at all. Configure your circuit breaker's slow-call threshold to match your timeout value so slow calls count as failures in the breaker's statistics.
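A minimal sketch of the combination, assuming a simple count-based breaker (real libraries such as resilience4j or opossum add sliding windows, half-open probes, and slow-call-rate thresholds):

```javascript
// Count-based circuit breaker sketch: timeout errors count as failures, and
// once the failure threshold is reached the breaker opens and fails fast.
class CircuitBreaker {
  constructor({ failureThreshold = 3, openMs = 10000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.openMs = openMs;
    this.failures = 0;
    this.openedAt = null;
  }

  get isOpen() {
    if (this.openedAt === null) return false;
    if (Date.now() - this.openedAt >= this.openMs) {
      // Open period elapsed: reset and allow a trial call (half-open).
      this.openedAt = null;
      this.failures = 0;
      return false;
    }
    return true;
  }

  async call(fn) {
    if (this.isOpen) throw new Error("circuit open: failing fast");
    try {
      const result = await fn(); // fn should enforce its own timeout
      this.failures = 0; // success resets the failure count
      return result;
    } catch (err) {
      if (++this.failures >= this.failureThreshold) this.openedAt = Date.now();
      throw err; // timeouts propagate and count toward opening the breaker
    }
  }
}
```

The breaker wraps a call that already has a timeout (e.g., `breaker.call(() => withTimeout(fetch(url), 3000))`), so a hung dependency first costs one timeout per call and then, once the breaker opens, costs nothing at all.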
Interview Tip
Interviewers often ask 'what happens if you don't set a timeout?' — the answer is zombie threads, connection pool exhaustion, and cascading failures. Also mention that after a timeout the server-side work may still be running. Use cancellation tokens (gRPC deadline, HTTP/2 RST_STREAM) to signal the server to stop processing abandoned requests.
Code Example: Configuring Timeouts in Node.js
import axios from "axios";

const httpClient = axios.create({
  baseURL: "https://api.example.com",
  timeout: 3000, // 3 s for the whole request; axios has no separate connection timeout
});

// For fine-grained control use undici or the native fetch API with an
// AbortController (on modern runtimes, AbortSignal.timeout(3000) is an
// equivalent one-liner):
async function fetchData() {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 3000);
  try {
    const res = await fetch("https://api.example.com/data", {
      signal: controller.signal,
    });
    return await res.json();
  } catch (err) {
    if (err instanceof DOMException && err.name === "AbortError") {
      throw new Error("Request timed out after 3000 ms");
    }
    throw err;
  } finally {
    clearTimeout(timeoutId); // don't leave a stray timer if the request finished in time
  }
}