Dev.to #systemdesign·March 17, 2026

Designing Robust Systems: Differentiating Errors from Exceptions

This article clarifies the distinction between errors and exceptions in software systems, arguing that errors are expected, recoverable outcomes while exceptions signal unexpected, unrecoverable violations of system invariants. Drawing this line correctly affects reliability, observability, API design, and resilience in distributed environments, and it leads to more robust, maintainable, and predictable software architectures.


The Foundational Distinction

In building resilient systems, a core principle is knowing how to classify and handle failures. This article emphasizes that "errors" and "exceptions" serve fundamentally different purposes, and that conflating them leads to architectural and operational problems. Errors are expected, recoverable outcomes that are part of normal system behavior, such as a user providing invalid input or a resource not being found. Exceptions are unexpected, unrecoverable violations of system invariants or assumptions; they indicate a truly exceptional, often fatal, state.

💡

Practical Rule for System Architects

If the caller can handle it, it's an error. If the system cannot safely proceed, it's an exception. This rule guides the design of resilient systems and clear API contracts.
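
As a minimal sketch of this rule, the snippet below returns invalid input as a value the caller can inspect and throws only when an invariant is already broken. The names `parseQuantity`, `reserveStock`, and `stock` are hypothetical and do not come from the original article.

```typescript
// Hypothetical names throughout; this only illustrates the rule of thumb.
type ParseResult =
  | { ok: true; value: number }
  | { ok: false; reason: string };

// Invalid input is an expected outcome the caller can handle: return it as a value.
function parseQuantity(raw: string): ParseResult {
  const n = Number(raw);
  if (raw.trim() === "" || !Number.isInteger(n) || n <= 0) {
    return { ok: false, reason: `"${raw}" is not a positive integer` };
  }
  return { ok: true, value: n };
}

const stock = new Map<string, number>([["widget", 5]]);

function reserveStock(item: string, quantity: number): boolean {
  const available = stock.get(item) ?? 0;
  if (available < 0) {
    // Negative stock means an invariant was broken elsewhere;
    // the system cannot safely proceed, so this is an exception.
    throw new Error(`Invariant violated: negative stock for ${item}`);
  }
  if (quantity > available) return false; // expected: not enough stock
  stock.set(item, available - quantity);
  return true;
}
```

A caller would branch on `parseQuantity(...).ok` and on the boolean from `reserveStock`, while the thrown error is left to propagate to whatever crash reporting the process uses.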

Impact on System Design and Architecture

  • Reliability: Treating expected errors as exceptions can crash a process unnecessarily, skip retries, and bypass fallback mechanisms. Proper error handling ensures predictable system behavior under common failure scenarios.
  • Observability: Misusing exceptions for routine errors creates noisy logs and obscures genuine system issues, making incident response and debugging significantly harder.
  • API Design: Explicitly modeling errors in API contracts (e.g., returning `User | null` or `(User, error)` tuples) makes interfaces predictable and easier for consumers to reason about. APIs that throw exceptions for expected outcomes break their contract.
  • Distributed Systems Resilience: Distributed environments inherently face network partitions, timeouts, and partial failures. Architectures that treat these expected distributed challenges as exceptions are prone to cascading failures and unpredictable behavior, highlighting the need for robust, explicit error handling.
  • Maintainability and Team Productivity: A clear distinction reduces cognitive load for developers, leading to faster debugging, easier onboarding, and a more maintainable codebase overall.
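
The reliability and distributed-systems points can be sketched as follows: timeouts and unavailability are modeled as returned error values so that a retry helper can act on them, while a definitive "not found" is returned immediately. Everything here (`CallError`, `CallResult`, `isRetryable`, `callWithRetry`) is a hypothetical illustration rather than an API from the article, and the helper is synchronous only to keep the example short; a real remote call would be asynchronous and would back off between attempts.

```typescript
// Hypothetical types and names; a sketch, not a production retry policy.
type CallError =
  | { kind: "timeout" }
  | { kind: "unavailable" }
  | { kind: "not_found" };

type CallResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: CallError };

// Transient failures are worth retrying; "not_found" is a final answer.
function isRetryable(error: CallError): boolean {
  return error.kind === "timeout" || error.kind === "unavailable";
}

function callWithRetry<T>(
  operation: () => CallResult<T>,
  maxAttempts: number,
): CallResult<T> {
  if (maxAttempts < 1) {
    // A caller bug rather than a runtime condition: fail fast.
    throw new Error("maxAttempts must be at least 1");
  }
  let last = operation();
  for (
    let attempt = 1;
    attempt < maxAttempts && !last.ok && isRetryable(last.error);
    attempt++
  ) {
    last = operation();
  }
  // Either a success, a non-retryable error, or the last transient error.
  return last;
}
```

Because the failure is an ordinary value, the caller can still choose a fallback after the final attempt instead of unwinding the stack.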

Architectural Implications for Error and Exception Handling

Architecturally, errors should be handled gracefully through explicit returns, discriminated unions, or `Result` types, allowing upstream components to take corrective actions. Exceptions, conversely, should ideally lead to fast failures, immediate logging (e.g., to an error tracking system), and potentially system restarts or circuit breaker activations. This strategy ensures that critical system invariants are preserved and that truly anomalous situations are brought to immediate attention rather than being silently suppressed or mishandled.

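typescript
// Hypothetical User type and in-memory db, added so the snippet is self-contained
type User = { id: string; name: string }
const store = new Map<string, User>()
const db = { find: (id: string): User | undefined => store.get(id) }

// ✅ Modeling errors explicitly in TypeScript for expected outcomes
function getUser(id: string): User | null {
  return db.find(id) ?? null
}

// ❌ Using exceptions for normal outcomes leads to fragile APIs
function getUserThrowing(id: string): User {
  const user = db.find(id)
  if (!user) throw new Error("User not found")
  return user
}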
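
A nullable return cannot say why a lookup failed. One possible refinement, sketched here under assumptions, is a discriminated union that names each expected outcome and lets the compiler check that all of them are handled. The `User` shape, the in-memory `users` map, `findUser`, `describeLookup`, and the status strings are all hypothetical stand-ins, not part of the original article.

```typescript
// Hypothetical User shape and in-memory store standing in for a real database.
interface User {
  id: string;
  name: string;
}

const users = new Map<string, User>([["u1", { id: "u1", name: "Ada" }]]);

// Each expected outcome is a named variant of the result type.
type GetUserResult =
  | { kind: "found"; user: User }
  | { kind: "not_found" }
  | { kind: "invalid_id"; reason: string };

function findUser(id: string): GetUserResult {
  if (id.trim() === "") {
    return { kind: "invalid_id", reason: "id must not be empty" };
  }
  const user = users.get(id);
  return user ? { kind: "found", user } : { kind: "not_found" };
}

function describeLookup(id: string): string {
  const result = findUser(id);
  switch (result.kind) {
    case "found":
      return `200 ${result.user.name}`;
    case "not_found":
      return "404 user not found";
    case "invalid_id":
      return `400 ${result.reason}`;
    default: {
      // Unreachable while the switch is exhaustive. Reaching it at runtime
      // would mean a broken invariant, so it throws instead of returning.
      const unreachable: never = result;
      throw new Error(`Unhandled result: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

If a new variant is later added to `GetUserResult`, the `never` assignment stops compiling until the new case is handled, which keeps expected outcomes in the API contract rather than in thrown exceptions.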

Tags: error handling, exception handling, reliability, resilience, api contracts, software architecture, observability, maintainability
