Dev.to #systemdesign · May 31, 2026

Architectural Decisions for a Production-Ready Multi-Service Backend

This article walks through seven architectural decisions made while building a multi-service backend with low operational cost, with a focus on practical trade-offs and capacity planning. It covers service decomposition, inter-service communication over gRPC, multi-purpose Redis usage, financial state management, and job processing, and it favors pragmatic solutions over dogmatic adherence to patterns.

Pragmatic Service Decomposition: Modular Monolith

Instead of a full microservices architecture, the author chose a modular monolith: three independently deployable NestJS applications (a Giveaway API, a Giftcard API, and a Job Processor) in a single monorepo. Each bounded domain can be deployed and scaled on its own, without the operational overhead and dependency-management complexity of many separate repositories. That matters for cost-effective scaling at lower capacities.

Robust Inter-Service Communication with gRPC and Persistent Queues

The services communicate over gRPC, chosen for its compile-time type enforcement and efficient bidirectional calls on an internal network with no auth overhead. A key architectural decision is to wrap gRPC calls in a custom proxy that enqueues each call as a Bull job persisted to Redis. Calls therefore survive service restarts and inherit Bull's retry policies. A circuit breaker at the proxy level prevents cascading failures: it stops calls to a struggling downstream service before they fill the queue.

```plaintext
API calls giftcardService.allocateCards(payload)
  → Proxy intercepts the call
  → Enqueues GRPC_CALL job to Bull (persisted to Redis)
  → Awaits job.finished() with a deadline timeout
  → Job Processor picks up job, executes real gRPC stub
  → Result returned to the waiting caller
```
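The proxy-level circuit breaker described above could look roughly like the sketch below. It is not the author's implementation: `CircuitBreaker`, `callThroughProxy`, the thresholds, and the `enqueue` callback are all hypothetical names. `enqueue` stands in for adding a Bull job and awaiting `job.finished()`, so the sketch runs without Redis.

```typescript
// Illustrative sketch: all names and thresholds are assumptions, not the article's code.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // True when a call may be enqueued. An open breaker starts letting
  // trial calls through once the cool-down has elapsed.
  canPass(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetAfterMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial call, or too many consecutive failures, opens the breaker.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  current(): BreakerState {
    return this.state;
  }
}

// Stand-in for "enqueue a GRPC_CALL job and await its result".
type Enqueue<T> = (method: string, payload: unknown) => Promise<T>;

async function callThroughProxy<T>(
  breaker: CircuitBreaker,
  enqueue: Enqueue<T>,
  method: string,
  payload: unknown,
): Promise<T> {
  // Reject before enqueuing, so a struggling downstream cannot fill the queue.
  if (!breaker.canPass()) {
    throw new Error(`circuit open for ${method}; call not enqueued`);
  }
  try {
    const result = await enqueue(method, payload);
    breaker.recordSuccess();
    return result;
  } catch (err) {
    breaker.recordFailure();
    throw err;
  }
}
```

Checking the breaker before enqueuing is the point of the design: a rejected call costs the caller one exception rather than a persisted job that will retry against a service that is already failing.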

Multi-Purpose Redis for Efficiency and Scalability

A single Redis instance serves four critical roles concurrently, demonstrating efficient resource utilization:

Bull queue backend: For various asynchronous tasks like email, notifications, and event processing, with AOF persistence.
Socket.IO pub/sub adapter: Enables horizontal scaling of WebSocket gateways by broadcasting events through Redis channels.
TTL-based application cache: A `getOrSet` pattern for caching frequently accessed data, gracefully degrading to database calls if Redis is unavailable.
Atomic concurrency control: Using Lua scripts for critical operations like real-time winner selection, ensuring atomicity and preventing race conditions without application-level locking.
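As an illustration of the `getOrSet` cache role above, here is a minimal sketch of a TTL cache that falls back to the loader when the cache is unavailable. The `CacheStore` interface and `InMemoryStore` class are hypothetical stand-ins for a Redis client, used so the example runs on its own. The article does not show the author's actual signature.

```typescript
// Illustrative sketch: interface and class names are assumptions, not the article's code.
interface CacheStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

async function getOrSet<T>(
  cache: CacheStore,
  key: string,
  ttlSeconds: number,
  loader: () => Promise<T>,
): Promise<T> {
  try {
    const hit = await cache.get(key);
    if (hit !== null) return JSON.parse(hit) as T;
  } catch {
    // Cache unavailable: fall through and read from the source of truth.
  }
  const value = await loader();
  try {
    await cache.set(key, JSON.stringify(value), ttlSeconds);
  } catch {
    // A failed cache write must not fail the request.
  }
  return value;
}

// In-memory stand-in for Redis with TTL expiry and an injectable clock.
class InMemoryStore implements CacheStore {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key);
    if (entry === undefined || this.now() >= entry.expiresAt) return null;
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}
```

Both cache calls are wrapped separately so that a Redis outage costs only latency: the read falls through to the loader and the write failure is swallowed.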

Financial Correctness with Escrow State Machines and Saga Compensation

For gift card prize escrows, a state machine enforces irreversible transitions, and every financial operation carries an idempotency key. A crucial design decision for settlements is performing payment transfers *before* database writes. If any transfer fails, the database remains untouched, so the operation can be retried safely without complex compensation logic. An inline saga compensation pattern covers initial escrow creation: if wallet deductions fail, escrows that were already reserved are refunded immediately using `Promise.allSettled`.
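A minimal sketch of an escrow state machine with irreversible transitions and idempotency keys is shown below. The article does not list the author's actual states, so `HELD`, `RELEASED`, and `REFUNDED`, along with `EscrowLedger` and its methods, are assumptions made for illustration.

```typescript
// Illustrative sketch: state names and the ledger API are assumptions, not the article's code.
type EscrowState = "HELD" | "RELEASED" | "REFUNDED";

// Terminal states have no outgoing transitions, which makes them irreversible.
const ALLOWED: Record<EscrowState, ReadonlySet<EscrowState>> = {
  HELD: new Set<EscrowState>(["RELEASED", "REFUNDED"]),
  RELEASED: new Set<EscrowState>(),
  REFUNDED: new Set<EscrowState>(),
};

interface Escrow {
  id: string;
  amountCents: number;
  state: EscrowState;
}

class EscrowLedger {
  private readonly escrows = new Map<string, Escrow>();
  // Idempotency key -> the request it was first used for.
  private readonly applied = new Map<string, string>();

  create(id: string, amountCents: number): Escrow {
    if (this.escrows.has(id)) throw new Error(`escrow ${id} already exists`);
    const escrow: Escrow = { id, amountCents, state: "HELD" };
    this.escrows.set(id, escrow);
    return escrow;
  }

  transition(id: string, to: EscrowState, idempotencyKey: string): Escrow {
    const escrow = this.escrows.get(id);
    if (escrow === undefined) throw new Error(`unknown escrow ${id}`);

    const request = `${id}:${to}`;
    const seen = this.applied.get(idempotencyKey);
    if (seen !== undefined) {
      // Replay of a completed request: return the result without reapplying it.
      if (seen === request) return escrow;
      throw new Error(`idempotency key ${idempotencyKey} reused for a different request`);
    }

    if (!ALLOWED[escrow.state].has(to)) {
      throw new Error(`illegal transition ${escrow.state} -> ${to}`);
    }
    escrow.state = to;
    this.applied.set(idempotencyKey, request);
    return escrow;
  }
}
```

Keeping the allowed transitions in one table makes the irreversibility rule a data property rather than scattered conditionals, and the key lookup runs before the transition check so that a retried request succeeds even though the escrow is already in a terminal state.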

Order of Operations for Financial Transactions

Perform external financial transfers (e.g., to payment providers) *before* committing internal database changes. Retry logic stays simple because a failure leaves no partially updated internal state. If the transfers succeed, commit the database changes. If a transfer fails, the database is untouched, and the client can safely retry the entire operation using idempotency keys.
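The ordering can be sketched as follows. This is an illustration under stated assumptions, not the author's settlement code: `settle`, `PaymentProvider`, `SettlementStore`, and `FakeProvider` are hypothetical names, and the provider is assumed to deduplicate on the idempotency key.

```typescript
// Illustrative sketch: all names are assumptions, not the article's code.
interface Transfer {
  idempotencyKey: string;
  recipient: string;
  amountCents: number;
}

interface PaymentProvider {
  // Assumed to deduplicate on idempotencyKey, so repeating a paid transfer is a no-op.
  transfer(t: Transfer): Promise<void>;
}

interface SettlementStore {
  markSettled(escrowId: string): Promise<void>;
}

async function settle(
  escrowId: string,
  transfers: Transfer[],
  provider: PaymentProvider,
  store: SettlementStore,
): Promise<void> {
  // Step 1: external transfers. A failure throws here, before any database write.
  for (const t of transfers) {
    await provider.transfer(t);
  }
  // Step 2: reached only when every transfer succeeded.
  await store.markSettled(escrowId);
}

// In-memory stand-in for a provider that honors idempotency keys.
class FakeProvider implements PaymentProvider {
  readonly paid = new Map<string, number>();
  readonly failOnce = new Set<string>();

  async transfer(t: Transfer): Promise<void> {
    if (this.paid.has(t.idempotencyKey)) return; // already paid under this key
    if (this.failOnce.delete(t.idempotencyKey)) {
      throw new Error(`transfer ${t.idempotencyKey} failed`);
    }
    this.paid.set(t.idempotencyKey, t.amountCents);
  }
}
```

A retry after a partial failure replays the transfers that already went through, which is why this ordering depends on idempotency keys: the provider ignores the duplicates and only the missing transfers move money.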

Decoupled Job Processor Modes for Independent Scaling

The Job Processor runs in two distinct modes: `worker` for background tasks (email, analytics) and `gateway` for Socket.IO WebSocket serving. This decoupling allows each mode to scale independently based on its specific load characteristics (job queue backlog vs. concurrent WebSocket connections), leading to better resource utilization and fault isolation. The API services enqueue jobs for WebSocket events, keeping their HTTP event loops free from persistent connection management overhead.
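One way to select the mode at startup is sketched below. The article does not say how the author switches modes, so the `PROCESSOR_MODE` variable name, the component names, and the functions are assumptions. Each mode maps to the set of components that a given process would start.

```typescript
// Illustrative sketch: the env variable and component names are assumptions, not the article's code.
type ProcessorMode = "worker" | "gateway";

function parseMode(raw: string | undefined): ProcessorMode {
  if (raw === "worker" || raw === "gateway") return raw;
  // Fail fast on a missing or misspelled mode instead of silently starting both.
  throw new Error(`PROCESSOR_MODE must be "worker" or "gateway", got ${String(raw)}`);
}

// Each mode starts a disjoint set of components, so the two scale separately.
const COMPONENTS: Record<ProcessorMode, readonly string[]> = {
  worker: ["email-consumer", "analytics-consumer"],
  gateway: ["socket.io-gateway"],
};

function componentsFor(mode: ProcessorMode): readonly string[] {
  return COMPONENTS[mode];
}

// At startup this might be called as:
//   componentsFor(parseMode(process.env.PROCESSOR_MODE))
```

With this shape, the same build artifact can be deployed as two groups of processes: one sized by queue backlog and one sized by concurrent WebSocket connections.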
