This article discusses seven critical architectural decisions made while building a multi-service backend with a low operational cost, focusing on practical trade-offs and capacity planning. It highlights the choices around service decomposition, inter-service communication using gRPC, multi-purpose Redis usage, financial state management, and job processing, emphasizing pragmatic solutions over dogmatic adherence to patterns.
Read original on Dev.to #systemdesign
Instead of a full microservices architecture, the author chose a modular monolith with three independently deployable NestJS applications within a single monorepo: a Giveaway API, a Giftcard API, and a Job Processor. This approach provides independent deployability and separate scaling for distinct bounded domains while avoiding the operational overhead and dependency management complexities of numerous separate repositories, which is crucial for cost-effective scaling at lower capacities.
The services communicate using gRPC for its compile-time type enforcement and efficient bidirectional calls over an internal network without auth overhead. A key architectural decision is wrapping gRPC calls with a custom proxy that enqueues them as Bull jobs (persisted to Redis). This ensures call durability across service restarts and leverages Bull's retry policies, enhancing system resilience. A circuit breaker at the proxy level prevents cascading failures by stopping calls to struggling downstream services before they fill the queue.
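The circuit breaker described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `CircuitBreaker` class, the failure threshold, and the reset window are hypothetical, and the `call` argument stands in for whatever the proxy does to enqueue the job and await its result.

```typescript
// Minimal circuit breaker sketch. All names and thresholds are hypothetical.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";

  constructor(
    private readonly failureThreshold: number,
    private readonly resetAfterMs: number,
    // Injectable clock so the reset window can be tested without waiting.
    private readonly now: () => number = () => Date.now(),
  ) {}

  getState(): BreakerState {
    return this.state;
  }

  async exec<T>(call: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      // Reject immediately while the downstream service is given time to recover.
      if (this.now() - this.openedAt < this.resetAfterMs) {
        throw new Error("circuit open: call rejected before enqueue");
      }
      // Reset window elapsed: let one trial call through.
      this.state = "half-open";
    }
    try {
      const result = await call();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Because the breaker rejects before `call` runs, a struggling downstream service receives no new jobs while the circuit is open, which is what keeps the queue from filling.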
API calls giftcardService.allocateCards(payload)
→ Proxy intercepts the call
→ Enqueues GRPC_CALL job to Bull (persisted to Redis)
→ Awaits job.finished() with a deadline timeout
→ Job Processor picks up job, executes real gRPC stub
→ Result returned to the waiting caller
A single Redis instance serves four critical roles concurrently, demonstrating efficient resource utilization.
Gift card prize escrows are managed by a state machine whose transitions are irreversible, and every financial operation carries an idempotency key. A crucial design decision for settlements is performing payment transfers *before* database writes: if any transfer fails, the database remains untouched, so the operation can be retried safely without complex compensation logic. Initial escrow creation uses an inline saga compensation pattern: if a wallet deduction fails, escrows that were already reserved are immediately refunded using `Promise.allSettled`.
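As a rough illustration of the inline compensation step, the sketch below reserves escrows one at a time and refunds the ones already reserved if a later deduction fails. The `Wallets` interface, the `InMemoryWallets` stand-in, the key formats, and the sequential ordering of deductions are all assumptions; only the use of `Promise.allSettled` for the refunds comes from the article.

```typescript
// All names below are hypothetical; the in-memory wallet stands in for a real store.
interface EscrowRequest {
  escrowId: string;
  walletId: string;
  amount: number;
}

interface Wallets {
  deduct(walletId: string, amount: number, idempotencyKey: string): Promise<void>;
  refund(walletId: string, amount: number, idempotencyKey: string): Promise<void>;
}

class InMemoryWallets implements Wallets {
  private readonly applied = new Set<string>();

  constructor(private readonly balances: Map<string, number>) {}

  balanceOf(walletId: string): number {
    return this.balances.get(walletId) ?? 0;
  }

  async deduct(walletId: string, amount: number, key: string): Promise<void> {
    if (this.applied.has(key)) return; // replayed key: do not charge twice
    if (this.balanceOf(walletId) < amount) {
      throw new Error(`insufficient funds: ${walletId}`);
    }
    this.balances.set(walletId, this.balanceOf(walletId) - amount);
    this.applied.add(key);
  }

  async refund(walletId: string, amount: number, key: string): Promise<void> {
    if (this.applied.has(key)) return; // replayed key: do not refund twice
    this.balances.set(walletId, this.balanceOf(walletId) + amount);
    this.applied.add(key);
  }
}

async function createEscrows(wallets: Wallets, requests: EscrowRequest[]): Promise<void> {
  const reserved: EscrowRequest[] = [];
  try {
    for (const req of requests) {
      await wallets.deduct(req.walletId, req.amount, `deduct:${req.escrowId}`);
      reserved.push(req);
    }
  } catch (err) {
    // Compensate inline: refund every escrow reserved so far. allSettled waits
    // for all refunds, so one failed refund does not skip the others.
    const refunds = await Promise.allSettled(
      reserved.map((r) => wallets.refund(r.walletId, r.amount, `refund:${r.escrowId}`)),
    );
    const failed = refunds.filter((r) => r.status === "rejected").length;
    if (failed > 0) {
      throw new Error(`escrow creation failed and ${failed} refund(s) need a retry`);
    }
    throw err; // all reservations were rolled back; surface the original failure
  }
}
```

`Promise.allSettled` suits compensation because it never short-circuits: every refund is attempted, and the caller can see exactly which ones still need attention.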
Order of Operations for Financial Transactions
Perform external financial transfers (e.g., to payment providers) *before* committing internal database changes. This allows for simpler retry logic on failure, as the system state hasn't been inconsistently updated. If transfers succeed, then commit the database. If transfers fail, the database is untouched, and the client can safely retry the entire operation with idempotency.
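The ordering above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: `PaymentProvider`, `SettlementStore`, `settleEscrow`, and the `settle:<escrowId>:<index>` key format are hypothetical names, and the provider is assumed to deduplicate transfers by idempotency key.

```typescript
// Hypothetical interfaces; a real payment provider and database are assumed.
interface Transfer {
  recipientId: string;
  amount: number;
}

interface PaymentProvider {
  // Assumed to be idempotent: replaying a key must not move money twice.
  transfer(transfer: Transfer, idempotencyKey: string): Promise<void>;
}

interface SettlementStore {
  markSettled(escrowId: string): Promise<void>;
}

async function settleEscrow(
  escrowId: string,
  transfers: Transfer[],
  provider: PaymentProvider,
  store: SettlementStore,
): Promise<void> {
  // 1. External transfers first. Any failure throws here, before the
  //    database has been touched, so there is nothing to roll back.
  let index = 0;
  for (const t of transfers) {
    // Deterministic key: a retry of the whole operation reuses the same keys.
    await provider.transfer(t, `settle:${escrowId}:${index}`);
    index += 1;
  }
  // 2. Commit the internal state only after every transfer has succeeded.
  await store.markSettled(escrowId);
}
```

On a retry, transfers that already went through are replayed with the same keys and skipped by the provider, so only the failed ones actually execute. The same property covers the case where every transfer succeeds but the database write itself fails.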
The Job Processor runs in two distinct modes: `worker` for background tasks (email, analytics) and `gateway` for Socket.IO WebSocket serving. This decoupling allows each mode to scale independently based on its specific load characteristics (job queue backlog vs. concurrent WebSocket connections), leading to better resource utilization and fault isolation. The API services enqueue jobs for WebSocket events, keeping their HTTP event loops free from persistent connection management overhead.
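One way to picture the two-mode split is a single entry point that starts exactly one role per process. The sketch below assumes the mode arrives as a string (for example from a hypothetical `PROCESSOR_MODE` environment variable); `parseMode`, `bootstrap`, and `ModeRunners` are illustrative names, and the runner functions stand in for the real Bull worker and Socket.IO gateway setup.

```typescript
type ProcessorMode = "worker" | "gateway";

// Validate the raw setting up front so a typo fails fast at startup.
function parseMode(raw: string | undefined): ProcessorMode {
  if (raw === "worker" || raw === "gateway") {
    return raw;
  }
  throw new Error(`unknown processor mode: ${String(raw)}`);
}

// Hypothetical hooks: the real implementations would register queue
// processors (worker) or start the WebSocket server (gateway).
interface ModeRunners {
  startWorker(): Promise<void>;
  startGateway(): Promise<void>;
}

// Start exactly one role per process so each can be scaled on its own signal:
// queue backlog for workers, concurrent connections for gateways.
async function bootstrap(raw: string | undefined, runners: ModeRunners): Promise<ProcessorMode> {
  const mode = parseMode(raw);
  if (mode === "worker") {
    await runners.startWorker();
  } else {
    await runners.startGateway();
  }
  return mode;
}
```

With this shape, the same build artifact can be deployed twice with different settings, and a crash in one role does not take the other down with it.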