This article discusses a critical architectural decision in scaling a game server to handle 100k concurrent users, focusing on the interplay between load balancing strategies and application server thread management. It highlights the pitfalls of naive throughput optimization and advocates for a dynamic approach to prevent resource exhaustion and connection timeouts.
Originally published on Dev.to.
The core problem was designing a server layer, built as microservices on Kubernetes, that could handle an anticipated 100k concurrent users at launch, with rapid growth expected afterward. Initial misconfigurations caused severe performance degradation, which showed why real-world behavior matters more than theoretical throughput metrics.
The team initially adopted a "least connections first" load balancing strategy to optimize for throughput. Within minutes, application server CPU usage spiked to 100%, causing thread starvation and widespread connection timeouts. The load balancer misread the timeouts as slow responses and did not increase its connection pool, which compounded the problem: a "hot potato" scenario in which the load kept being passed back to the struggling application server.
The "Hot Potato" Problem
A load balancer configured with "least connections first" routes each new request to the server with the fewest open connections, on the assumption that it is the least busy. That assumption fails when a server is overloaded but merely *slow* to respond (or timing out): the load balancer does not detect the problem quickly enough and keeps sending traffic to it, which worsens the overload and prevents recovery.
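The effect can be reproduced in a toy simulation. The sketch below is an illustration, not the team's setup: the `Backend` class, the `simulate` function, the tick-based timing, and the backend names are all hypothetical. It also assumes the balancer stops counting a connection as soon as it fails, so an overloaded backend always appears to have zero open connections.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Backend:
    name: str
    healthy: bool = True
    active: int = 0    # connections the balancer currently counts as open
    received: int = 0  # total requests routed to this backend


def fresh_backends():
    # Hypothetical pool: two healthy servers and one overloaded server.
    return [Backend("app-1"), Backend("app-2"), Backend("app-3", healthy=False)]


def pick_least_connections(backends):
    # Lowest active count wins; ties go to the earliest backend in the list.
    return min(backends, key=lambda b: b.active)


def make_round_robin(backends):
    order = cycle(backends)
    # The picker ignores connection counts and simply takes the next backend.
    return lambda _backends: next(order)


def simulate(pick, backends, requests=300, hold=6):
    """Route one request per tick and return requests received per backend.

    A healthy backend keeps each connection open for `hold` ticks.
    ASSUMPTION: an overloaded backend fails the request immediately, so the
    balancer never counts an open connection against it.
    """
    open_conns = []  # (tick at which the connection closes, backend)
    for tick in range(requests):
        still_open = []
        for close_tick, backend in open_conns:
            if close_tick <= tick:
                backend.active -= 1
            else:
                still_open.append((close_tick, backend))
        open_conns = still_open

        target = pick(backends)
        target.received += 1
        if target.healthy:
            target.active += 1
            open_conns.append((tick + hold, target))
    return {b.name: b.received for b in backends}


if __name__ == "__main__":
    print("least connections:", simulate(pick_least_connections, fresh_backends()))
    pool = fresh_backends()
    print("round robin:      ", simulate(make_round_robin(pool), pool))
```

Under these assumptions the overloaded backend attracts the largest share of requests with least connections, while round robin gives every backend the same share. A real balancer with active health checks or outlier detection would behave differently, so treat the numbers as a demonstration of the feedback loop only.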
To resolve the scaling issues, the team switched to a round-robin load balancing strategy combined with dynamic thread pool sizing on the application servers. The change mattered for two reasons: round robin distributes load evenly across all instances, and dynamic sizing lets each application server regulate itself. By monitoring CPU usage and adjusting thread counts at runtime, servers could throttle incoming connections before resource exhaustion and connection timeouts set in.
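The article does not say how the thread count was adjusted, so the following is only a minimal sketch of one possible policy. It uses an additive-increase, multiplicative-decrease rule, chosen here for illustration. The names `next_pool_size` and `AdaptiveGate`, the thresholds, and the step sizes are all assumptions, and the CPU reading is passed in as a plain number rather than read from any particular monitoring API.

```python
import threading


def next_pool_size(current, cpu_percent, *, low=50.0, high=80.0,
                   min_size=4, max_size=256):
    """Return the worker limit to use for the next interval.

    Grow by a fixed step while CPU is below `low`, halve the limit when CPU
    is above `high`, and leave it unchanged in between. All thresholds and
    step sizes are illustrative assumptions.
    """
    if cpu_percent > high:
        proposed = current // 2
    elif cpu_percent < low:
        proposed = current + 4
    else:
        proposed = current
    return max(min_size, min(max_size, proposed))


class AdaptiveGate:
    """Caps in-flight requests at a limit that can change at runtime."""

    def __init__(self, limit):
        self._lock = threading.Lock()
        self._limit = limit
        self._in_flight = 0

    def set_limit(self, limit):
        with self._lock:
            self._limit = limit

    def try_acquire(self):
        # Non-blocking: reject immediately instead of queueing when saturated.
        with self._lock:
            if self._in_flight >= self._limit:
                return False
            self._in_flight += 1
            return True

    def release(self):
        with self._lock:
            self._in_flight -= 1


if __name__ == "__main__":
    limit = 64
    gate = AdaptiveGate(limit)
    for cpu in (30.0, 65.0, 95.0, 95.0):  # made-up CPU samples
        limit = next_pool_size(limit, cpu)
        gate.set_limit(limit)
        print(f"cpu={cpu:5.1f}%  limit={limit}")
```

A request handler would call `try_acquire()` before doing work and `release()` afterward, returning an explicit overload response when the gate refuses. Rejecting quickly, rather than letting requests queue until they time out, gives the load balancer a clearer signal that the server is saturated.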
When designing for high concurrency, consider the feedback loop between your load balancer and application servers. A simplistic load balancing strategy without adequate server-side resource management can lead to cascading failures. Dynamic resource allocation (like thread pools) coupled with robust monitoring is key to graceful degradation and scaling.