58 discussions in the community
we're designing a notification system for a social platform with around 10 million users, and we're debating between fan-out on write versus fan-out o...
We recently had an incident where a large bulk import job from one of our tenants consumed all available resources, impacting the performance and avai...
we're hitting a wall with database connections in RDS Postgres. we have around 50 microservice pods, and each maintains its own connection pool, maybe...
Dealing with hot partitions is a constant battle, especially when you have 'celebrity' users or entities that generate disproportionate traffic. We se...
building a notification system for a social platform with millions of users always brings up the fan-out on write vs. fan-out on read dilemma. we have...
in our microservices setup, we preach independent scaling and deployment. but in practice, i've noticed that scaling up one service often requires sca...
we're trying to build out a robust presence system, something like what Slack has, where you can see who's online and actively using the app. it sound...
everyone preaches independent microservice scaling, and i get it in theory. but in practice, at least for us, it rarely feels truly independent. if ou...
we're looking to build a Slack-like presence system to show which users are currently online. the main challenges are scaling to 500k+ concurrent user...
I'm curious about the current relevance and practical applications of the Actor model (like Akka or Orleans) for building high-concurrency, distribute...
we're growing fast and our microservice count is now over 200, each maintaining its own database connection pool. we're frequently hitting PostgreSQL ...
We're running Redis Sentinel today for high availability for our caching layer, handling about 50,000 operations per second. We're anticipating traffi...
we're seeing incredible growth in our real-time features, and we're now approaching 500k concurrent WebSocket connections to our existing gateway. it'...
we recently had a major outage where one tenant's large data import operation consumed all available database connections, effectively taking down the...
we're a startup currently handling around 5k requests per second, and we're debating our next scaling move. our current service is running on 2 medium...
we're building a feed service with DynamoDB and running into the classic 'hot partition' problem for our celebrity users. our partition key is `userid...
we recently had an incident where a single tenant performing a massive data import effectively starved resources for all other tenants in a multi-tena...
When doing back-of-the-envelope calculations for system design, I often wonder how accurate they really need to be. Is it enough to be in the right or...
we're dealing with a rapidly growing number of WebSocket connections, currently around 500k concurrent users, and our single WebSocket gateway is beco...
we're building a Slack-like presence system, needing to know who's online and propagate those changes quickly to hundreds of thousands of users. the c...