289 discussions in the community
we're designing a notification system for a social platform with around 10 million users, and we're debating between fan-out on write versus fan-out o...
we've got a dead letter queue that currently holds about 50,000 unprocessed messages, and frankly, nobody really looks at them. it's a catch-all for a...
We recently had an incident where a large bulk import job from one of our tenants consumed all available resources, impacting the performance and avai...
I'm really struggling to explain the CAP theorem clearly and concisely in system design interviews. I understand the concept: you can only pick two of...
i've been getting some consistent feedback in mock system design interviews lately: "your design works for the happy path but doesn't handle edge case...
Remote system design interviews have become the norm, and I find the whole whiteboard experience quite clunky. Drawing on a shared digital whiteboard ...
we're hitting a wall with database connections in RDS Postgres. we have around 50 microservice pods, and each maintains its own connection pool, maybe...
We've really focused on improving our incident response and postmortem process over the last couple of years, and it's paid off significantly. We used...
we're running into complexities with event ordering in Kafka. for most of our use cases, partitioning by `userId` gives us good enough ordering guaran...
our CTO is pushing to use Kafka for literally everything, including sending emails and generating reports, citing standardization benefits. while i ag...
Dealing with hot partitions is a constant battle, especially when you have 'celebrity' users or entities that generate disproportionate traffic. We se...
we're currently doing zero-downtime schema migrations in Postgres using a multi-deploy strategy: first, add a nullable column. then, deploy code that ...
we've got this colossal 10-year-old PHP monolith. it's got no tests, uses outdated dependencies, and frankly, it's terrifying to touch. management is ...
we've started seeing some abuse patterns on our WebSocket endpoints, with clients opening hundreds of connections or sending thousands of messages per...
How important is cost estimation in system design interviews, particularly for roles that aren't specifically focused on infrastructure finance? Shoul...
I've had a few interview scenarios where I'm asked to design a system I've never personally used or built, like 'design Spotify' or 'design a real-tim...
we're currently running with a Redis cache, but even with high hit rates, the 1-2ms network latency for every cache read still adds up, especially for...
we're taking a fresh look at how SOLID principles apply when you're designing distributed systems at scale. single responsibility principle feels pret...
building a notification system for a social platform with millions of users always brings up the fan-out on write vs. fan-out on read dilemma. we have...
on-call burnout is a serious problem on our team of eight engineers. we're getting 5-10 pages a week, and at least 2-3 of those are outside of busines...