This article details a real-world scenario where a startup faced severe database performance issues due to inefficient queries on a large PostgreSQL table. It explains how a well-chosen composite index dramatically improved latency for a critical rate-limiting query, shifting from slow sequential scans to efficient index scans. The post highlights the importance of understanding query patterns and index structures for scaling database operations.
Read original on Dev.to
A common problem in rapidly scaling applications is database queries turning into performance bottlenecks. The article's case study involves a PostgreSQL `events` table with over 10 million rows that tracks user activity for a rate limiter. The query `SELECT COUNT(*) FROM events WHERE user_id = $1 AND occurred_at >= NOW() - INTERVAL '1 minute';` counts a user's events in the last minute. It was being executed as a sequential scan over the whole table, so as traffic grew, latency rose above 2 seconds and CPU usage became excessive.
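To make the cost concrete, here is a minimal Python model of what a sequential scan does for this query. Rows are reduced to hypothetical `(user_id, occurred_at)` pairs with timestamps stored as integer seconds, and `count_recent_seq_scan` is an illustrative name, not part of any library. The important part is the `examined` counter: it always equals the table size, so on a 10-million-row table every call reads 10 million rows no matter how few of them match.

```python
# Hypothetical model: each row is (user_id, occurred_at) with occurred_at
# stored as integer epoch seconds; the real table has more columns.

def count_recent_seq_scan(rows, user_id, now, window_s=60):
    """Mimic a sequential scan: read every row and test it against the WHERE clause."""
    cutoff = now - window_s  # stands in for NOW() - INTERVAL '1 minute'
    examined = 0
    matches = 0
    for row_user, occurred_at in rows:
        examined += 1  # every row is read, whether or not it matches
        if row_user == user_id and occurred_at >= cutoff:
            matches += 1
    return matches, examined


rows = [(1, 100), (2, 150), (1, 130), (1, 170), (3, 165), (1, 95), (2, 175)]
print(count_recent_seq_scan(rows, user_id=1, now=180))  # (2, 7)
```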
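The key insight was to create a composite index that directly matches the filter conditions in the `WHERE` clause. A B-tree index on `(user_id, occurred_at)` stores its entries sorted by `user_id` and then by `occurred_at`, so all of one user's events inside the time window sit in a single contiguous run. PostgreSQL can descend the tree to the first entry with the given `user_id` and an `occurred_at` at or after the cutoff, then read forward until the `user_id` changes. This turns an `O(N)` sequential scan over the whole table into an `O(log N + k)` index scan, where `k` is the number of matching rows, drastically reducing the number of disk reads.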
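A sorted Python list can stand in for the B-tree to show where the `O(log N + k)` bound comes from. This is a simplified sketch under stated assumptions (an in-memory list instead of disk pages, integer-second timestamps, illustrative function names), not how PostgreSQL implements its indexes.

```python
import bisect

# Hypothetical model of a B-tree index on (user_id, occurred_at):
# a list of key tuples kept in sorted order. Timestamps are integer seconds.

def build_index(rows):
    """Sort (user_id, occurred_at) pairs the way the composite index orders its entries."""
    return sorted(rows)

def count_recent_index_scan(index, user_id, now, window_s=60):
    """Binary-search to the start of the matching run, then read forward."""
    cutoff = now - window_s
    # O(log N): locate the first entry >= (user_id, cutoff), like descending the tree.
    pos = bisect.bisect_left(index, (user_id, cutoff))
    count = 0
    # O(k): entries for this user at or after the cutoff are contiguous,
    # so stop as soon as the user_id changes.
    while pos < len(index) and index[pos][0] == user_id:
        count += 1
        pos += 1
    return count


index = build_index([(1, 100), (2, 150), (1, 130), (1, 170), (3, 165), (1, 95), (2, 175)])
print(count_recent_index_scan(index, user_id=1, now=180))  # 2
```

After the binary search, the loop visits only the two matching entries plus the one entry that ends the run; the rest of the list is never touched.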
```sql
CREATE INDEX idx_events_user_occurred ON events (user_id, occurred_at);
```
Index Order Matters
When creating a composite index for queries that combine equality and range conditions, place the equality columns first, followed by the range columns. For example, `(user_id, occurred_at)` is better than `(occurred_at, user_id)` for this query. With `user_id` leading, the matching entries are contiguous. With `occurred_at` leading, the last minute's entries for every user are interleaved, so the database has to read all of them and discard the ones that belong to other users.
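The effect of column order can be measured in the same kind of sorted-list model. The helper names, the 100 evenly active users, and the integer-second timestamps are all hypothetical. Each function returns the number of matches and the number of index entries inside the range it has to scan.

```python
import bisect

def scan_user_first(rows, user_id, cutoff):
    """Index on (user_id, occurred_at): matches form one contiguous run."""
    index = sorted((u, t) for u, t in rows)  # build the modeled index
    pos = bisect.bisect_left(index, (user_id, cutoff))
    matches = read = 0
    while pos < len(index) and index[pos][0] == user_id:
        read += 1
        matches += 1
        pos += 1
    return matches, read

def scan_time_first(rows, user_id, cutoff):
    """Index on (occurred_at, user_id): the time range holds every user's recent entries."""
    index = sorted((t, u) for u, t in rows)  # build the modeled index
    # (cutoff,) sorts before any (cutoff, u), so this finds the first entry with t >= cutoff.
    pos = bisect.bisect_left(index, (cutoff,))
    matches = read = 0
    while pos < len(index):
        read += 1
        if index[pos][1] == user_id:  # filter out other users' entries
            matches += 1
        pos += 1
    return matches, read


# 100 hypothetical users, each with events at t = 0, 10, ..., 110.
rows = [(u, t) for u in range(1, 101) for t in range(0, 120, 10)]
print(scan_user_first(rows, user_id=42, cutoff=60))  # (6, 6)
print(scan_time_first(rows, user_id=42, cutoff=60))  # (6, 600)
```

Both layouts return the same count, but the time-first layout walks every user's entries in the window, so its cost grows with total traffic rather than with a single user's activity.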
Implementing the composite index reduced query latency from ~2 seconds to under 5 milliseconds. This optimization not only stabilized the rate limiter, enabling it to handle 10,000 requests per second with low CPU usage, but also provided significant benefits across other parts of the system, including faster analytics queries, more efficient batch jobs for event purging, and quicker look-ups for feature flags.
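To check that the plan really changed from a sequential scan to an index scan, compare the query plan before and after creating the index. The sketch below swaps in SQLite through Python's standard `sqlite3` module so it runs without a PostgreSQL server. The schema, data, and cutoff are hypothetical, and SQLite's `EXPLAIN QUERY PLAN` wording differs from PostgreSQL's. On PostgreSQL, `EXPLAIN ANALYZE` on the original query would typically show a `Seq Scan` (or `Parallel Seq Scan`) on `events` before the index and an index-based plan such as `Index Scan`, `Index Only Scan`, or `Bitmap Index Scan` on `idx_events_user_occurred` afterwards.

```python
import sqlite3

def plan_details(conn, sql, params):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN, one string per step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

def compare_plans():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; occurred_at is integer epoch seconds because SQLite
    # has no NOW() - INTERVAL syntax.
    conn.execute(
        "CREATE TABLE events ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL,"
        " occurred_at INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO events (user_id, occurred_at) VALUES (?, ?)",
        [(u, t) for u in range(1, 51) for t in range(0, 600, 7)],
    )
    # Same shape as the article's query, with the cutoff computed by the caller.
    query = "SELECT COUNT(*) FROM events WHERE user_id = ? AND occurred_at >= ?"
    params = (7, 600 - 60)

    before = plan_details(conn, query, params)
    conn.execute("CREATE INDEX idx_events_user_occurred ON events (user_id, occurred_at)")
    after = plan_details(conn, query, params)
    (count,) = conn.execute(query, params).fetchone()
    conn.close()
    return before, after, count


before, after, count = compare_plans()
print(before)  # a full-table SCAN step
print(after)   # a SEARCH step that names idx_events_user_occurred
print(count)   # 8
```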