Relational Databases & ACID
ACID properties, normalization, indexing strategies (B-tree, hash), query optimization, and when relational databases are the right choice.
Why Relational Databases Still Dominate
Relational databases have been the backbone of software systems for over 50 years. Despite the NoSQL wave, the majority of production transactional systems — banking, e-commerce, SaaS — still rely on PostgreSQL, MySQL, or SQL Server. Understanding why requires understanding what relational databases guarantee: the ACID properties.
ACID Properties
ACID stands for Atomicity, Consistency, Isolation, and Durability. These four guarantees define what a *transaction* means in a relational database.
| Property | Meaning | Example |
|---|---|---|
| Atomicity | All operations in a transaction succeed or all are rolled back. No partial state. | A bank transfer debits one account and credits another — both happen or neither does. |
| Consistency | A transaction brings the DB from one valid state to another, respecting all constraints. | A `NOT NULL` constraint or foreign key is enforced; invalid data is rejected. |
| Isolation | Concurrent transactions behave as if they run serially. No dirty reads, no phantoms. | Two users booking the last seat on a flight — only one succeeds. |
| Durability | Once a transaction commits, the data survives crashes (written to disk via WAL). | After `COMMIT`, even a server crash doesn't lose the order. |
Isolation Levels
Full serializable isolation is expensive. SQL databases offer a spectrum of isolation levels that trade correctness for performance. Understanding these is crucial for interviews.
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible |
| Read Committed (default PG) | Prevented | Possible | Possible |
| Repeatable Read | Prevented | Prevented | Possible |
| Serializable | Prevented | Prevented | Prevented |
PostgreSQL Default
PostgreSQL defaults to Read Committed isolation. For financial applications requiring strict correctness, explicitly use `SET TRANSACTION ISOLATION LEVEL SERIALIZABLE`.
Indexing Strategies
Indexes are the single biggest lever for query performance in relational databases. Without an index, every query performs a full table scan — O(n) cost. With the right index, lookups become O(log n) via a B-tree.
B-Tree Indexes
The default index type in PostgreSQL and MySQL. B-trees store data in sorted order, enabling range queries (`WHERE created_at > '2024-01-01'`), equality lookups, and ORDER BY operations without an extra sort step. The tree typically has 3-4 levels even for tables with hundreds of millions of rows.
Hash Indexes
Hash indexes provide O(1) equality lookups but cannot support range queries. Use them only for exact-match workloads. PostgreSQL supports hash indexes natively; in MySQL, hash indexes exist only in the MEMORY storage engine.
Composite and Covering Indexes
A composite index covers multiple columns. The order matters: an index on `(user_id, created_at)` supports queries filtering on `user_id` alone or `(user_id, created_at)`, but not on `created_at` alone. A covering index includes all columns a query needs, so the database never touches the main table — sometimes called an index-only scan.
-- Composite index: supports WHERE user_id = ? AND created_at > ?
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Covering index: query can be answered entirely from the index
CREATE INDEX idx_orders_covering
ON orders(user_id, created_at)
INCLUDE (total_amount, status);
-- EXPLAIN to verify index usage
EXPLAIN ANALYZE
SELECT total_amount, status
FROM orders
WHERE user_id = 42 AND created_at > '2024-01-01';Normalization
Normalization eliminates redundancy by decomposing tables into smaller ones and using foreign keys to join them. The standard forms are 1NF, 2NF, 3NF, and BCNF. Most production schemas aim for 3NF: each non-key column depends on the whole primary key, and nothing else.
Normalization vs Denormalization
Normalization improves write consistency but increases join complexity. At scale, teams often denormalize — duplicating data to avoid expensive joins — trading write overhead for read speed. This is a deliberate architectural decision, not an accident.
When to Choose a Relational Database
- Complex relationships: Multiple entity types with foreign key constraints and joins (users → orders → products → inventory).
- ACID transactions required: Financial operations, inventory management, any case where partial writes are catastrophic.
- Flexible ad-hoc queries: You don't know all your access patterns upfront and need SQL's expressiveness.
- Regulatory compliance: Auditability, constraints, and referential integrity are enforced by the database engine.
- Moderate scale: PostgreSQL handles hundreds of thousands of QPS with proper indexing and connection pooling.
Interview Tip
When an interviewer asks about the data store for a system, always start with: 'What are the ACID requirements?' If strong consistency and relational integrity matter (payments, reservations, user accounts), default to PostgreSQL. Then justify if you deviate from that baseline. Interviewers are more impressed by a thoughtful choice than by defaulting to NoSQL to sound 'scalable'.
Query Optimization Basics
The query optimizer generates an execution plan — the sequence of operations to execute a query. Use `EXPLAIN ANALYZE` to inspect the plan. Key things to look for: Seq Scan (red flag on large tables), Index Scan (good), Hash Join vs Nested Loop (depends on table size), and sort operations that could be eliminated by an index.
N+1 Query Problem
The most common ORM-related performance killer. Fetching 100 users and then running 100 separate queries for their profiles is O(n) database round trips. Always use JOINs or eager loading (`SELECT ... JOIN`) instead. Spot this in code reviews and interviews — it's a flag for senior engineers.