Course/Data Storage/Relational Databases & ACID

Relational Databases & ACID

ACID properties, normalization, indexing strategies (B-tree, hash), query optimization, and when relational databases are the right choice.

18 min readHigh interview weight

Why Relational Databases Still Dominate

Relational databases have been the backbone of software systems for over 50 years. Despite the NoSQL wave, the majority of production transactional systems — banking, e-commerce, SaaS — still rely on PostgreSQL, MySQL, or SQL Server. Understanding why requires understanding what relational databases guarantee: the ACID properties.

ACID Properties

ACID stands for Atomicity, Consistency, Isolation, and Durability. These four guarantees define what a *transaction* means in a relational database.

Property	Meaning	Example
Atomicity	All operations in a transaction succeed or all are rolled back. No partial state.	A bank transfer debits one account and credits another — both happen or neither does.
Consistency	A transaction brings the DB from one valid state to another, respecting all constraints.	A `NOT NULL` constraint or foreign key is enforced; invalid data is rejected.
Isolation	Concurrent transactions behave as if they run serially. No dirty reads, no phantoms.	Two users booking the last seat on a flight — only one succeeds.
Durability	Once a transaction commits, the data survives crashes (written to disk via WAL).	After `COMMIT`, even a server crash doesn't lose the order.

Isolation Levels

Full serializable isolation is expensive. SQL databases offer a spectrum of isolation levels that trade correctness for performance. Understanding these is crucial for interviews.

Isolation Level	Dirty Read	Non-Repeatable Read	Phantom Read
Read Uncommitted	Possible	Possible	Possible
Read Committed (default PG)	Prevented	Possible	Possible
Repeatable Read	Prevented	Prevented	Possible
Serializable	Prevented	Prevented	Prevented

💡

PostgreSQL Default

PostgreSQL defaults to Read Committed isolation. For financial applications requiring strict correctness, explicitly use `SET TRANSACTION ISOLATION LEVEL SERIALIZABLE`.

Indexing Strategies

Indexes are the single biggest lever for query performance in relational databases. Without an index, every query performs a full table scan — O(n) cost. With the right index, lookups become O(log n) via a B-tree.

B-Tree Indexes

The default index type in PostgreSQL and MySQL. B-trees store data in sorted order, enabling range queries (`WHERE created_at > '2024-01-01'`), equality lookups, and ORDER BY operations without an extra sort step. The tree typically has 3-4 levels even for tables with hundreds of millions of rows.

Loading diagram...

B-tree index structure: balanced, sorted, efficient for range and equality queries

Hash Indexes

Hash indexes provide O(1) equality lookups but cannot support range queries. Use them only for exact-match workloads. PostgreSQL supports hash indexes natively; in MySQL, hash indexes exist only in the MEMORY storage engine.

Composite and Covering Indexes

A composite index covers multiple columns. The order matters: an index on `(user_id, created_at)` supports queries filtering on `user_id` alone or `(user_id, created_at)`, but not on `created_at` alone. A covering index includes all columns a query needs, so the database never touches the main table — sometimes called an index-only scan.

sql

-- Composite index: supports WHERE user_id = ? AND created_at > ?
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Covering index: query can be answered entirely from the index
CREATE INDEX idx_orders_covering
ON orders(user_id, created_at)
INCLUDE (total_amount, status);

-- EXPLAIN to verify index usage
EXPLAIN ANALYZE
SELECT total_amount, status
FROM orders
WHERE user_id = 42 AND created_at > '2024-01-01';

Normalization

Normalization eliminates redundancy by decomposing tables into smaller ones and using foreign keys to join them. The standard forms are 1NF, 2NF, 3NF, and BCNF. Most production schemas aim for 3NF: each non-key column depends on the whole primary key, and nothing else.

ℹ️

Normalization vs Denormalization

Normalization improves write consistency but increases join complexity. At scale, teams often denormalize — duplicating data to avoid expensive joins — trading write overhead for read speed. This is a deliberate architectural decision, not an accident.

When to Choose a Relational Database

Complex relationships: Multiple entity types with foreign key constraints and joins (users → orders → products → inventory).
ACID transactions required: Financial operations, inventory management, any case where partial writes are catastrophic.
Flexible ad-hoc queries: You don't know all your access patterns upfront and need SQL's expressiveness.
Regulatory compliance: Auditability, constraints, and referential integrity are enforced by the database engine.
Moderate scale: PostgreSQL handles hundreds of thousands of QPS with proper indexing and connection pooling.

💡

Interview Tip

When an interviewer asks about the data store for a system, always start with: 'What are the ACID requirements?' If strong consistency and relational integrity matter (payments, reservations, user accounts), default to PostgreSQL. Then justify if you deviate from that baseline. Interviewers are more impressed by a thoughtful choice than by defaulting to NoSQL to sound 'scalable'.

Query Optimization Basics

The query optimizer generates an execution plan — the sequence of operations to execute a query. Use `EXPLAIN ANALYZE` to inspect the plan. Key things to look for: Seq Scan (red flag on large tables), Index Scan (good), Hash Join vs Nested Loop (depends on table size), and sort operations that could be eliminated by an index.

⚠️

N+1 Query Problem

The most common ORM-related performance killer. Fetching 100 users and then running 100 separate queries for their profiles is O(n) database round trips. Always use JOINs or eager loading (`SELECT ... JOIN`) instead. Spot this in code reviews and interviews — it's a flag for senior engineers.

Long Polling vs Short Polling

NoSQL Database Types