Course/Performance & Scalability Patterns/Index Table Pattern

Index Table Pattern

Create secondary indexes as separate tables: enabling efficient queries on non-primary-key fields in NoSQL and sharded databases.

10 min read

The Problem: Querying Without a Primary Key

In relational databases, a secondary index is a first-class feature managed automatically by the database engine. But in NoSQL databases (DynamoDB, Cassandra, Redis) and sharded databases, lookups by non-primary-key attributes are expensive because data is partitioned by the primary key. Fetching a user by email when your primary key is `user_id` requires scanning all partitions.

The Index Table pattern solves this by manually creating a separate table whose primary key is the attribute you want to query. This secondary table stores the mapping from the query attribute to the primary key, enabling O(1) lookups without full table scans.

Index Table Structure

Loading diagram...

The index table maps the secondary key (email) to the primary key (user_id)

python

# Main table: users (primary key = user_id)
# DynamoDB put:
users_table.put_item(Item={
    "user_id": "u-123",
    "email": "alice@example.com",
    "name": "Alice"
})

# Index table: users_by_email (primary key = email)
# Maintained manually alongside the main table
users_by_email_table.put_item(Item={
    "email": "alice@example.com",
    "user_id": "u-123"
})

# Lookup by email:
def get_user_by_email(email: str) -> dict:
    # Step 1: Resolve primary key via index table
    index_row = users_by_email_table.get_item(
        Key={"email": email}
    ).get("Item")
    if not index_row:
        return None
    # Step 2: Fetch full record from main table
    user = users_table.get_item(
        Key={"user_id": index_row["user_id"]}
    ).get("Item")
    return user

Three Index Table Variants

Variant	What It Stores	Trade-off
Key-only index	Secondary key → primary key mapping only	Two reads per lookup; minimal storage overhead
Covering index (denormalized)	Secondary key + all commonly queried attributes	One read per lookup; data duplication; must keep in sync
Composite index	Multi-attribute secondary key (e.g., city + last_name)	Enables compound queries; less flexible for partial filters

Keeping Index Tables Consistent

The index table must always reflect the current state of the main table. Writes become two-phase operations: write to both the main table and the index table. This introduces two consistency challenges:

Atomicity: A crash after updating the main table but before the index table leaves them out of sync. Use a transactional outbox or two-phase commit if your datastore supports it (DynamoDB TransactWriteItems).
Key updates: If the indexed attribute changes (e.g., a user changes their email), you must delete the old index entry and insert a new one. Failure here leaves orphaned index entries.

💡

DynamoDB Global Secondary Indexes

DynamoDB's managed Global Secondary Indexes (GSIs) implement the Index Table pattern automatically. Under the hood, DynamoDB maintains a separate partition for each GSI and asynchronously propagates writes. The trade-off: GSI reads are eventually consistent by default, and GSI write throughput is charged separately.

Index Table vs Full-Text Search

Index tables work well for exact-match and range queries on structured attributes. For full-text search, prefix matching, or fuzzy matching, a dedicated search engine (Elasticsearch, Algolia, Typesense) is more appropriate. The index table and search engine patterns are complementary — many systems use both.

💡

Interview Tip

Index tables come up in interviews whenever you propose a NoSQL database and the interviewer asks 'but how do you look up users by email?'. The answer is: 'I'd maintain a secondary index table mapping email to user_id. Every write updates both tables transactionally.' Then address the consistency concern proactively. This demonstrates that you think about operational correctness, not just the happy path.

Sharding as a Scalability Pattern

Static Content Hosting Pattern