Medium #system-design · May 29, 2026

Choosing Databases Based on Core Data Structures

This article highlights that effective database selection should be driven by understanding the underlying data structures and their operational characteristics rather than marketing hype. It emphasizes that databases are essentially optimized implementations of fundamental data structures, influencing their performance, scalability, and suitability for various use cases.

The Essence of Databases: Data Structures

At their core, databases are sophisticated wrappers around fundamental data structures. Understanding these structures – like B-trees, hash tables, LSM-trees, or heaps – is crucial for making informed architectural decisions. Each data structure offers distinct advantages and trade-offs regarding read/write performance, storage efficiency, and consistency models, directly impacting how a database performs under specific workloads.

Key Data Structures and Their Database Applications

B-Trees/B+Trees: Found in traditional relational databases (PostgreSQL, MySQL). Excellent for ordered data, range queries, and indexing, offering balanced read/write performance.
Hash Tables: Used in key-value stores (Redis, Memcached). Ideal for fast lookups by key, offering O(1) average time complexity for reads and writes. Not suitable for range queries, because keys are stored in no particular order.
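Log-Structured Merge-Trees (LSM-trees): Used in write-optimized stores such as Cassandra and the RocksDB storage engine. Optimized for write-heavy workloads: writes are buffered in memory, flushed to disk sequentially as sorted segments, and merged in the background. Accepts higher read amplification in exchange for write efficiency, since a read may have to check several segments.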
Heaps/Priority Queues: Rarely used as primary storage, but common inside database engines, for example when answering top-N queries or merging several sorted runs.
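
To make the hash-table versus ordered-index contrast concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any of the databases above are implemented: a `dict` stands in for a hash index, a sorted key list stands in for the ordered leaf level of a B+tree, and the `orders` data and function names are hypothetical.

```python
import bisect

# Hypothetical toy data: order IDs mapped to order totals.
orders = {1007: 25.0, 1001: 10.0, 1015: 99.5, 1003: 42.0, 1010: 7.25}

def point_lookup(table, key):
    # Python's dict is a hash table: lookup by key is O(1) on average,
    # but the keys carry no useful ordering.
    return table.get(key)

# A sorted key list stands in for the ordered leaves of a B+tree.
sorted_keys = sorted(orders)

def range_scan(keys, table, low, high):
    # Binary search finds both ends of the range in O(log n);
    # the matching keys are then adjacent and can be read in order.
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return [(k, table[k]) for k in keys[start:end]]

def range_scan_hash_only(table, low, high):
    # With only a hash table, a range query must inspect every key: O(n).
    return sorted((k, v) for k, v in table.items() if low <= k <= high)
```

Both range functions return the same rows, but the ordered version touches only the keys inside the range, which is why ordered structures suit range queries and hash tables do not.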

System Design Implication

When designing a system, don't just pick 'SQL' or 'NoSQL'. Dive deeper: Is your workload read-heavy or write-heavy? Do you need strong consistency or eventual consistency? Are range queries critical? Your answers should guide you to a database whose underlying data structures naturally align with these requirements.
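
The questions above can be written down as a rough decision helper. This is only a sketch of the heuristic, not a selection procedure from the original article: the `Workload` fields and the `suggest_structure` function are hypothetical names, and the consistency question is deliberately left out because it depends more on replication and transaction design than on the index structure.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    # Hypothetical, simplified description of an access pattern.
    write_heavy: bool
    needs_range_queries: bool
    key_lookups_only: bool

def suggest_structure(w: Workload) -> str:
    """Rough heuristic only; real choices also weigh consistency, operations, and cost."""
    if w.key_lookups_only and not w.needs_range_queries:
        # Pure point lookups by key: a hash table fits naturally.
        return "hash table (e.g. Redis, Memcached)"
    if w.write_heavy:
        # Sustained high write volume: sequential appends and background merges.
        return "LSM-tree (e.g. Cassandra, RocksDB)"
    # Ordered data, range queries, and mixed read/write workloads.
    return "B-tree/B+tree (e.g. PostgreSQL, MySQL)"
```

For example, `suggest_structure(Workload(write_heavy=True, needs_range_queries=True, key_lookups_only=False))` points toward an LSM-tree, while a session cache with key-only access points toward a hash table.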

Selecting a database without considering its fundamental mechanisms can lead to significant performance bottlenecks, scalability issues, and operational overhead. For instance, using a B-tree-based database for a purely append-only, high-write-throughput log might result in excessive disk I/O and poor cache utilization compared to an LSM-tree-based solution.
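
A toy LSM-tree shows why the structure suits append-heavy workloads and where its read cost comes from. This is a simplified sketch, not the design of Cassandra or RocksDB: the `ToyLSM` class, its `memtable_limit` threshold, and the probe counter are assumptions made for illustration, and real engines add write-ahead logs, Bloom filters, and leveled or tiered compaction.

```python
import bisect

class ToyLSM:
    """Toy LSM-tree: an in-memory memtable plus immutable sorted segments."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}
        self.segments = []  # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        # Writes only touch memory; nothing on "disk" is updated in place.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # One sequential write of a sorted run, similar in spirit to an SSTable.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Check the memtable, then segments from newest to oldest.
        # The probe count is a rough stand-in for read amplification.
        probes = 1
        if key in self.memtable:
            return self.memtable[key], probes
        for segment in reversed(self.segments):
            probes += 1
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1], probes
        return None, probes

    def compact(self):
        # Merge all segments into one, keeping the newest value for each key.
        merged = {}
        for segment in self.segments:  # oldest first, so newer values overwrite
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []
```

Every `put` is a cheap in-memory update followed by an occasional sequential flush, while a `get` for an old key may probe several segments until `compact` merges them. That is the write-efficiency versus read-amplification trade-off in miniature.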

Making Informed Database Choices

A robust system design involves matching the application's data access patterns and consistency requirements with the most suitable database technology. This requires understanding not just the marketing features, but the core engineering principles that dictate a database's behavior under load.

Architecture Design

Design this yourself

Design a data storage layer for a high-traffic e-commerce platform that processes millions of transactions daily, handles product catalogs, user profiles, and real-time inventory updates. Justify your database choices for different data types (e.g., product details, order history, user sessions) by discussing the underlying data structures and their implications for read/write performance, consistency, and scalability.

Focus: database selection based on underlying data structures

Other design angles

· Design a data ingestion and analytics pipeline for IoT sensor data, focusing on selecting appropriate databases for raw time-series data storage, aggregated metrics, and metadata, considering the data structures best suited for high-volume writes and complex analytical queries.
· Design the backend for a real-time collaborative document editing application, where multiple users can simultaneously modify content. Explain how the choice of data structures within the database impacts conflict resolution, consistency, and responsiveness.
· Design the storage for a social media feed service that needs to serve personalized feeds with low latency, manage user posts, comments, and likes. Discuss how different database types, based on their core data structures, could be combined to optimize for both read and write performance.