This article introduces the Lakebase architecture, which re-architects PostgreSQL by disaggregating its WAL and data files into independent, cloud-native services (SafeKeeper and PageServer). This stateless compute approach enhances scalability, durability, and availability. It then extends to LTAP (Lakehouse Transactional and Analytical Processing), unifying transactional and analytical workloads on a single, open columnar data copy, avoiding complex CDC pipelines and maintaining performance for both.
Traditional databases like PostgreSQL store the Write-Ahead Log (WAL) and data files on a single machine's local disk. While efficient for write durability (sequential WAL appends) and read performance (direct data file access), this monolithic architecture introduces significant challenges. These include risks of data loss due to misconfiguration or node failure, high costs and complexity for scaling reads (requiring full physical clones), and contention between analytical and transactional workloads on shared hardware resources. All these issues stem from the tight coupling of compute and storage.
Lakebase addresses these limitations by making PostgreSQL compute instances stateless. It externalizes the WAL and data files into purpose-built, independently scalable cloud services that build on durable cloud object storage. Because no compute instance owns the data, PostgreSQL compute can be started, stopped, and replicated freely.
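The separation can be illustrated with a toy model. This is a minimal sketch, not the actual Lakebase implementation: only the names SafeKeeper and PageServer come from the article, while the classes, their methods, and the assumption that the PageServer catches up by replaying WAL records before serving a read are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SafeKeeper:
    """Toy stand-in for the durable WAL service: an append-only log."""
    log: list = field(default_factory=list)

    def append(self, key, value):
        # The returned number is the record's position in the log (a toy LSN).
        self.log.append((key, value))
        return len(self.log)

    def read_from(self, lsn):
        # Return every record written after the given position.
        return self.log[lsn:]


@dataclass
class PageServer:
    """Toy stand-in for the data-file service."""
    safekeeper: SafeKeeper
    pages: dict = field(default_factory=dict)
    applied_lsn: int = 0

    def get(self, key):
        # Assumption: apply any WAL records not yet seen, then answer the read.
        for k, v in self.safekeeper.read_from(self.applied_lsn):
            self.pages[k] = v
        self.applied_lsn = len(self.safekeeper.log)
        return self.pages.get(key)


class StatelessCompute:
    """A compute node that holds references to the services and no data."""

    def __init__(self, safekeeper, pageserver):
        self.safekeeper = safekeeper
        self.pageserver = pageserver

    def write(self, key, value):
        return self.safekeeper.append(key, value)

    def read(self, key):
        return self.pageserver.get(key)


safekeeper = SafeKeeper()
pageserver = PageServer(safekeeper)

primary = StatelessCompute(safekeeper, pageserver)
primary.write("order:1", "paid")
del primary  # discarding the compute node discards no data

# A fresh compute node attached to the same services sees the earlier write.
replacement = StatelessCompute(safekeeper, pageserver)
recovered = replacement.read("order:1")
```

Because `StatelessCompute` stores nothing beyond the two service handles, adding a read replica in this model is just constructing another instance, with no physical clone of the data.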
LTAP (Lakehouse Transactional and Analytical Processing) extends Lakebase by storing operational data once in open columnar formats (like Parquet on S3) that can be read by both PostgreSQL (via Lakebase) and Lakehouse engines. Analytics can therefore run directly on the same fresh data that transactions just wrote. This removes the need for Change Data Capture (CDC) pipelines and for a second copy of the data, and it keeps analytical queries from slowing down transactional workloads. Unlike traditional HTAP, which attempts to unify both workloads within a single engine, LTAP unifies at the storage layer while keeping specialized, optimized engines for each workload.
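The idea of unifying at the storage layer while keeping separate engines can be sketched as follows. This is an illustrative toy under stated assumptions: an in-memory dictionary of column lists stands in for the shared columnar files, and the class names `ColumnarTable`, `TransactionalEngine`, and `AnalyticalEngine` are hypothetical, not part of Lakebase or any Lakehouse engine.

```python
class ColumnarTable:
    """In-memory stand-in for one shared columnar copy (e.g. Parquet on S3)."""

    def __init__(self, columns):
        self.columns = {name: [] for name in columns}

    def append_row(self, row):
        # Each field of the row is appended to its own column.
        for name, values in self.columns.items():
            values.append(row[name])


class TransactionalEngine:
    """Row-oriented access: inserts and point lookups by key."""

    def __init__(self, table, key):
        self.table = table
        self.key = key

    def insert(self, row):
        self.table.append_row(row)

    def lookup(self, key_value):
        # Raises ValueError if the key is absent.
        idx = self.table.columns[self.key].index(key_value)
        return {name: values[idx] for name, values in self.table.columns.items()}


class AnalyticalEngine:
    """Column-oriented access: scans only the column a query needs."""

    def __init__(self, table):
        self.table = table

    def total(self, column):
        return sum(self.table.columns[column])


orders = ColumnarTable(["id", "amount"])
oltp = TransactionalEngine(orders, key="id")
olap = AnalyticalEngine(orders)

oltp.insert({"id": 1, "amount": 30})
oltp.insert({"id": 2, "amount": 12})

# The analytical engine reads the same copy; no CDC step moved the rows.
fresh_total = olap.total("amount")
row = oltp.lookup(2)
```

Both engines hold a reference to the same `orders` object, so the aggregate reflects the inserts immediately. The sketch leaves out everything that makes this hard in practice, such as concurrency control and file layout.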
Key System Design Takeaway
The core principle of Lakebase and LTAP is disaggregation of compute and storage. This pattern is crucial for achieving elastic scalability, enhanced durability, and improved cost efficiency in cloud-native database architectures. By separating concerns, each component can be optimized and scaled independently, providing greater flexibility than traditional monolithic designs.