Menu
Dev.to #architecture·May 27, 2026

Rethinking Event Sourcing: Consolidating Events and Aggregates in PostgreSQL

This article presents a crucial system design lesson learned from a CQRS implementation where events and aggregate roots were stored in separate systems (Kafka and PostgreSQL). The initial distributed architecture led to severe performance issues and operational overhead. The authors describe their journey to consolidate events and aggregates into a single PostgreSQL database, leveraging logical replication as an event bus, dramatically improving latency and reducing costs.

Read original on Dev.to #architecture

The Challenge of Distributed Event Sourcing

The initial architecture used a CQRS pattern where events were stored in a dedicated Kafka cluster (event-store) and aggregate roots resided in PostgreSQL. Events were propagated to Kafka using Debezium and then consumed by read-side services to build materialized views. This setup, while aiming for eventual consistency and data loss prevention, introduced significant latency and operational complexity. High request rates led to severe lag in materialized view updates and frequent consumer restarts due to Zookeeper session timeouts, resulting in critical production incidents.

Failed Attempts at Optimization

  1. Kafka Upgrade: Upgrading to Kafka 3.5 with transactional producers yielded only a minor 12% lag reduction, still leaving millions of unprocessed events.
  2. Tiered Consumers: Moving read-side consumers to a tiered architecture with Kubernetes pods in multiple AZs increased CPU steal and p99 read latency to 900ms.
  3. Kafka Connect with JDBC: Switching from Debezium to Kafka Connect with JDBC source introduced 20-second schema validation pauses and exacerbated event lag. Each attempt failed to address the fundamental latency and complexity introduced by crossing multiple databases and networks.

The Architectural Shift: Consolidation and Logical Replication

The team made a decisive move to consolidate: events were moved into the same PostgreSQL cluster as their aggregate roots, stored in `jsonb` columns with a GIN index. Kafka was replaced by PostgreSQL's logical replication slots, feeding a custom Golang service that emitted a compact binary format to internal gRPC streams for downstream consumers. This change transformed the write path into a single round-trip: client → PostgreSQL → replication slot → gRPC. Read-side materialized views now refresh in a fraction of the original time by reading events via a foreign table within the same cluster.

💡

Key Trade-offs and Benefits

The consolidation introduced a dependency on sharding PostgreSQL at 5TB per node and lost Kafka's disk spill-over capability. However, the benefits were substantial: 35% lower p99 latency, elimination of event lag, a 28% reduction in total cost of ownership (by removing Kafka and Debezium), and a significantly simplified monitoring stack. Post-change, p99 write latency dropped from 48ms to 12ms, and p99 read latency from 900ms to 45ms. The only new failure mode, logical replication lag during primary failover, was mitigated with a hot standby using `pg_rewind`.

Lessons Learned for System Design

The core lesson emphasizes that service boundaries and architectural choices, particularly involving distributed systems and separate data stores, must be justified by real data and performance metrics, not by 'cargo-cult architecture.' Starting with a simpler, monolithic approach for event storage within the same database, and then scaling out as performance demands necessitate, can prevent significant technical debt and operational pain. Logical replication can serve as an effective event bus in many scenarios, reducing network hops and simplifying consistency models.

event sourcingCQRSPostgreSQLKafkalogical replicationperformance optimizationarchitectural trade-offsdatabase consolidation

Comments

Loading comments...