Polyglot Persistence
Use the best database for each use case within the same system: strategies for managing multiple data stores, data synchronization, and operational complexity.
One Size Does Not Fit All
Polyglot persistence is the practice of using multiple database technologies within a single system, choosing each for the access pattern it serves best. The term was popularized by Martin Fowler and Pramod Sadalage, and it pairs naturally with microservices, where the Database per Service pattern lets each service choose its own data store.
No single database excels at everything. A relational database is excellent for transactional order data with complex relationships. A document database is better for schema-flexible user profiles. A time-series database is purpose-built for metrics data. A graph database handles traversal of social connections with orders-of-magnitude better performance than an equivalent SQL query. Polyglot persistence embraces this reality.
Database Selection Guide
| Database Type | Best For | Examples | Avoid When |
|---|---|---|---|
| Relational (RDBMS) | ACID transactions, complex relationships, reporting | PostgreSQL, MySQL, CockroachDB | Schema is highly flexible; massive write throughput needed |
| Document | Schema-flexible entities, nested data, catalogs | MongoDB, Firestore, CouchDB | Strong relational joins are needed; ACID across documents |
| Key-Value | Sessions, caches, rate limiting, feature flags | Redis, DynamoDB, Memcached | Complex queries or relationships needed |
| Wide-Column | High write throughput, time-series, IoT | Cassandra, HBase, ScyllaDB | Access patterns are ad-hoc or change often; queries are not known up front |
| Graph | Social graphs, fraud detection, recommendations | Neo4j, Amazon Neptune | Data is not naturally a graph; simple queries dominate |
| Time-Series | Metrics, logs, events by time | InfluxDB, TimescaleDB, Prometheus | Data is not primarily time-ordered; updates are frequent |
| Search Engine | Full-text search, faceted filtering, geo-search | Elasticsearch, OpenSearch, Solr | It would be the system's source of truth; strong consistency needed |
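To make the table concrete, here is a minimal sketch of a service that routes each kind of data to the store suited to its access pattern. The class and method names are illustrative, and plain dicts and lists stand in for the real relational store, key-value cache, and search index:

```python
from dataclasses import dataclass, field

@dataclass
class PolyglotOrderService:
    # In-memory stand-ins for the real stores; in production these would be
    # a relational database, a key-value store, and a search engine.
    orders: dict = field(default_factory=dict)         # stand-in for Postgres rows
    session_cache: dict = field(default_factory=dict)  # stand-in for Redis
    search_index: list = field(default_factory=list)   # stand-in for Elasticsearch

    def place_order(self, order_id: str, user_id: str, items: list) -> None:
        # Transactional record goes to the relational store (source of truth).
        self.orders[order_id] = {"user_id": user_id, "items": items}
        # A denormalized copy goes to the search index for full-text queries.
        self.search_index.append({"id": order_id, "text": " ".join(items)})

    def cache_session(self, token: str, user_id: str) -> None:
        # Hot, short-lived data goes to the key-value store.
        self.session_cache[token] = user_id

    def search_orders(self, term: str) -> list:
        # Naive substring scan standing in for an inverted-index query.
        return [d["id"] for d in self.search_index if term in d["text"]]
```

The point is not the toy implementation but the shape: one write path fans data out to stores with different strengths, which is exactly what creates the synchronization problem discussed next.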
Data Synchronization
The main challenge of polyglot persistence is keeping data consistent across multiple stores. When data lives in Postgres (source of truth) but must also be queryable in Elasticsearch (for search), changes in Postgres must propagate to Elasticsearch. The primary tools for this are:
- CDC + Kafka — Debezium tails the Postgres WAL, publishes change events to Kafka, and a consumer updates Elasticsearch. The gold standard for near-real-time synchronization.
- Dual writes — Application writes to both Postgres and Elasticsearch. Simple, but suffers from the dual-write consistency problem. Only acceptable with robust retry and reconciliation.
- Scheduled batch sync — A nightly job exports data from Postgres and imports into the secondary store. Simple but high latency — only suitable for analytics or low-freshness-requirement scenarios.
- Event-driven projection — In a CQRS or event-sourced system, the same events that update the write store are consumed to update the secondary store.
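The CDC pipeline above hinges on translating each change event into an operation on the secondary store. A minimal sketch, assuming a simplified Debezium-style envelope (`op` plus `before`/`after` row images) and emitting Elasticsearch bulk-API-shaped actions; the field names follow Debezium's conventions but the function itself is hypothetical:

```python
def change_event_to_es_action(event: dict, index: str) -> dict:
    """Translate a simplified Debezium-style change event into an
    Elasticsearch bulk-style action. `op` is 'c' (create), 'u' (update),
    'r' (snapshot read), or 'd' (delete)."""
    op = event["op"]
    if op in ("c", "u", "r"):
        # Upsert the full row image so the index converges on the latest state.
        row = event["after"]
        return {
            "action": {"index": {"_index": index, "_id": row["id"]}},
            "doc": row,
        }
    if op == "d":
        # Deletes carry the row only in the `before` image.
        row = event["before"]
        return {"action": {"delete": {"_index": index, "_id": row["id"]}}}
    raise ValueError(f"unknown op: {op}")
```

Because every action is keyed by the row's primary key, replaying or reprocessing events is idempotent for upserts, which is what makes at-least-once delivery from Kafka workable.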
Operational Complexity
Polyglot Persistence Multiplies Operational Burden
Every additional database technology requires your team to learn its operational model: how to configure it, monitor it, back it up, scale it, and recover it from failure. Running Postgres, Redis, MongoDB, Elasticsearch, and Cassandra means five different operational runbooks. Be deliberate — add a new database technology only when the benefit clearly outweighs the operational cost.
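Part of that operational burden is reconciliation: dual writes and batch sync both need a job that detects drift between the source of truth and each derived store. A minimal sketch, with plain dicts standing in for the two databases (a real job would stream id/hash pairs from each store rather than hold them in memory):

```python
import hashlib
import json

def row_fingerprint(row: dict) -> str:
    # Stable content hash of a row, used to compare stores cheaply.
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()

def reconcile(source: dict, replica: dict) -> dict:
    """Compare a source-of-truth store with a derived store and report
    ids that are missing from, stale in, or orphaned in the replica."""
    missing = [k for k in source if k not in replica]
    orphaned = [k for k in replica if k not in source]
    stale = [
        k for k in source
        if k in replica and row_fingerprint(source[k]) != row_fingerprint(replica[k])
    ]
    return {"missing": missing, "orphaned": orphaned, "stale": stale}
```

The output drives repair: re-index missing and stale ids, delete orphans. Running this on a schedule turns "eventually consistent" from a hope into a measured property.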
Real-World Examples
Uber is a canonical example: trip data in PostgreSQL (transactional), driver location in a custom geo-indexing store, logs in Kafka + ELK, driver profiles in Schemaless (Uber's MySQL-backed append-only datastore), dispatch decisions in Riak. GitHub uses MySQL for relational data, Redis for caching and sessions, Elasticsearch for code search, and HBase for data analytics. Instagram used Postgres for user data and relationships, Cassandra for feeds at scale, and Redis for caches.
Choosing When to Add a Database
- Is the access pattern fundamentally incompatible with the current database? (e.g., full-text search in Postgres is possible but not efficient at scale)
- Is the existing database struggling under this specific workload and cannot be fixed with indexing/sharding?
- Does the data have a fundamentally different lifecycle (e.g., time-series metrics with TTL-based retention)?
- Is your team prepared to operate the new technology in production?
Interview Tip
In interviews, polyglot persistence most often comes up when designing a system that clearly needs multiple data patterns — for example, a social network needing relational user data, a graph for friend recommendations, full-text search, and a cache for feed timelines. Present each technology choice with a clear rationale tied to the access pattern. Always acknowledge the synchronization and operational complexity trade-offs. Interviewers want to see you make deliberate choices, not just list every database technology you know.