Polyglot Persistence
Use the best database for each use case within the same system: strategies for managing multiple data stores, data synchronization, and operational complexity.
One Size Does Not Fit All
Polyglot persistence is the practice of using multiple database technologies within a single system, choosing each for the access pattern it serves best. The term was popularized by Martin Fowler and Pramod Sadalage, and it pairs naturally with microservices, where the Database per Service pattern lets each service choose its own data store.
No single database excels at everything. A relational database is excellent for transactional order data with complex relationships. A document database is better for schema-flexible user profiles. A time-series database is purpose-built for metrics data. A graph database handles traversal of social connections with orders-of-magnitude better performance than an equivalent SQL query. Polyglot persistence embraces this reality.
Database Selection Guide
| Database Type | Best For | Examples | Avoid When |
|---|---|---|---|
| Relational (RDBMS) | ACID transactions, complex relationships, reporting | PostgreSQL, MySQL, CockroachDB | Schema is highly flexible; massive write throughput needed |
| Document | Schema-flexible entities, nested data, catalogs | MongoDB, Firestore, CouchDB | Strong relational joins are needed; ACID across documents |
| Key-Value | Sessions, caches, rate limiting, feature flags | Redis, DynamoDB, Memcached | Complex queries or relationships needed |
| Wide-Column | High write throughput, time-series, IoT | Cassandra, HBase, ScyllaDB | Access patterns are ad-hoc or change often; queries are not known up front |
| Graph | Social graphs, fraud detection, recommendations | Neo4j, Amazon Neptune | Data is not naturally a graph; simple queries dominate |
| Time-Series | Metrics, logs, events by time | InfluxDB, TimescaleDB, Prometheus | Data is not primarily time-ordered; updates are frequent |
| Search Engine | Full-text search, faceted filtering, geo-search | Elasticsearch, OpenSearch, Solr | It would be the system's source of truth; strong consistency needed |
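To make the table concrete, here is a minimal sketch of a service that routes each kind of data to the store suited to its access pattern. The class and method names are illustrative, and plain dicts and lists stand in for the real relational store, key-value cache, and search index:

```python
from dataclasses import dataclass, field

@dataclass
class PolyglotOrderService:
    # In-memory stand-ins for the real stores; in production these would be
    # a relational database, a key-value store, and a search engine.
    orders: dict = field(default_factory=dict)         # stand-in for Postgres rows
    session_cache: dict = field(default_factory=dict)  # stand-in for Redis
    search_index: list = field(default_factory=list)   # stand-in for Elasticsearch

    def place_order(self, order_id: str, user_id: str, items: list) -> None:
        # Transactional record goes to the relational store (source of truth).
        self.orders[order_id] = {"user_id": user_id, "items": items}
        # A denormalized copy goes to the search index for full-text queries.
        self.search_index.append({"id": order_id, "text": " ".join(items)})

    def cache_session(self, token: str, user_id: str) -> None:
        # Hot, short-lived data goes to the key-value store.
        self.session_cache[token] = user_id

    def search_orders(self, term: str) -> list:
        # Naive substring scan standing in for an inverted-index query.
        return [d["id"] for d in self.search_index if term in d["text"]]
```

The point is not the toy implementation but the shape: one write path fans data out to stores with different strengths, which is exactly what creates the synchronization problem discussed next.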
Data Synchronization
The main challenge of polyglot persistence is keeping data consistent across multiple stores. When data lives in Postgres (source of truth) but must also be queryable in Elasticsearch (for search), changes in Postgres must propagate to Elasticsearch. The primary tools for this are:
- CDC + Kafka — Debezium tails the Postgres WAL, publishes change events to Kafka, and a consumer updates Elasticsearch. The gold standard for near-real-time synchronization.
- Dual writes — Application writes to both Postgres and Elasticsearch. Simple, but suffers from the dual-write consistency problem. Only acceptable with robust retry and reconciliation.
- Scheduled batch sync — A nightly job exports data from Postgres and imports into the secondary store. Simple but high latency — only suitable for analytics or low-freshness-requirement scenarios.
- Event-driven projection — In a CQRS or event-sourced system, the same events that update the write store are consumed to update the secondary store.
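The CDC pipeline above hinges on translating each change event into an operation on the secondary store. A minimal sketch, assuming a simplified Debezium-style envelope (`op` plus `before`/`after` row images) and emitting Elasticsearch bulk-API-shaped actions; the field names follow Debezium's conventions but the function itself is hypothetical:

```python
def change_event_to_es_action(event: dict, index: str) -> dict:
    """Translate a simplified Debezium-style change event into an
    Elasticsearch bulk-style action. `op` is 'c' (create), 'u' (update),
    'r' (snapshot read), or 'd' (delete)."""
    op = event["op"]
    if op in ("c", "u", "r"):
        # Upsert the full row image so the index converges on the latest state.
        row = event["after"]
        return {
            "action": {"index": {"_index": index, "_id": row["id"]}},
            "doc": row,
        }
    if op == "d":
        # Deletes carry the row only in the `before` image.
        row = event["before"]
        return {"action": {"delete": {"_index": index, "_id": row["id"]}}}
    raise ValueError(f"unknown op: {op}")
```

Because every action is keyed by the row's primary key, replaying or reprocessing events is idempotent for upserts, which is what makes at-least-once delivery from Kafka workable.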
Operational Complexity
Polyglot Persistence Multiplies Operational Burden
Every additional database technology requires your team to learn its operational model: how to configure it, monitor it, back it up, scale it, and recover it from failure. Running Postgres, Redis, MongoDB, Elasticsearch, and Cassandra means five different operational runbooks. Be deliberate — add a new database technology only when the benefit clearly outweighs the operational cost.
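Part of that operational burden is reconciliation: dual writes and batch sync both need a job that detects drift between the source of truth and each derived store. A minimal sketch, with plain dicts standing in for the two databases (a real job would stream id/hash pairs from each store rather than hold them in memory):

```python
import hashlib
import json

def row_fingerprint(row: dict) -> str:
    # Stable content hash of a row, used to compare stores cheaply.
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()

def reconcile(source: dict, replica: dict) -> dict:
    """Compare a source-of-truth store with a derived store and report
    ids that are missing from, stale in, or orphaned in the replica."""
    missing = [k for k in source if k not in replica]
    orphaned = [k for k in replica if k not in source]
    stale = [
        k for k in source
        if k in replica and row_fingerprint(source[k]) != row_fingerprint(replica[k])
    ]
    return {"missing": missing, "orphaned": orphaned, "stale": stale}
```

The output drives repair: re-index missing and stale ids, delete orphans. Running this on a schedule turns "eventually consistent" from a hope into a measured property.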
Real-World Examples
Uber is a canonical example: trip data in PostgreSQL (transactional), driver location in a custom geo-indexing store, logs in Kafka + ELK, driver profiles in Schemaless (Uber's MySQL-backed append-only datastore), dispatch decisions in Riak. GitHub uses MySQL for relational data, Redis for caching and sessions, Elasticsearch for code search, and HBase for data analytics. Instagram used Postgres for user data and relationships, Cassandra for feeds at scale, and Redis for caches.
Choosing When to Add a Database
- Is the access pattern fundamentally incompatible with the current database? (e.g., full-text search in Postgres is possible but not efficient at scale)
- Is the existing database struggling under this specific workload and cannot be fixed with indexing/sharding?
- Does the data have a fundamentally different lifecycle (e.g., time-series metrics with TTL-based retention)?
- Is your team prepared to operate the new technology in production?
Interview Tip
In interviews, polyglot persistence most often comes up when designing a system that clearly needs multiple data patterns — for example, a social network needing relational user data, a graph for friend recommendations, full-text search, and a cache for feed timelines. Present each technology choice with a clear rationale tied to the access pattern. Always acknowledge the synchronization and operational complexity trade-offs. Interviewers want to see you make deliberate choices, not just list every database technology you know.