InfoQ Architecture·March 16, 2026

Multi-Cloud Strategy for Payments: Form3's Journey and Trade-offs

This article details Form3's experience implementing multi-cloud architectures for a high-volume payments platform, driven by regulatory pressure and customer demands. It covers the technical challenges and architectural decisions behind running active/active/active across AWS, GCP, and Azure, including cloud-agnostic technology choices and custom operators for distributed consistency. It also offers a cautionary tale: market-specific requirements and latency constraints led Form3 to a simpler active-standby model for the US market, underscoring that multi-cloud is not a universal solution.

Distributed Systems · Cloud & Infrastructure · Databases & Storage

Form3, a UK payments platform processing billions of pounds annually, embarked on a multi-cloud journey in response to regulatory concerns about cloud concentration risk and customer mandates. Their initial architecture was tightly coupled to AWS, but new requirements forced a re-evaluation and a shift towards a more resilient, distributed setup.

Active/Active/Active Multi-Cloud in the UK

For the UK market, Form3 engineered a V2 platform designed for active/active/active operation across AWS, Google Cloud, and Azure. Key architectural decisions included:

Kubernetes: Independent clusters in each cloud.
NATS JetStream: Chosen as a cross-cloud message broker due to its ability to form a single logical cluster spanning multiple environments.
CockroachDB: Selected for distributed data storage, also capable of operating as a single logical cluster across all three clouds.
Go Microservices: Migration from Java to Go for smaller footprints and improved readability.
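Running CockroachDB as one logical cluster across three providers depends on spreading each range's replicas so that no single cloud holds more of them than necessary. CockroachDB itself uses node locality tiers (set with `cockroach start --locality`, for example a `cloud=...` tier) to diversify replica placement. The function below is only a hypothetical, simplified stand-in for that idea, not CockroachDB's actual allocator; the node names and cloud labels are assumptions for illustration.

```python
from collections import defaultdict


def place_replicas(nodes: dict[str, str], replication_factor: int) -> list[str]:
    """Choose replica nodes by round-robin over clouds (hypothetical, simplified).

    `nodes` maps a node name to an assumed cloud label such as "aws".
    Replicas are spread as evenly as possible across clouds.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")

    # Group node names by cloud, keeping a deterministic order.
    pools: dict[str, list[str]] = defaultdict(list)
    for name in sorted(nodes):
        pools[nodes[name]].append(name)

    chosen: list[str] = []
    # Take at most one node from each cloud per pass until enough are chosen.
    while len(chosen) < replication_factor:
        for cloud in sorted(pools):
            if pools[cloud] and len(chosen) < replication_factor:
                chosen.append(pools[cloud].pop(0))
    return chosen


nodes = {
    "aws-1": "aws", "aws-2": "aws",
    "gcp-1": "gcp", "gcp-2": "gcp",
    "azure-1": "azure",
}
print(place_replicas(nodes, 3))  # ['aws-1', 'azure-1', 'gcp-1']
print(place_replicas(nodes, 5))  # ['aws-1', 'azure-1', 'gcp-1', 'aws-2', 'gcp-2']
```

With a replication factor of 3, every provider gets exactly one replica; with 5, the split is 2/2/1, so no single cloud ever holds a majority.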

Engineering Challenges and Solutions

Achieving true multi-cloud consistency presented significant engineering hurdles:

Cross-Cluster DNS: To bootstrap CockroachDB across independent Kubernetes clusters, they developed a custom DNS hack with pseudo-suffixes to route queries between clusters.
Distributed Quorum Protection: A custom operator called XPDB (cross-cluster pod disruption budget) was built to enforce disruption limits across all three clouds, which is crucial for maintaining CockroachDB quorum during maintenance.
Node Pool Updates: The Cluster Lifecycle Operator was created to consolidate and streamline node pool updates across multiple clouds, environments, and geographies, addressing a painful day-two operational problem.
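A plain Kubernetes PodDisruptionBudget only counts pods in its own cluster, so three independent budgets could each permit a disruption at the same moment and take down several CockroachDB nodes at once. The sketch below shows the core idea of a cross-cluster budget: sum unavailable pods across every cloud before allowing one more voluntary disruption. `ClusterStatus`, `disruption_allowed`, and the pod counts are hypothetical and do not describe XPDB's real interface or admission mechanics.

```python
from dataclasses import dataclass


@dataclass
class ClusterStatus:
    """Health of the protected workload (e.g. CockroachDB pods) in one cluster."""
    cloud: str
    ready_pods: int
    desired_pods: int


def disruption_allowed(clusters: list[ClusterStatus], max_unavailable: int) -> bool:
    """Return True if one more voluntary disruption fits the global budget.

    Unlike a per-cluster PodDisruptionBudget, unavailable pods are summed
    across every cloud before the decision is made.
    """
    unavailable = sum(c.desired_pods - c.ready_pods for c in clusters)
    return unavailable + 1 <= max_unavailable


fleet = [
    ClusterStatus("aws", 3, 3),
    ClusterStatus("gcp", 3, 3),
    ClusterStatus("azure", 2, 3),  # one pod already down for maintenance
]
print(disruption_allowed(fleet, max_unavailable=1))  # False: the budget is spent in Azure
```

Because the budget is global, a node pool update in AWS is held back while Azure is still recovering, even though AWS's own pods are all healthy.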
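The pseudo-suffix approach to cross-cluster DNS can be pictured as a routing rule: a name ending in a made-up suffix identifies which cloud's cluster owns it, and the query is forwarded there with the suffix swapped back to that cluster's real domain. The suffixes, the `route_query` helper, and the server addresses (taken from documentation-only IP ranges) are all hypothetical; in a real deployment this logic would more likely live in DNS server configuration (CoreDNS, for instance, has `rewrite` and `forward` plugins) than in application code.

```python
# Hypothetical pseudo-suffixes, each mapped to the DNS server of one cloud's
# cluster. The addresses come from documentation-only IP ranges.
REMOTE_DNS = {
    "aws": "192.0.2.10",
    "gcp": "198.51.100.10",
    "azure": "203.0.113.10",
}


def route_query(qname: str, cluster_domain: str = "cluster.local") -> tuple[str, str]:
    """Map a pseudo-suffixed name to (remote DNS server, name to ask it for)."""
    name = qname.rstrip(".")
    head, _, suffix = name.rpartition(".")
    if not head or suffix not in REMOTE_DNS:
        raise LookupError(f"no cross-cluster route for {qname!r}")
    # Swap the pseudo-suffix for the remote cluster's real domain.
    return REMOTE_DNS[suffix], f"{head}.{cluster_domain}"


print(route_query("cockroachdb-0.cockroachdb.crdb.svc.gcp"))
# ('198.51.100.10', 'cockroachdb-0.cockroachdb.crdb.svc.cluster.local')
```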

Key Takeaways for Active/Active Multi-Cloud

Form3's success in the UK relied on three pillars: using cloud-agnostic technology, maintaining single logical data stores across clouds, and treating each cloud provider as an availability zone. This approach enabled them to continue processing payments seamlessly during a major Google Cloud outage.
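Treating each provider as an availability zone works because consensus-based stores, such as CockroachDB (Raft) and clustered NATS JetStream streams, keep accepting writes as long as a majority of replicas is reachable. The minimal check below, using a hypothetical `survives_cloud_loss` helper, shows why one replica per cloud with a replication factor of 3 survives the loss of any single provider, while doubling up in one cloud does not.

```python
from collections import Counter


def survives_cloud_loss(replica_clouds: list[str]) -> dict[str, bool]:
    """For each cloud hosting a replica, report whether a majority remains if it fails."""
    counts = Counter(replica_clouds)
    total = len(replica_clouds)
    majority = total // 2 + 1
    return {cloud: total - lost >= majority for cloud, lost in counts.items()}


print(survives_cloud_loss(["aws", "gcp", "azure"]))
# {'aws': True, 'gcp': True, 'azure': True}
print(survives_cloud_loss(["aws", "aws", "gcp"]))
# {'aws': False, 'gcp': True}
```

In the first layout, a Google Cloud outage still leaves two of three replicas, which is enough to keep committing writes.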

When Multi-Cloud Doesn't Fit: The US Market Experience

When expanding to the US market, Form3 discovered that its triple-active setup was not a good fit. US customers prioritized geographic resilience (East Coast primary, West Coast DR) and found the multi-cloud concept unfamiliar. Latency was a critical constraint: because every CockroachDB write must be acknowledged by a majority of replicas, spreading the quorum across the continent would add a cross-country round trip to each commit and violate latency SLAs.

This led to a pragmatic shift to a simpler active-standby architecture, with AWS on the East Coast and GCP on the West Coast, relying on backup-and-restore for disaster recovery rather than real-time replication. Form3 is actively working to improve recovery time objectives (RTOs) by adding logical replication for CockroachDB and for NATS event streams.
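The latency argument can be made concrete with a rough model of a consensus commit: the leader counts as one vote and waits only for the fastest followers needed to reach a majority. A three-replica group that must survive the loss of the entire East Coast can keep at most one replica there, so the East Coast leader's nearest follower is necessarily in another region. The helper and the round-trip times below are illustrative assumptions, not measurements, and the model ignores transaction-protocol and client round trips.

```python
def commit_latency_ms(follower_rtts_ms: list[float]) -> float:
    """Approximate write commit latency seen by a consensus leader.

    The leader's own vote counts toward the majority, so it only waits for
    the fastest (majority - 1) followers to acknowledge.
    """
    replicas = len(follower_rtts_ms) + 1
    majority = replicas // 2 + 1
    needed = majority - 1
    if needed == 0:
        return 0.0
    return sorted(follower_rtts_ms)[needed - 1]


# Illustrative round-trip times from an East Coast leader, not measurements.
east_only = [1.5, 2.0]         # followers in other East Coast zones
region_survivable = [35.0, 65.0]  # followers in two other regions
print(commit_latency_ms(east_only))          # 1.5
print(commit_latency_ms(region_survivable))  # 35.0
```

Under these assumptions, surviving a full-region loss raises every write's floor from a few milliseconds to a cross-region round trip, which is the trade-off that pushed the US design toward active-standby.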

multi-cloud · active-active · active-standby · kubernetes · cockroachdb · nats-jetstream · payments-platform · disaster-recovery

Architecture Design

Design this yourself

Design a high-volume, resilient payments platform that can operate across multiple cloud providers to mitigate concentration risk, support global expansion, and meet strict latency requirements. Your design should consider both active/active/active multi-cloud strategies and active/standby regional architectures, detailing data consistency models, cross-cloud communication, and operational challenges like distributed consensus and node management.

Other design angles

· Design a payments processing system focusing solely on a highly available, regional active/standby architecture with fast failover and data replication strategies across two cloud regions.
· Design a cloud-agnostic data layer for a distributed application, specifically detailing how to maintain a single logical view of data across disparate cloud environments with technologies like CockroachDB.
· Propose a disaster recovery strategy for a multi-tenant SaaS platform where individual tenants can rehearse failover without impacting others, considering different RTO/RPO objectives for varying customer tiers.