AWS Architecture Blog·May 12, 2026

Building a Hybrid Multi-Tenant Architecture for Stateful Services on AWS

This article discusses the evolution from a cellular architecture to a hybrid multi-tenant model for stateful services on AWS, primarily focusing on ad-serving infrastructure. It highlights the challenges of scale, efficiency, onboarding time, and the noisy neighbor problem encountered with strict tenant isolation, then presents a solution that balances isolation with operational improvements. The new architecture leverages a three-level hierarchy (tiers, cells, infra groups) and AWS services like Route 53, ALB, ECS, and PrivateLink to achieve cluster-level isolation within shared accounts, significantly reducing operational overhead and improving scalability.


Ad-serving infrastructure often faces the dilemma of balancing strong tenant isolation with operational efficiency, especially for stateful services. The article presents a real-world case study moving from a highly isolated cellular architecture, where each tenant received dedicated AWS accounts and infrastructure, to a more efficient hybrid multi-tenant approach.

Challenges of Strict Cellular Isolation

The initial cellular architecture, while providing excellent isolation, introduced significant operational overhead and inefficiencies. Key problems included:

  • Scale Problem: Supporting only 18 clients across 4 AWS Regions required 181 separate targets, with dedicated AWS accounts, VPCs, load balancers, IAM roles, and downstream connections for each.
  • Efficiency Problem: Servers were largely idle, with average CPU utilization at 3% and memory at 19%, leading to high costs for underutilized infrastructure.
  • Onboarding Problem: Bringing a new client online took approximately 52 days due to extensive manual provisioning of AWS accounts, networking, IAM, and service integrations.
  • Scalability Problem: Horizontal scaling meant spinning up entirely new cells, making it difficult to support concurrent high-traffic events.
  • Noisy Neighbor Problem: Simply consolidating tenants onto shared clusters was not an option. These stateful services load data into memory per tenant, so sharing even high-level infrastructure could lead to resource contention (e.g., an out-of-memory condition in one tenant's workload affecting other tenants in a shared cluster).

Hybrid Multi-Tenant Architecture Solution

The new hybrid architecture aims to provide cluster-level isolation within shared accounts, addressing the previous inefficiencies. It introduces a hierarchical structure and leverages AWS services to streamline operations:

  • Pre-integration Model: Infrastructure components like VPCs, IAM roles, and downstream service connections are established once at the tier level and reused across tenants, making tenant onboarding a configuration-driven process.
  • Tier-Based Architecture: Infrastructure is organized into tiers (e.g., High TPS, Standard TPS, Low TPS) with multiple cells per tier. This enables horizontal scaling without the burden of per-tenant AWS accounts.
  • Three-Level Hierarchy (Tier -> Cell -> Infra Group): A 'tier' is a logical grouping, a 'cell' is an AWS account boundary, and an 'infra group' is a self-contained unit (VPC, ALB, tenant-specific ECS clusters, IAM, monitoring). Adding cells and adding infra groups are two independent scaling levers, so capacity can grow along whichever dimension is hitting an AWS service limit (e.g., ALB target group limits, account ENI limits).
  • AWS PrivateLink Connectivity: Shared PrivateLink endpoints for downstream services eliminate the need for per-tenant VPC peering or Transit Gateway connections, reducing network configuration overhead by 80%.
  • Amazon Route 53 Weighted Routing: Used to distribute traffic across ALBs in multiple infra groups/cell accounts, enabling gradual traffic migration and horizontal scale-out without client-side changes.
  • Tenant-Specific ALB Listener Rules: Within an infra group, a single ALB uses listener rules (path-based or header-based) to route requests to the correct dedicated ECS cluster for each tenant.
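The last two items, weighted routing across infra groups and per-tenant listener rules, can be sketched as plain request payloads. This is a minimal illustration, not the article's implementation: the header name `X-Tenant-Id`, the domain names, the zone ID, and the ARNs are hypothetical placeholders, and header-based matching is chosen here only as one of the two options the article mentions.

```python
from __future__ import annotations

from typing import Any

TENANT_HEADER = "X-Tenant-Id"  # hypothetical header name, not from the article


def weighted_alias_change(record_name: str, infra_group_id: str,
                          alb_dns_name: str, alb_zone_id: str,
                          weight: int) -> dict[str, Any]:
    """Build one UPSERT for a weighted alias record that points at an infra group's ALB."""
    if not 0 <= weight <= 255:
        raise ValueError("Route 53 weights must be between 0 and 255")
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": record_name,
            "Type": "A",
            # SetIdentifier distinguishes records that share a name and type.
            "SetIdentifier": infra_group_id,
            "Weight": weight,
            "AliasTarget": {
                "HostedZoneId": alb_zone_id,
                "DNSName": alb_dns_name,
                "EvaluateTargetHealth": True,
            },
        },
    }


def tenant_listener_rule(listener_arn: str, tenant_id: str,
                         target_group_arn: str, priority: int) -> dict[str, Any]:
    """Build ALB rule parameters that forward one tenant's requests to its own target group."""
    return {
        "ListenerArn": listener_arn,
        "Priority": priority,
        "Conditions": [{
            "Field": "http-header",
            "HttpHeaderConfig": {
                "HttpHeaderName": TENANT_HEADER,
                "Values": [tenant_id],
            },
        }],
        "Actions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }


# Gradual migration: keep most traffic on the existing infra group and
# send a small share to a newly added one. All identifiers are placeholders.
change_batch = {
    "Changes": [
        weighted_alias_change("ads.example.com.", "ig-1",
                              "ig-1-alb.example.com.", "ZEXAMPLEZONE", 90),
        weighted_alias_change("ads.example.com.", "ig-2",
                              "ig-2-alb.example.com.", "ZEXAMPLEZONE", 10),
    ]
}

rule = tenant_listener_rule(
    listener_arn="arn:aws:elasticloadbalancing:EXAMPLE:listener/ig-1",
    tenant_id="tenant-a",
    target_group_arn="arn:aws:elasticloadbalancing:EXAMPLE:targetgroup/tenant-a",
    priority=10,
)
```

With boto3, payloads of this shape would be passed to `change_resource_record_sets(HostedZoneId=..., ChangeBatch=change_batch)` on a Route 53 client and to `create_rule(**rule)` on an ELBv2 client. Shifting more traffic to the new infra group is then a matter of re-submitting the batch with different weights, which is why no client-side change is needed.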

Key Design Principle

The primary reason for the 80% reduction in infrastructure setup is the architectural decision to pre-wire downstream service dependencies at tier creation, not at tenant onboarding. Once a tier is provisioned with PrivateLink connections to downstream services, all tenants onboarded to that tier automatically inherit full connectivity.
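To make the pre-wiring idea concrete, here is a small sketch of the Tier -> Cell -> Infra Group hierarchy in which onboarding a tenant only records a placement and copies the tier's existing downstream connections. The class names, the `max_tenants` cap (standing in for limits such as ALB target groups or ENIs), the endpoint IDs, and the cluster naming scheme are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InfraGroup:
    name: str
    max_tenants: int  # hypothetical cap standing in for ALB or ENI limits
    tenants: list[str] = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.tenants) < self.max_tenants


@dataclass
class Cell:
    account_id: str  # a cell corresponds to one AWS account
    infra_groups: list[InfraGroup] = field(default_factory=list)


@dataclass
class Tier:
    name: str
    # Downstream PrivateLink endpoints, wired once when the tier is created.
    downstream_endpoints: dict[str, str]
    cells: list[Cell] = field(default_factory=list)


def onboard_tenant(tier: Tier, tenant_id: str) -> dict:
    """Place a tenant in the first infra group with room and return its config.

    No network or IAM setup happens here: the tenant inherits whatever the
    tier already has, so onboarding is a configuration change only.
    """
    for cell in tier.cells:
        for group in cell.infra_groups:
            if group.has_capacity():
                group.tenants.append(tenant_id)
                return {
                    "tenant": tenant_id,
                    "tier": tier.name,
                    "cell_account": cell.account_id,
                    "infra_group": group.name,
                    "ecs_cluster": f"{tenant_id}-cluster",  # illustrative naming
                    "downstream_endpoints": dict(tier.downstream_endpoints),
                }
    raise RuntimeError(f"tier {tier.name!r} is full; add an infra group or a cell")


tier = Tier(
    name="standard-tps",
    downstream_endpoints={"reporting": "vpce-EXAMPLE"},  # placeholder endpoint ID
    cells=[Cell("111111111111", [InfraGroup("ig-1", max_tenants=1)])],
)
first = onboard_tenant(tier, "tenant-a")

# Scale out inside the same account by adding an infra group, then onboard again.
tier.cells[0].infra_groups.append(InfraGroup("ig-2", max_tenants=1))
second = onboard_tenant(tier, "tenant-b")
```

Both tenants end up with the same downstream endpoints without any per-tenant wiring, and the full-tier error marks the point where one of the two scaling levers (a new infra group or a new cell) has to be pulled.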

Tags: AWS, Multi-tenancy, ECS, ALB, Route 53, PrivateLink, Stateful Services, Hybrid Cloud
