AWS Architecture Blog·June 29, 2026

Scaling Serverless SaaS to 1 Million AWS Lambda Functions

This article details ProGlove's journey in scaling a multi-account, serverless SaaS platform to over a million AWS Lambda functions. It highlights critical architectural decisions, challenges, and lessons learned across various growth phases, focusing on multi-tenancy, cost optimization, and deployment strategies in a serverless environment.

Read original on AWS Architecture Blog

ProGlove's Insight platform, a serverless SaaS solution, scaled from a few dozen to over a million AWS Lambda functions across thousands of AWS accounts. This journey revealed significant architectural and operational challenges, leading to refined strategies for multi-tenancy, cost management, and deployment automation. The core architecture uses a one-AWS-account-per-tenant model, which offers strong security, clear ownership, and transparent cost attribution. Crucially, it also supports true scale-to-zero for dedicated tenant resources.

Microservice Composition and Deployment at Scale

Each microservice follows a consistent structure: 5-15 Lambda functions orchestrated by AWS Step Functions, with Amazon EventBridge for event routing and Amazon DynamoDB as the primary data store. These components are bundled into dedicated AWS CloudFormation stacks. Initially, AWS CloudFormation StackSets were used to roll out infrastructure updates in parallel across multiple tenant accounts. While effective at smaller scales, StackSets eventually hit performance ceilings as the platform grew toward a million functions, and the team considered building a custom deployment engine before the AWS CloudFormation service teams addressed the bottlenecks.

Overcoming Scaling Challenges: Quotas, Observability, and "Self-DDoS"

  • Quota Isolation: The one-account-per-tenant model inherently provides robust quota isolation, preventing a "noisy neighbor" from impacting other tenants by exhausting shared service limits (e.g., Lambda concurrency, API Gateway throttles).
  • Automated Account Provisioning: An AWS Organizations-based account factory, driven by AWS Step Functions, automates account creation, baseline security (SCPs), IAM role bootstrapping, and the initial CloudFormation StackSet deployment for each new tenant account, at near-zero incremental cost per provisioning run.
  • Request Scattering (Avoiding "Self-DDoS"): A critical lesson was to avoid synchronized schedules across thousands of Lambda functions. Giving every tenant the same `rate(5 minutes)` schedule made invocations across all tenants fire together, producing load peaks at the top of the minute. The solution was an internal library that enforces jitter, randomized batch offsets, and staggered updates, scattering requests and preventing self-inflicted DDoS-like scenarios.
  • Cost-Optimized Multi-Account Observability: Manually accessing logs account by account was infeasible, but centralizing CloudWatch logs and metrics through a third-party platform proved costly at scale: $3 per account per month becomes substantial across thousands of accounts. This led to a strategy of separating high-priority from low-priority observability data and forwarding only essential metrics, which reduced costs to about $0.70 per account per month.
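The request-scattering idea from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ProGlove's internal library: the function names (`stable_offset_seconds`, `jittered_delay`, `staggered_batches`), the 300-second window, and the choice of hashing the AWS account ID are all hypothetical.

```python
import hashlib
import random
from typing import Iterable, List, Optional


def stable_offset_seconds(account_id: str, window_seconds: int = 300) -> int:
    """Map an account ID to a fixed offset inside the schedule window.

    Hashing keeps the offset stable across runs, so each tenant always
    fires at the same point in the window, but different tenants land
    on different points. (Hypothetical helper, not from the article.)
    """
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


def jittered_delay(
    base_seconds: float,
    max_jitter_seconds: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Return the base delay plus a uniform random jitter in [0, max_jitter]."""
    rng = rng or random.Random()
    return base_seconds + rng.uniform(0.0, max_jitter_seconds)


def staggered_batches(account_ids: Iterable[str], batch_size: int) -> List[List[str]]:
    """Order accounts by their stable offset and cut them into batches.

    Updating one batch at a time spreads a fleet-wide rollout over time
    instead of touching every tenant at once.
    """
    ordered = sorted(account_ids, key=stable_offset_seconds)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

A scheduled function could sleep for `stable_offset_seconds(account_id)` before doing its work, or a deployment step could derive a per-tenant start time from it. Either way, the load that would otherwise arrive in one spike is spread across the whole window.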

Rethinking Architectural Patterns for True Scale-to-Zero

Traditional serverless best practices, like using SQS queues between EventBridge and Lambda for resilience, were found to be costly at extreme scale due to continuous polling, even when idle. To achieve true scale-to-zero, SQS was removed from this path, with safety ensured by monitoring `AsyncEventsDropped` and `ConcurrentExecutions`.
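One way to picture the monitoring side is as a pair of CloudWatch alarm definitions per function. The sketch below only builds the parameter dictionaries. The helper names, the alarm naming scheme, the 80% concurrency fraction, and the one-minute period are assumptions for illustration, not values from the article.

```python
from typing import Any, Dict, List


def lambda_alarm_params(
    function_name: str,
    metric_name: str,
    threshold: float,
    statistic: str = "Sum",
    period_seconds: int = 60,
) -> Dict[str, Any]:
    """Build CloudWatch alarm parameters for one Lambda metric.

    The keys follow the CloudWatch PutMetricAlarm request shape; the
    concrete values are illustrative assumptions.
    """
    return {
        "AlarmName": f"{function_name}-{metric_name}",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": statistic,
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # An idle (scaled-to-zero) function emits no data points;
        # treat that as healthy rather than as an alarm condition.
        "TreatMissingData": "notBreaching",
    }


def scale_to_zero_alarms(
    function_name: str,
    concurrency_limit: int,
    alert_fraction: float = 0.8,  # assumed headroom, not from the article
) -> List[Dict[str, Any]]:
    """Alarm on any dropped async event and on concurrency nearing its limit."""
    return [
        lambda_alarm_params(function_name, "AsyncEventsDropped", threshold=0),
        lambda_alarm_params(
            function_name,
            "ConcurrentExecutions",
            threshold=alert_fraction * concurrency_limit,
            statistic="Maximum",
        ),
    ]
```

Each dictionary could then be passed to boto3's CloudWatch client, for example `cloudwatch.put_metric_alarm(**params)`, or translated into the equivalent CloudFormation resource in the microservice's stack.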

Optimizing Dead Letter Queues (DLQs) for Cost

The "centralized DLQ" pattern emerged as a cost-effective alternative to individual DLQs per queue, routing failures from multiple tenants to a single Dead Letter Queue for recovery. This requires stringent discipline to maintain data isolation, treating the AWS account ID as a tenant ID within the converged events.
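The isolation discipline can be illustrated with a small Python sketch that partitions messages read from a shared DLQ by tenant before any recovery logic runs. It assumes the failed messages are EventBridge events serialized as JSON, whose envelope carries the originating AWS account ID in the `account` field. The function name `partition_by_tenant` and the quarantine behaviour are hypothetical.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def partition_by_tenant(
    dlq_bodies: Iterable[str],
) -> Tuple[Dict[str, List[Dict[str, Any]]], List[str]]:
    """Group DLQ message bodies by the AWS account ID in the event envelope.

    Messages without a well-formed 12-digit account ID are returned
    separately so they are never replayed into some tenant by accident.
    """
    by_tenant: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    quarantined: List[str] = []
    for body in dlq_bodies:
        try:
            event = json.loads(body)
        except json.JSONDecodeError:
            quarantined.append(body)
            continue
        account = event.get("account") if isinstance(event, dict) else None
        if (
            isinstance(account, str)
            and len(account) == 12
            and account.isascii()
            and account.isdigit()
        ):
            by_tenant[account].append(event)
        else:
            quarantined.append(body)
    return dict(by_tenant), quarantined
```

Recovery code would then process each tenant's list separately, so a replay or inspection step for one account never touches another tenant's events.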

AWS Lambda, Serverless, SaaS, Multi-tenancy, Scalability, Cost Optimization, AWS CloudFormation, Observability
