Serverless Architecture
Functions-as-a-Service and managed services: AWS Lambda, cold starts, execution limits, cost models, and when serverless is (and isn't) the answer.
What Is Serverless?
Serverless is a cloud execution model where the cloud provider dynamically manages server infrastructure. Developers deploy functions or application logic; the provider handles provisioning, scaling, patching, and billing — charging only for actual compute time consumed, not for idle servers. The term 'serverless' is a misnomer — there are still servers, but they are fully abstracted away from the developer.
Serverless encompasses two related concepts. Functions-as-a-Service (FaaS) provides event-triggered, short-lived compute functions (AWS Lambda, Google Cloud Functions, Azure Functions). Backend-as-a-Service (BaaS) provides managed services that replace entire backend components (Auth0 for authentication, Firebase for real-time data sync, AWS DynamoDB for storage).
Key Characteristics
| Characteristic | Detail |
|---|---|
| No server management | Provider handles OS, runtime updates, security patches, capacity planning |
| Pay-per-execution | Billed per invocation and per GB-second of compute (not idle time) |
| Auto-scaling | Scales from 0 to thousands of concurrent executions automatically |
| Event-triggered | Functions are invoked by events: HTTP requests, queue messages, file uploads, timer schedules |
| Stateless | Each invocation is independent; state must be stored externally (DynamoDB, Redis, S3) |
| Execution limits | AWS Lambda: 15 min max, 10 GB memory, 512 MB–10 GB ephemeral storage |
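The stateless, event-triggered model above can be sketched as a minimal handler, simulated locally with a hypothetical API Gateway-style event shape. Each invocation gets everything it needs from the `event`; anything that must persist across invocations would be written to an external store (DynamoDB, Redis, S3), since the process may be recycled at any time.

```python
import json

def handler(event, context=None):
    # Parse the request body from the (hypothetical) API Gateway event.
    body = json.loads(event.get("body", "{}"))
    name = body.get("name", "world")
    # In a real function, any counter or session data would live in an
    # external store like DynamoDB -- nothing in process memory is durable.
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Simulate an invocation locally:
resp = handler({"body": json.dumps({"name": "Ada"})})
print(resp["statusCode"])        # 200
print(json.loads(resp["body"]))  # {'message': 'hello, Ada'}
```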
Cold Starts
The most significant performance characteristic of FaaS is the cold start: when a function has not been invoked recently, the provider must spin up a new execution environment (download the function package, initialize the runtime, execute any global/module-level code). This adds latency ranging from ~100ms (for Node.js/Python) to several seconds (for JVM-based runtimes like Java or Scala).
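The split between init-once and per-invocation work can be illustrated in plain Python: module-scope code runs once when the execution environment is created (the cold start), while the handler body runs on every invocation. The "expensive init" here is a stand-in for heavy imports, SDK clients, or model loads.

```python
import time

# Module scope: executed once per execution environment, i.e. during the
# cold start. Heavy imports and client construction belong here -- they
# are paid once, but they lengthen the first invocation's latency.
INIT_STARTED = time.perf_counter()
CONFIG = {"table": "example-table"}  # stand-in for expensive initialization
INIT_MS = (time.perf_counter() - INIT_STARTED) * 1000

def handler(event, context=None):
    # Handler scope: executed on every invocation, warm or cold.
    return {"init_ms": round(INIT_MS, 3), "echo": event}

print(handler({"ping": 1})["echo"])  # {'ping': 1}
```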
Cold Start Mitigation Strategies
1. Use 'warm-up' scheduled pings that invoke the function every few minutes.
2. Enable AWS Lambda Provisioned Concurrency (keeps N instances warm, but costs money even at idle).
3. Choose a lightweight runtime (Node.js and Python start faster than Java/Kotlin).
4. Minimize function package size and avoid heavy initialization in the global scope.
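Strategy 4 is often implemented as lazy initialization: defer expensive setup until a code path actually needs it, so the cold start only pays for what the first request uses, while warm invocations still reuse the cached result. A sketch, with `make_client` standing in for a hypothetical expensive constructor (e.g. an SDK client or model load):

```python
_client = None  # cached per execution environment

def make_client():
    # Stand-in for expensive setup such as boto3.client("dynamodb")
    # or deserializing an ML model from disk.
    return {"connected": True}

def get_client():
    global _client
    if _client is None:
        _client = make_client()  # paid at most once per environment
    return _client

def handler(event, context=None):
    if event.get("needs_db"):
        return {"db": get_client()["connected"]}
    return {"db": None}  # fast path never pays the init cost

print(handler({"needs_db": False}))  # {'db': None}
print(handler({"needs_db": True}))   # {'db': True}
```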
When Serverless Excels
- Spiky or unpredictable traffic: A function that handles 0 requests at 3 AM and 50,000 requests during a flash sale costs nothing at 3 AM.
- Event-driven pipelines: Image resizing on S3 upload, log processing from CloudWatch, nightly data exports — triggered, short-lived, perfectly suited.
- Rapid prototyping and MVPs: No infrastructure to set up; focus on business logic.
- Microservices with infrequent invocations: Functions that are called rarely don't need a persistent server consuming resources.
- Webhook handlers: Third-party webhook endpoints (Stripe payment events, GitHub webhooks) are an ideal Lambda use case.
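A webhook-handling function typically verifies an HMAC signature before trusting the payload. The sketch below shows the general HMAC-SHA256 pattern; the exact header name (`x-signature` here) and signing scheme are hypothetical, and each provider (Stripe, GitHub) documents its own variant.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; in practice, read from an environment
# variable or a secrets manager, never hard-coded.
SECRET = b"example-shared-secret"

def verify(payload: bytes, signature_hex: str) -> bool:
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information via timing differences.
    return hmac.compare_digest(expected, signature_hex)

def handler(event, context=None):
    payload = event["body"].encode()
    if not verify(payload, event["headers"].get("x-signature", "")):
        return {"statusCode": 401, "body": "bad signature"}
    return {"statusCode": 200, "body": "ok"}

# Local simulation of a signed and an unsigned delivery:
body = json.dumps({"event": "payment.succeeded"})
sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
print(handler({"body": body, "headers": {"x-signature": sig}})["statusCode"])   # 200
print(handler({"body": body, "headers": {"x-signature": "nope"}})["statusCode"]) # 401
```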
When Serverless Is a Poor Fit
- Long-running processes: Lambda's 15-minute limit makes it unsuitable for video encoding, ML training, or ETL jobs that run for hours.
- Latency-sensitive real-time systems: Cold starts introduce unpredictable latency spikes unacceptable for sub-100ms SLA requirements.
- High-throughput, steady-state workloads: At sustained high request rates, per-invocation billing exceeds the cost of a dedicated EC2 instance.
- Stateful protocols: WebSockets and long-lived connections are awkward — you need AWS API Gateway WebSocket support or a separate WebSocket server.
- Vendor lock-in concerns: Lambda functions tied to AWS-specific triggers (S3 events, DynamoDB streams) are difficult to port to another cloud.
Cost Model Comparison
Cost Break-Even Example
AWS Lambda costs approximately $0.20 per 1M invocations + $0.0000166667 per GB-second. A 512 MB function running for 200ms uses 0.5 GB × 0.2 s = 0.1 GB-second, or $0.0000016667 of compute per invocation. At 10 million invocations/month, that's $2.00 in request charges plus about $16.67 in compute, roughly $18.67 total. A t3.medium EC2 ($0.0416/hr) costs about $30/month at 100% utilization. For 10M requests/month, serverless is cheaper — but at 500M requests/month, a dedicated fleet is significantly cheaper.
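The break-even arithmetic above can be reproduced directly (using the list prices quoted in the text; actual prices vary by region and change over time):

```python
REQ_PRICE = 0.20 / 1_000_000   # $ per invocation
GBS_PRICE = 0.0000166667       # $ per GB-second of compute
EC2_HOURLY = 0.0416            # t3.medium on-demand rate

def lambda_monthly(requests, mem_gb=0.5, duration_s=0.2):
    # 0.5 GB x 0.2 s = 0.1 GB-second per invocation for the example above.
    gb_seconds = requests * mem_gb * duration_s
    return requests * REQ_PRICE + gb_seconds * GBS_PRICE

def ec2_monthly(instances=1, hours=730):
    return instances * EC2_HOURLY * hours

print(f"Lambda @ 10M req:  ${lambda_monthly(10_000_000):,.2f}")   # ≈ $18.67
print(f"Lambda @ 500M req: ${lambda_monthly(500_000_000):,.2f}")  # roughly $933
print(f"One t3.medium:     ${ec2_monthly():,.2f}")                # ≈ $30.37
```

At 500M requests/month the compute charge alone dwarfs a single instance's cost, which is why sustained high-throughput workloads favor dedicated capacity.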
Interview Tip
In system design interviews, mention serverless when the problem involves: event-driven triggers (file uploads, queue messages), spiky traffic, or a desire to minimize operational overhead. Always follow up by addressing its limitations: cold starts for latency-sensitive paths, execution time limits for long-running work, and cost at high sustained throughput. This trade-off awareness is what separates strong candidates.