
Blue-Green Deployment

How running two identical production environments enables instant rollback and zero-downtime deployment, plus database migration challenges and traffic-switching strategies.


What Is Blue-Green Deployment?

Blue-green deployment maintains two identical production environments — called Blue and Green — at all times. At any given moment, only one environment is live (serving production traffic), while the other is idle and available for the next deployment. You deploy the new version to the idle environment, run tests, then flip a router or load balancer to send 100% of traffic to the new environment in a single, near-instant cut.

The key insight is that the traffic switch is near-instant and reversible. If the new version has a critical bug, you flip the router back in seconds. There is no re-deploy, no rollback script, and no waiting for containers to spin up, because Blue is still running the old version.
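To make the "reversible switch" idea concrete, here is a toy Python model in which the router is nothing more than a pointer to the live color. It is not any real router's or load balancer's API; `BlueGreenRouter` and the version strings are hypothetical. Switching and rolling back are both a single pointer update, and neither redeploys anything.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BlueGreenRouter:
    """Toy model: one pointer decides which environment receives all traffic.

    `environments` maps a color ("blue"/"green") to the version deployed there.
    """
    environments: Dict[str, str]
    live: str = "blue"
    previous: str = field(default="", init=False)

    @property
    def idle(self) -> str:
        # Whichever color is not live is the deployment target.
        return "green" if self.live == "blue" else "blue"

    def switch(self) -> str:
        # Cut-over: a single pointer update, nothing is redeployed.
        self.previous, self.live = self.live, self.idle
        return self.live

    def rollback(self) -> str:
        # Same operation in reverse; only works while the old env is still running.
        if not self.previous:
            raise RuntimeError("no previous environment to roll back to")
        self.live, self.previous = self.previous, self.live
        return self.live

    def route(self) -> str:
        # The version that a request would hit right now.
        return self.environments[self.live]
```

For example, `BlueGreenRouter({"blue": "v1.0", "green": "v2.0"})` routes to `"v1.0"`. After `switch()` it routes to `"v2.0"`, and after `rollback()` it routes back to `"v1.0"`.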

Figure: Before the switch, Blue is live and Green holds the new version awaiting validation.

Deployment Flow Step by Step

  1. Identify the idle environment. If Blue is currently live, you deploy to Green.
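  2. Deploy the new version to Green. This is a full deploy: new container images, updated config, and database migrations, but only if they are backward-compatible (Blue is still live and must keep working against the migrated schema).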
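  3. Run smoke tests and health checks against Green. Because Green is not receiving user traffic, you can test against production infrastructure without affecting users. Be careful with tests that write data: if Green shares Blue's database, those writes land in production.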
  4. Switch the router. Update the load balancer, DNS weighted routing, or service mesh virtual service to direct 100% of traffic to Green.
  5. Monitor the new environment closely (error rates, latency, and key business metrics) for the first 15–30 minutes.
  6. Keep Blue warm. Do not decommission Blue immediately. It remains your instant rollback target.
  7. After confidence is established, Blue becomes the new idle environment for the next cycle.
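The seven steps above can be sketched as a single orchestration function. This is a minimal Python sketch that assumes the real work (deploying, testing, switching, monitoring) is supplied as callables. The parameter names `deploy`, `smoke_test`, `switch_traffic` and `healthy_after_switch` are hypothetical hooks, not any CI/CD tool's API.

```python
from typing import Callable, Dict


def blue_green_release(
    live: str,
    version: str,
    deploy: Callable[[str, str], None],
    smoke_test: Callable[[str], bool],
    switch_traffic: Callable[[str], None],
    healthy_after_switch: Callable[[str], bool],
) -> Dict[str, str]:
    """Walk through steps 1-7 using injected (hypothetical) callables."""
    idle = "green" if live == "blue" else "blue"   # Step 1: pick the idle environment
    deploy(idle, version)                          # Step 2: full deploy to the idle env
    if not smoke_test(idle):                       # Step 3: test before any user traffic
        return {"live": live, "idle": idle, "result": "aborted"}
    switch_traffic(idle)                           # Step 4: send 100% of traffic to it
    if not healthy_after_switch(idle):             # Step 5: monitoring window
        switch_traffic(live)                       # Step 6: old env is still warm, flip back
        return {"live": live, "idle": idle, "result": "rolled_back"}
    # Step 7: roles swap; the old live env becomes the next deployment target.
    return {"live": idle, "idle": live, "result": "promoted"}
```

Keeping the steps as injected functions makes the control flow easy to test. The same skeleton works whether `switch_traffic` swaps a target group, patches a Kubernetes Service, or updates a mesh route.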

Traffic Switching Mechanisms

| Mechanism | How It Works | Cutover Speed | Rollback Speed |
|---|---|---|---|
| DNS TTL flip | Update DNS A record to Green's IP | Minutes (TTL-dependent) | Minutes |
| Load balancer target swap | Change ALB/NLB target group | Seconds | Seconds |
| Service mesh virtual service | Update Istio VirtualService weights | Sub-second | Sub-second |
| Feature flag / routing header | Route based on header or flag | Instant | Instant |
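The load balancer target swap from the table can be scripted. On AWS, one way is to repoint an ALB/NLB listener's default action with the `elbv2` client's `modify_listener` call. Below is a minimal Python sketch. It assumes a single "forward" default action, and the listener and target group ARNs in the test are hypothetical placeholders. The client is passed in (normally `boto3.client("elbv2")`) so the function can run without AWS credentials.

```python
from typing import Any, Dict, List


def forward_action(target_group_arn: str) -> List[Dict[str, Any]]:
    # A single "forward" default action sends all of the listener's traffic to one target group.
    return [{"Type": "forward", "TargetGroupArn": target_group_arn}]


def swap_listener(elbv2_client: Any, listener_arn: str, target_group_arn: str) -> None:
    """Point a listener's default action at a different target group (Blue or Green)."""
    elbv2_client.modify_listener(
        ListenerArn=listener_arn,
        DefaultActions=forward_action(target_group_arn),
    )
```

Rollback is the same call with Blue's target group ARN. That symmetry is why this mechanism's rollback speed matches its cutover speed.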
⚠️

DNS TTL Trap

Switching via DNS is the slowest and trickiest option. DNS records are cached by ISPs and clients based on the TTL value. Even if you set TTL to 60 seconds, some resolvers ignore low TTLs. For true zero-downtime cutover, use load balancer target group swaps or a service mesh — not DNS.
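Some rough arithmetic shows why a DNS flip leaves a tail of traffic on Blue. This Python sketch rests on two illustrative assumptions, not measurements. First, every resolver holds the old record, with a remaining cache lifetime spread uniformly up to the TTL. Second, a hypothetical share of resolvers ignore low TTLs and cache for a longer floor instead.

```python
def stale_fraction(t: float, ttl: float, clamp_share: float = 0.0, clamp_floor: float = 0.0) -> float:
    """Estimate the share of clients still resolving to Blue t seconds after the DNS flip.

    Illustrative assumptions (not measurements):
      * each resolver's remaining cache life for the old record is uniform in (0, TTL];
      * a `clamp_share` fraction of resolvers ignore low TTLs and cache for at least
        `clamp_floor` seconds instead.
    """

    def still_cached(effective_ttl: float) -> float:
        # Probability that a uniformly distributed remaining lifetime exceeds t.
        if effective_ttl <= 0:
            return 0.0
        return max(0.0, 1.0 - t / effective_ttl)

    honoring = still_cached(ttl)
    clamping = still_cached(max(ttl, clamp_floor))
    return (1.0 - clamp_share) * honoring + clamp_share * clamping
```

Take a 60-second TTL and assume 10% of resolvers clamp to 300 seconds. Then `stale_fraction(60, 60, clamp_share=0.1, clamp_floor=300)` comes out at roughly 0.08. About 8% of clients would still reach Blue one full TTL after the flip, and the tail lasts until 300 seconds. A target group swap has no such tail, because clients keep resolving the same load balancer.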

Database Migration: The Hard Part

Blue-green deployment works beautifully for stateless application tiers. The hard part is database schema changes. Both environments typically share a single database (or replicated cluster). Suppose v2.0 requires a schema change that is incompatible with v1.0, such as dropping a column v1.0 still reads. Applying that migration breaks Blue while it is still serving traffic, and you can no longer flip traffic back to it.

The solution is the expand-contract pattern (also called parallel change or multi-phase migration):

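  1. Expand: Deploy v2.0 with a migration that only adds new columns or tables. v1.0 keeps running because it ignores the new columns, provided they are nullable or have defaults so v1.0's inserts still succeed.
  2. Migrate data: Backfill existing rows into the new schema while both versions can run. Have v2.0 write to both the old and new columns (dual-write), so v1.0 still sees correct data if you roll back.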
  3. Cut over: Switch traffic to v2.0.
  4. Contract: After v2.0 is fully validated and v1.0 is decommissioned, run a second migration to drop the old columns/tables.
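The four phases can be walked through end to end with Python's built-in `sqlite3` module. The schema change here is a hypothetical one: v2.0 renames `users.email` to `users.email_address`. The `v1_*` and `v2_*` functions stand in for the two application versions. This is a sketch of the pattern, not a production migration tool.

```python
import sqlite3

# Hypothetical change: v2.0 renames users.email -> users.email_address.


def expand(conn: sqlite3.Connection) -> None:
    # Additive and nullable, so v1.0's INSERTs that omit the column keep working.
    conn.execute("ALTER TABLE users ADD COLUMN email_address TEXT")


def backfill(conn: sqlite3.Connection) -> None:
    # Idempotent: safe to re-run, e.g. after v1.0 wrote rows during a rollback.
    conn.execute("UPDATE users SET email_address = email WHERE email_address IS NULL")


def contract(conn: sqlite3.Connection) -> None:
    # Run only after v1.0 is decommissioned and the app no longer reads or writes `email`.
    # ALTER TABLE ... DROP COLUMN requires SQLite 3.35.0 or newer.
    conn.execute("ALTER TABLE users DROP COLUMN email")


# v1.0 application code: knows only the old `email` column.
def v1_insert(conn: sqlite3.Connection, user_id: int, email: str) -> None:
    conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", (user_id, email))


def v1_read(conn: sqlite3.Connection, user_id: int) -> str:
    return conn.execute("SELECT email FROM users WHERE id = ?", (user_id,)).fetchone()[0]


# v2.0 application code: dual-writes so a rollback to v1.0 still sees its data.
def v2_insert(conn: sqlite3.Connection, user_id: int, email: str) -> None:
    conn.execute(
        "INSERT INTO users (id, email, email_address) VALUES (?, ?, ?)",
        (user_id, email, email),
    )


def v2_read(conn: sqlite3.Connection, user_id: int) -> str:
    # COALESCE covers rows that v1.0 wrote after the last backfill.
    return conn.execute(
        "SELECT COALESCE(email_address, email) FROM users WHERE id = ?", (user_id,)
    ).fetchone()[0]
```

The order matters. Expand and backfill happen while Blue (v1.0) is live. The cut-over happens once v2.0 is deployed. Contract waits until a later release has stopped touching the old column entirely.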
ℹ️

Separate Database Environments

Some teams run completely separate databases per environment. This eliminates the migration conflict problem but requires synchronizing data before the cut. That is often done with change data capture (CDC) replication from Blue's database to Green's during the deployment window. To keep rollback possible, writes made on Green after the cut must also be replicated back to Blue.
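The core idea of CDC catch-up fits in a toy model. The source emits an ordered change log, the replica applies entries past its last applied position, and the cut-over is gated on zero lag. In this Python sketch the change log is a plain list of `(lsn, op, key, value)` tuples, and `Replica`, `lag` and `ready_to_cut_over` are hypothetical names. Real CDC tools read changes from the database's write-ahead log or binlog.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (log sequence number, operation, key, value); an illustrative stand-in for a real CDC stream.
Change = Tuple[int, str, str, Optional[str]]


@dataclass
class Replica:
    data: Dict[str, Optional[str]] = field(default_factory=dict)
    applied_lsn: int = 0

    def apply(self, changes: List[Change]) -> None:
        for lsn, op, key, value in changes:
            if lsn <= self.applied_lsn:
                continue  # already applied; makes replaying the log idempotent
            if op == "upsert":
                self.data[key] = value
            elif op == "delete":
                self.data.pop(key, None)
            self.applied_lsn = lsn


def lag(source_log: List[Change], replica: Replica) -> int:
    # Number of source changes the replica has not applied yet.
    return sum(1 for lsn, *_ in source_log if lsn > replica.applied_lsn)


def ready_to_cut_over(source_log: List[Change], replica: Replica) -> bool:
    # Only switch traffic once Green has applied everything Blue has written.
    return lag(source_log, replica) == 0
```

Under live writes the lag rarely stays at zero. One common approach, assumed here rather than taken from the lesson, is to pause writes on Blue briefly, wait for zero lag, then switch. Rollback needs the same machinery running in the opposite direction.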

Costs and Trade-offs

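| Dimension | Blue-Green Advantage | Blue-Green Disadvantage |
|---|---|---|
| Rollback speed | Instant (seconds) | |
| Downtime | Zero downtime during switch | |
| Infrastructure cost | | Roughly doubles compute cost while both environments run at full size |
| Database migrations | | Requires expand-contract discipline |
| Stateful sessions | | In-flight requests and server-side sessions may drop at cut-over unless connections are drained and session state is stored outside the app servers |
| Testing fidelity | Test against real infrastructure before going live | |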

AWS Implementation Example

```yaml
# AWS CodeDeploy appspec.yml for a Blue-Green ECS deployment
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>
        LoadBalancerInfo:
          ContainerName: "my-app"
          ContainerPort: 8080

# CodeDeploy creates a replacement task set (Green),
# shifts traffic when health checks pass,
# then terminates the original task set (Blue).
# Termination delay: 5 minutes (gives rollback window). This wait time is
# set on the deployment group, not in the appspec.
# For ECS, each hook is a Lambda function (name or ARN), not a script;
# "RunSmokeTests" is a hypothetical function name.
Hooks:
  - AfterAllowTestTraffic: "RunSmokeTests"
```
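For ECS blue/green deployments, a CodeDeploy lifecycle hook is a Lambda function. The function must report its result back to CodeDeploy through `PutLifecycleEventHookExecutionStatus`. The Python sketch below shows the shape of such a hook for `AfterAllowTestTraffic`. The test-listener URL and the health route are hypothetical. The CodeDeploy client is injected, normally `boto3.client("codedeploy")`, so the handler can run without AWS.

```python
import urllib.request
from typing import Any, Callable, Dict

# Hypothetical URL of the ALB *test* listener and health route; replace with your own.
TEST_LISTENER_URL = "http://my-alb.example.com:8080/health"


def http_smoke_test(url: str = TEST_LISTENER_URL, timeout: float = 5.0) -> bool:
    """Return True if the replacement task set answers the health route with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


def make_hook_handler(codedeploy: Any, smoke_test: Callable[[], bool]):
    """Build a Lambda handler for the AfterAllowTestTraffic hook."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, str]:
        status = "Succeeded" if smoke_test() else "Failed"
        # CodeDeploy waits for this callback before continuing the deployment.
        codedeploy.put_lifecycle_event_hook_execution_status(
            deploymentId=event["DeploymentId"],
            lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
            status=status,
        )
        return {"status": status}

    return handler
```

In the Lambda module you would wire it up as `handler = make_hook_handler(boto3.client("codedeploy"), http_smoke_test)`. At this hook, only test-listener traffic reaches Green, so reporting `Failed` should stop the deployment before production traffic is shifted.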

On Kubernetes, blue-green is achieved by maintaining two `Deployment` objects with different labels. The `Service` selector is updated to point to the new deployment. Tools like Argo Rollouts and Flagger automate this pattern, including automatic rollback if health checks fail post-switch.
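The manual Kubernetes switch amounts to one patch on the Service's selector. Here is a minimal Python sketch using the official `kubernetes` client's `CoreV1Api.patch_namespaced_service`. It assumes both Deployments carry a hypothetical `app` label and differ only in a hypothetical `color` label. The API object is injected, normally `kubernetes.client.CoreV1Api()`, so the function can be exercised without a cluster.

```python
from typing import Any, Dict


def selector_patch(app: str, color: str) -> Dict[str, Any]:
    # Hypothetical labels: both Deployments share app=<app>; only `color` differs.
    return {"spec": {"selector": {"app": app, "color": color}}}


def point_service_at(core_v1: Any, service: str, namespace: str, app: str, color: str) -> None:
    """Re-point a Service at the blue or green Deployment's pods."""
    core_v1.patch_namespaced_service(
        name=service, namespace=namespace, body=selector_patch(app, color)
    )
```

Rolling back is the same call with the other color. The one-off equivalent from a shell is `kubectl patch service my-app -p '{"spec":{"selector":{"color":"green"}}}'`. Argo Rollouts and Flagger automate this step, along with the health analysis around it.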

When to Use Blue-Green

  • High-risk releases where instant rollback capability is non-negotiable
  • Regulatory or contractual environments where downtime for maintenance is not allowed
  • Infrequent, large batch releases (weekly or monthly) rather than continuous delivery
  • Stateless services with manageable database migration strategies
💡

Interview Tip

In interviews, when you propose blue-green deployment, immediately address the database problem — it's the follow-up every interviewer expects. Say: 'The tricky part is schema changes. I'd use the expand-contract pattern: first deploy a backward-compatible migration, then cut traffic, then run the cleanup migration after the old environment is decommissioned.' This shows depth beyond just knowing the pattern name.
