Infrastructure as Code
Manage infrastructure through code: Terraform, CloudFormation, declarative vs imperative, state management, and drift detection.
Why Infrastructure as Code?
Before IaC, infrastructure was provisioned by logging into a cloud console and clicking through forms, a practice nicknamed 'ClickOps'. This approach is error-prone, leaves no audit trail, and is hard to reproduce consistently. Infrastructure as Code (IaC) means defining your cloud resources (VPCs, load balancers, databases, Kubernetes clusters) in text files checked into version control. You get the same benefits as application code: code review, history, rollback, repeatability, and automation.
Declarative vs Imperative
| Approach | Description | Examples | You specify |
|---|---|---|---|
| Declarative | Describe the desired end state | Terraform, CloudFormation, Pulumi | What you want |
| Imperative | Describe the sequence of steps to get there | Ansible playbooks, Chef recipes, shell scripts | How to get there |
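Declarative IaC is preferred for provisioning cloud resources because the tool computes the difference between the desired state and the actual state and applies only that difference. This makes runs idempotent: run `terraform apply` ten times and, absent outside changes, you get the same result. Imperative tools such as Ansible and Chef shine for configuration management (installing packages and configuring services on existing servers).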
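To make the declarative model concrete, here is a minimal Terraform sketch that declares which S3 buckets should exist rather than the steps to create them. The suffixes and the `example-corp` prefix are hypothetical placeholders (real bucket names must be globally unique).

```hcl
# Desired end state: exactly these buckets should exist (hypothetical names)
variable "bucket_suffixes" {
  type    = set(string)
  default = ["logs", "artifacts", "backups"]
}

# One bucket per element of the set; Terraform tracks each by its key
resource "aws_s3_bucket" "this" {
  for_each = var.bucket_suffixes

  bucket = "example-corp-${each.key}" # placeholder prefix
}
```

On each `terraform apply`, Terraform compares this set with its state: missing buckets are created and existing ones are left alone. Removing `"backups"` from the default would make the next plan propose destroying only that bucket. An imperative script would have to spell out the check-then-create and delete logic itself.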
Terraform Core Concepts
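```hcl
# Pin the AWS provider version and configure the remote state backend
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Remote state in S3 with DynamoDB locking
  backend "s3" {
    bucket         = "my-tf-state"
    key            = "prod/main.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf-state-lock"
    encrypt        = true
  }
}

# Configure the AWS provider (credentials come from the standard AWS credential chain)
provider "aws" {
  region = "us-east-1"
}

# Variables for reusability
variable "environment" {
  type    = string
  default = "production"
}

# Data source: look up existing resources
data "aws_vpc" "main" {
  tags = { Name = "main-vpc" }
}

# Resource: define what to create
resource "aws_security_group" "api" {
  name   = "${var.environment}-api-sg"
  vpc_id = data.aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Terraform removes AWS's default allow-all egress rule from security groups
  # it creates, so declare outbound access explicitly if instances need it
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Output: expose values to other configurations
output "api_sg_id" {
  value = aws_security_group.api.id
}
```
Terraform Workflow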
State Management
Terraform's state file (`terraform.tfstate`) records which real cloud resources correspond to which declared resources, along with their last-known attributes. By default it is written to local disk; for any shared project, store it in a remote backend (S3, GCS, Terraform Cloud) with state locking enabled (a DynamoDB table in the S3 example above) so that two concurrent runs cannot corrupt it. State is plain-text JSON and can include secrets such as database passwords, so enable server-side encryption and restrict access via IAM.
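As a sketch of the "restrict access via IAM" advice, the policy below grants only what the S3 backend needs for the `prod/main.tfstate` key and the `tf-state-lock` table used in the backend block above. The account ID `123456789012` is a placeholder, and attaching the policy to a CI role or user is not shown. The exact permission set depends on your Terraform version and backend options (workspaces, KMS keys), so check it against the S3 backend documentation.

```hcl
# Least-privilege access to one state file and its lock table
# (123456789012 is a placeholder account ID)
data "aws_iam_policy_document" "tf_state_access" {
  statement {
    sid       = "ListStateBucket"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::my-tf-state"]
  }

  statement {
    sid       = "ReadWriteProdState"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::my-tf-state/prod/main.tfstate"]
  }

  statement {
    sid = "StateLocking"
    actions = [
      "dynamodb:DescribeTable",
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
    ]
    resources = ["arn:aws:dynamodb:us-east-1:123456789012:table/tf-state-lock"]
  }
}

# Managed policy built from the document above
resource "aws_iam_policy" "tf_state_access" {
  name   = "terraform-prod-state-access"
  policy = data.aws_iam_policy_document.tf_state_access.json
}
```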
Never edit the state file manually
Manual edits to `terraform.tfstate` can corrupt your infrastructure mapping. Use `terraform state mv` to rename resources, `terraform import` to bring existing resources under management, and `terraform state rm` to remove resources from tracking without destroying them.
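Newer Terraform versions also accept these three operations as configuration blocks (`moved` since 1.1, `import` since 1.5, `removed` since 1.7), so they appear in `terraform plan` and go through the same review as any other change. A sketch with hypothetical resource and bucket names:

```hcl
# Rename: config-driven counterpart of `terraform state mv`.
# The resource block itself must also be renamed to "public_api".
moved {
  from = aws_security_group.api
  to   = aws_security_group.public_api
}

# Adopt an existing bucket: counterpart of `terraform import`
import {
  to = aws_s3_bucket.logs
  id = "my-existing-log-bucket" # hypothetical bucket name
}

resource "aws_s3_bucket" "logs" {
  bucket = "my-existing-log-bucket"
}

# Stop tracking without destroying: counterpart of `terraform state rm`.
# The (hypothetical) aws_instance.legacy block has already been deleted from config.
removed {
  from = aws_instance.legacy

  lifecycle {
    destroy = false
  }
}
```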
Drift Detection
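Drift occurs when someone changes a resource outside of Terraform (e.g., manually adjusting a security group rule in the AWS console). Running `terraform plan` in CI on a schedule (or before every apply) detects drift: Terraform refreshes its view of the live infrastructure, compares it with the state file and the configuration, and proposes changes to bring reality back in line. `terraform plan -detailed-exitcode` exits with code 2 when changes are pending, which makes a scheduled drift check easy to wire into alerting. Some teams also use HashiCorp Sentinel or OPA (Open Policy Agent) policies to enforce governance rules (e.g., 'all S3 buckets must have encryption enabled') against the plan before it is applied.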
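Two related built-in Terraform features are sketched below, assuming the `environment` variable and `main` VPC data source from the configuration above; the `internal` group, port, and CIDR values are hypothetical. `lifecycle { ignore_changes }` marks attributes that are expected to change outside Terraform, so plans stop proposing to revert them. A variable `validation` block is a lighter-weight guardrail than Sentinel or OPA (not a replacement for them) that fails `terraform plan` when an input breaks a rule.

```hcl
# Plan-time guardrail: reject world-open CIDRs before anything is applied
variable "internal_ingress_cidrs" {
  type    = list(string)
  default = ["10.0.0.0/8"]

  validation {
    condition     = !contains(var.internal_ingress_cidrs, "0.0.0.0/0")
    error_message = "internal_ingress_cidrs must not include 0.0.0.0/0."
  }
}

resource "aws_security_group" "internal" {
  name   = "${var.environment}-internal-sg"
  vpc_id = data.aws_vpc.main.id

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = var.internal_ingress_cidrs
  }
  # egress omitted for brevity

  tags = {
    Team = "platform"
  }

  lifecycle {
    # Tags are also edited by an external tool (hypothetical), so this
    # expected drift is not reverted on the next apply
    ignore_changes = [tags]
  }
}
```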
Interview Tip
Interviewers for platform and SRE roles often ask about IaC. Key points to hit: (1) store state remotely with locking, (2) never commit secrets (inject them via environment variables or a secrets manager), (3) modularize Terraform with reusable modules (e.g., a 'vpc' module, an 'rds' module), (4) run `terraform plan` in CI and require plan review before apply, (5) tag all resources for cost attribution.
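Points (2), (3) and (5) map onto Terraform features roughly as in this sketch; the module path, its inputs, and the tag values are hypothetical. `default_tags` on the AWS provider applies tags to every taggable resource it creates, a local module call packages reusable infrastructure, and a `sensitive` variable keeps a secret out of source control and console output.

```hcl
# Tag every taggable resource this provider creates, for cost attribution
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Environment = "production"
      CostCenter  = "platform-1234" # hypothetical value
      ManagedBy   = "terraform"
    }
  }
}

# Reusable module call (hypothetical local module and input names)
module "vpc" {
  source     = "./modules/vpc"
  name       = "main-vpc"
  cidr_block = "10.0.0.0/16"
}

# Secret input: never hard-code it; supply it at run time,
# e.g. via the TF_VAR_db_password environment variable
variable "db_password" {
  type      = string
  sensitive = true # redacted in plan/apply output
}
```

A `sensitive` value is still written to state in plain text, which is one more reason to encrypt and lock down the state backend.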