AWS Infrastructure as Code: Complete Guide to CloudFormation, CDK, and Terraform
Infrastructure as Code is the operating model that turns AWS from a collection of hand-built resources into a repeatable platform. The goal is not just faster provisioning. The goal is to make infrastructure changes reviewable, testable, reversible, and consistent across environments.
On AWS, most teams end up choosing between three primary approaches:
- CloudFormation for native AWS coverage and direct service integration
- AWS CDK for reusable infrastructure written in programming languages
- Terraform for a mature declarative workflow and multi-cloud portability
The best choice depends on team skills, governance requirements, existing estate, and how much abstraction the platform needs. This guide covers how to choose, how to structure an implementation, and how to avoid the problems that make infrastructure automation brittle.
What Infrastructure as Code Changes
Infrastructure as Code replaces console-driven infrastructure management with committed source files, peer review, automated validation, and controlled deployment pipelines. A VPC, IAM role, ECS service, Lambda function, RDS cluster, Route 53 record, or CloudWatch alarm should be represented in source control instead of only existing as state inside an AWS account.
That shift creates several operational advantages:
- Teams can review infrastructure changes before they reach production.
- Environments can be recreated from known source instead of tribal knowledge.
- Security controls can be applied consistently across accounts and regions.
- Drift becomes detectable instead of invisible.
- Rollbacks and disaster recovery plans become more concrete.
IaC does not remove the need for AWS expertise. It makes that expertise explicit and reusable.
CloudFormation: AWS-Native Infrastructure
CloudFormation is AWS’s native infrastructure provisioning engine. AWS stores and manages the state of every stack itself, so there is no separate state backend to operate, and new AWS services commonly support CloudFormation early in their lifecycle.
CloudFormation is a strong fit when:
- The estate is AWS-only.
- The team wants first-party AWS support.
- Change sets and stack events are important operational primitives.
- StackSets are needed for multi-account rollout.
- The organization prefers not to manage an external state engine.
A small CloudFormation stack can be straightforward:
```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Basic application security group

Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
  Environment:
    Type: String
    AllowedValues: [dev, staging, prod]

Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Sub "${Environment} application security group"
      VpcId: !Ref VpcId
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0
      Tags:
        - Key: Environment
          Value: !Ref Environment
        - Key: ManagedBy
          Value: CloudFormation
```
The challenge is scale. Large YAML templates can become difficult to test, refactor, and review. Nested stacks help, but they need clear ownership boundaries. If a team keeps adding unrelated resources to a single stack because it is convenient, the template turns into a sprawling artifact that nobody can review with confidence instead of a clean platform model.
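One way to notice a stack growing past a reviewable size is to count its resources in CI. The sketch below is a minimal illustration, assuming the template has already been parsed into a plain object (for example from JSON). The names `summarizeTemplate` and `maxResources` are hypothetical, and the threshold is a team-chosen review budget, not a CloudFormation feature.

```typescript
// Minimal shape of a parsed CloudFormation template (only what this check needs).
interface CfnResource {
  Type: string;
  Properties?: Record<string, unknown>;
}

interface CfnTemplate {
  Resources: Record<string, CfnResource>;
}

interface TemplateSummary {
  total: number;
  byType: Record<string, number>;
  overBudget: boolean;
}

// Count resources per type and flag templates above a team-chosen review budget.
function summarizeTemplate(template: CfnTemplate, maxResources: number): TemplateSummary {
  const byType: Record<string, number> = {};
  let total = 0;
  for (const logicalId of Object.keys(template.Resources)) {
    const resource = template.Resources[logicalId];
    if (!resource) continue;
    byType[resource.Type] = (byType[resource.Type] ?? 0) + 1;
    total += 1;
  }
  return { total, byType, overBudget: total > maxResources };
}
```

A per-type breakdown is often more useful in review than the raw total, because a stack that mixes many unrelated resource types is usually the one that needs splitting.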
AWS CDK: Infrastructure With Reusable Constructs
AWS CDK synthesizes CloudFormation templates from code written in languages such as TypeScript, Python, Java, C#, and Go. It is especially useful when infrastructure patterns need reusable abstractions.
CDK is a strong fit when:
- Application teams already work in a supported programming language.
- The platform needs reusable constructs across many services.
- Infrastructure tests should run with normal unit testing tools.
- The team wants higher-level AWS defaults without hand-writing every resource.
- CloudFormation remains the desired deployment engine.
A CDK stack can describe a complete three-tier network layout with very little boilerplate:
```typescript
import * as cdk from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import { Construct } from "constructs";

export class NetworkStack extends cdk.Stack {
  public readonly vpc: ec2.Vpc;

  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    this.vpc = new ec2.Vpc(this, "Vpc", {
      maxAzs: 2,
      natGateways: 1,
      subnetConfiguration: [
        { name: "Public", subnetType: ec2.SubnetType.PUBLIC, cidrMask: 24 },
        { name: "Private", subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS, cidrMask: 24 },
        { name: "Database", subnetType: ec2.SubnetType.PRIVATE_ISOLATED, cidrMask: 28 }
      ]
    });

    cdk.Tags.of(this).add("ManagedBy", "CDK");
  }
}
```
CDK’s advantage is not only fewer lines. It lets a platform team publish internal constructs such as StandardVpc, PrivateService, EncryptedBucket, or AuditedQueue. Those constructs can encode tagging, encryption, logging, alarm, and access-control defaults.
The risk is over-abstraction. If a construct hides too much, application teams cannot understand what will be deployed. Keep constructs small, documented, and testable.
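To make the idea of encoded defaults concrete, here is a dependency-free sketch of what a small construct such as EncryptedBucket might emit. It deliberately does not use the CDK API: `encryptedBucket` and `BucketOptions` are hypothetical names, and the function simply returns the CloudFormation resource a real construct would synthesize. The point is that callers supply ownership details only, while encryption, public access blocking, and tagging are fixed in one place.

```typescript
interface Tag {
  Key: string;
  Value: string;
}

interface BucketOptions {
  owner: string;
  environment: "dev" | "staging" | "prod";
  extraTags?: Tag[];
}

// Hypothetical helper modelling an "EncryptedBucket" construct.
// Callers cannot switch off encryption or the public access block.
function encryptedBucket(options: BucketOptions) {
  const defaultTags: Tag[] = [
    { Key: "Owner", Value: options.owner },
    { Key: "Environment", Value: options.environment },
    { Key: "ManagedBy", Value: "CDK" },
  ];

  return {
    Type: "AWS::S3::Bucket",
    Properties: {
      BucketEncryption: {
        ServerSideEncryptionConfiguration: [
          { ServerSideEncryptionByDefault: { SSEAlgorithm: "aws:kms" } },
        ],
      },
      PublicAccessBlockConfiguration: {
        BlockPublicAcls: true,
        BlockPublicPolicy: true,
        IgnorePublicAcls: true,
        RestrictPublicBuckets: true,
      },
      Tags: defaultTags.concat(options.extraTags ?? []),
    },
  };
}
```

Keeping the option surface this narrow is one way to stay on the right side of the over-abstraction risk: a reviewer can read the whole helper and know exactly what will be deployed.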
Terraform: Declarative Workflow and Multi-Cloud Reach
Terraform is widely used for AWS because it has a mature provider ecosystem, a clear plan/apply workflow, and a module model that many teams already understand.
Terraform is a strong fit when:
- The organization uses more than one cloud provider.
- Teams already have Terraform skills and module conventions.
- Infrastructure state should be managed outside CloudFormation.
- Third-party providers are part of the platform.
- A plan file review step is important to the deployment process.
A simple AWS VPC pattern in Terraform looks like this:
```hcl
provider "aws" {
  region = var.aws_region
}

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name        = "${var.project_name}-${var.environment}-vpc"
    Environment = var.environment
    ManagedBy   = "Terraform"
  }
}

resource "aws_subnet" "private" {
  for_each = var.private_subnets

  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.value.az

  tags = {
    Name        = "${var.project_name}-${var.environment}-${each.key}"
    Environment = var.environment
    Tier        = "private"
  }
}
```
Terraform’s main operational responsibility is state. Use remote state, lock state during updates, limit who can mutate state, and create backup procedures. Treat state as production data: it maps your code to real resources, and it can contain sensitive attribute values in plain text.
CloudFormation vs CDK vs Terraform
| Decision Area | CloudFormation | AWS CDK | Terraform |
|---|---|---|---|
| Deployment engine | CloudFormation (AWS native) | CloudFormation, via synthesized templates | Terraform CLI and providers |
| Best for | AWS-only native stacks | Reusable AWS platform patterns | Multi-cloud and module workflows |
| Language | YAML or JSON | TypeScript, Python, Java, C#, Go | HCL |
| Testing model | Template validation and change sets | Unit tests, snapshot tests, synth validation | Validate, plan, policy checks |
| State model | AWS-managed stacks | AWS-managed stacks | Remote Terraform state |
| Main risk | Large hard-to-review templates | Over-abstracted constructs | State drift or state access mistakes |
There is no universal winner. Many mature AWS environments use more than one tool. A platform team might use CloudFormation StackSets for account baselines, CDK for application infrastructure, and Terraform for DNS, SaaS providers, or multi-cloud shared services.
The important rule is to avoid tool sprawl without ownership. Every IaC tool in the estate needs a clear purpose, code standard, state model, and deployment path.
Implementation Roadmap
Start with the resources that create the most operational pain, not necessarily the resources that are easiest to model.
Phase 1: Inventory and Ownership
Create an inventory of existing AWS resources by account, region, application, and owner. Tagging gaps should be fixed before or during migration. Infrastructure without an owner is difficult to migrate safely.
Useful inventory inputs include:
- AWS Config resource inventory
- Cost and Usage Report tags
- CloudFormation stack lists
- Terraform state files
- Load balancer target groups and listener rules
- Route 53 hosted zones
- IAM roles and policies
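Once the inventory exists, the ownership gaps can be listed mechanically. The following is a minimal sketch, assuming the inventory has been exported into a simple array of records. The `InventoryItem` shape, the function name, and the required tag keys are all hypothetical and would need to match whatever export format and tagging standard the team actually uses.

```typescript
// Hypothetical, simplified inventory record (for example derived from an AWS Config export).
interface InventoryItem {
  arn: string;
  account: string;
  region: string;
  tags: Record<string, string>;
}

// Return the ARNs of resources that are missing any required tag or have a blank value for it.
function findUntaggedResources(items: InventoryItem[], requiredTags: string[]): string[] {
  return items
    .filter((item) =>
      requiredTags.some((key) => {
        const value = item.tags[key];
        return value === undefined || value.trim() === "";
      })
    )
    .map((item) => item.arn);
}
```

Running a check like this per account before migration gives a concrete list of resources that need an owner before they can be moved safely.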
Phase 2: Foundation Stacks
Move stable shared infrastructure first:
- VPCs, subnets, route tables, NAT gateways, and endpoints
- IAM permission boundaries and deployment roles
- KMS keys and logging buckets
- Security groups with well-understood consumers
- AWS Config, CloudTrail, GuardDuty, and baseline alarms
These layers should change slowly and have stricter review rules than application infrastructure.
Phase 3: Application Infrastructure
After the foundation is stable, migrate application-owned resources:
- ECS services and task definitions
- Lambda functions and event sources
- RDS, DynamoDB, SQS, SNS, EventBridge, and S3 resources
- CloudWatch dashboards and alarms
- Route 53 records and certificates
Keep application stacks small enough that a service team can reason about them during review.
Phase 4: Policy, Testing, and Release Controls
IaC should be validated before deployment. A practical pipeline usually includes:
- Formatting checks
- Static validation
- Security scanning
- Policy checks for encryption, public access, and IAM scope
- Plan or change-set review
- Deployment to a non-production environment
- Post-deployment smoke checks
The exact tools vary, but the control points should be consistent.
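The plan-review control point can be partly automated. As a sketch, assuming a Terraform plan has been rendered to JSON with `terraform show -json tfplan` and parsed into an object, the function below lists the resources that the plan would delete or replace. Only the `resource_changes`, `address`, and `change.actions` fields of the plan JSON are used, and the function name `destructiveChanges` is hypothetical.

```typescript
// The subset of Terraform's JSON plan representation that this check reads.
interface TfResourceChange {
  address: string;
  change: { actions: string[] };
}

interface TfPlan {
  resource_changes?: TfResourceChange[];
}

// Return the addresses of resources the plan would delete or replace.
// A replacement appears as both "delete" and "create" in the actions list.
function destructiveChanges(plan: TfPlan): string[] {
  return (plan.resource_changes ?? [])
    .filter((rc) => rc.change.actions.indexOf("delete") !== -1)
    .map((rc) => rc.address);
}
```

A pipeline could fail the build, or require an extra approval, whenever this list is non-empty for a production workspace. Human review of the full plan is still needed, since an in-place update can be just as disruptive as a delete.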
Security and Governance
Infrastructure automation can make security better or worse. It makes good defaults repeatable, but it also lets a bad pattern spread quickly.
Set baseline rules early:
- No secrets in source code, variables, templates, or state outputs.
- Encryption is the default for storage, queues, databases, and logs.
- Public network paths require explicit review.
- IAM policies should be scoped to actions, resources, and conditions.
- Production deployments require a separate role from development deployments.
- CloudTrail and Config should cover every account and region in scope.
Use policy-as-code where possible. Examples include CloudFormation Guard, Checkov, tfsec (whose checks have since been folded into Trivy), Open Policy Agent, IAM Access Analyzer, and AWS Config custom rules.
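The tools above are the right choice for real pipelines, but the underlying idea is simple enough to sketch. The example below is an illustration only, assuming a CloudFormation template parsed into a plain object. It applies two of the baseline rules: buckets must declare encryption explicitly, and inline security group ingress must not be open to the whole internet. The function name `checkTemplate` is hypothetical, and the sketch ignores standalone `AWS::EC2::SecurityGroupIngress` resources.

```typescript
interface CfnResource {
  Type: string;
  Properties?: Record<string, unknown>;
}

interface CfnTemplate {
  Resources: Record<string, CfnResource>;
}

// Return one human-readable message per policy violation found in the template.
function checkTemplate(template: CfnTemplate): string[] {
  const violations: string[] = [];

  for (const logicalId of Object.keys(template.Resources)) {
    const resource = template.Resources[logicalId];
    if (!resource) continue;
    const props = resource.Properties ?? {};

    // Rule 1 (team convention): every bucket declares its encryption settings explicitly.
    if (resource.Type === "AWS::S3::Bucket" && props["BucketEncryption"] === undefined) {
      violations.push(`${logicalId}: bucket has no explicit BucketEncryption`);
    }

    // Rule 2: inline ingress rules must not allow traffic from anywhere.
    if (resource.Type === "AWS::EC2::SecurityGroup") {
      const ingress = props["SecurityGroupIngress"];
      const rules = Array.isArray(ingress) ? (ingress as Array<Record<string, unknown>>) : [];
      for (const rule of rules) {
        if (rule["CidrIp"] === "0.0.0.0/0" || rule["CidrIpv6"] === "::/0") {
          violations.push(`${logicalId}: ingress is open to the internet`);
        }
      }
    }
  }

  return violations;
}
```

Because the check works on the template rather than on source code, the same rule can cover hand-written CloudFormation and CDK-synthesized output.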
Testing Infrastructure Code
Infrastructure tests should catch expensive mistakes before AWS does.
For CloudFormation:
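```bash
# Check template syntax before anything is deployed
aws cloudformation validate-template --template-body file://template.yaml

# Create a change set for review instead of updating the stack directly
aws cloudformation create-change-set \
  --stack-name app-prod \
  --change-set-name app-prod-review \
  --template-body file://template.yaml

# Inspect the proposed changes
aws cloudformation describe-change-set \
  --stack-name app-prod \
  --change-set-name app-prod-review
```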
For CDK:
```bash
npm test
npx cdk synth
npx cdk diff
```
For Terraform:
```bash
terraform fmt -check
terraform validate
terraform plan -out tfplan
```
Testing should also include runtime validation. If a deployment creates an ALB route, call the route. If it creates an SQS event flow, publish a test event. If it changes IAM, verify the intended principal can perform the intended action and cannot perform adjacent actions.
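The IAM case in particular benefits from being written down as explicit expectations. The sketch below is one possible shape, not a prescribed method: each expectation names an action, a resource, and whether access should be allowed, and a pluggable `simulate` function supplies the actual decision. All names here are hypothetical. In practice the simulator could wrap the IAM policy simulator (for example `aws iam simulate-principal-policy`) or a real call made with the deployed role.

```typescript
type Decision = "allowed" | "denied";

interface AccessExpectation {
  action: string;
  resource: string;
  expect: Decision;
}

// Any function that answers "would the principal under test be allowed to do this?".
type AccessSimulator = (action: string, resource: string) => Decision;

// Compare expected and actual decisions and describe every mismatch.
function verifyAccess(expectations: AccessExpectation[], simulate: AccessSimulator): string[] {
  const failures: string[] = [];
  for (const e of expectations) {
    const actual = simulate(e.action, e.resource);
    if (actual !== e.expect) {
      failures.push(`${e.action} on ${e.resource}: expected ${e.expect}, got ${actual}`);
    }
  }
  return failures;
}
```

Including "denied" expectations for adjacent actions is what catches an over-broad policy, which a test of the happy path alone would never reveal.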
Migration Patterns
Avoid one giant migration. Use one of these patterns instead.
Import Existing Resources
Import is useful when resources are already correct and need to be brought under IaC management. Use it carefully. Importing a resource does not automatically mean the code accurately describes every important setting.
Replace During Planned Change
Some resources are easier to recreate during a larger change window. For example, a non-critical queue, alarm, or dashboard may be simpler to replace than import.
Wrap Around Existing Infrastructure
Create IaC for new surrounding resources first, then migrate the older center of gravity later. This is common with DNS, observability, and deployment pipelines.
Blue-Green Infrastructure
For high-risk components, build the new infrastructure in parallel, shift traffic, validate behavior, and then retire the old stack. This is slower but safer.
Repository Structure
A workable repository layout keeps shared modules separate from environment configuration:
```
infrastructure/
  modules/
    network/
    ecs-service/
    rds-postgres/
  environments/
    dev/
    staging/
    prod/
  policies/
  scripts/
  docs/
```
For CDK, keep constructs separated from deployed stacks:
```
packages/
  constructs/
    standard-vpc/
    private-service/
apps/
  platform-network/
  billing-service/
  reporting-service/
```
The layout matters less than the ownership. A repository should make it obvious who owns a stack, how it is deployed, and how to validate it.
Companion Implementation Repositories
The implementation examples for this guide are organized by tool. Use them as starting points for module boundaries, naming conventions, review checklists, and deployment structure.
Practical Adoption Checklist
Use this checklist before treating IaC as production-ready:
- Every production resource has an owner and environment tag.
- Shared infrastructure and application infrastructure are in separate stacks or modules.
- Production state and deployment roles are locked down.
- Every change produces a reviewed plan, diff, or change set.
- Security scanning runs before deployment.
- Rollback steps are documented for each critical stack.
- Drift detection is scheduled.
- Post-deployment checks run automatically.
- Documentation explains how to create, update, and destroy non-production environments.
Related Daily DevOps Guides
- CloudFormation to CDK: A Practical Migration
- Infrastructure Automation: Terraform and OpenTofu Comparison
- AWS Infrastructure Code Testing
- AWS DevOps Automation
- AWS Multi-Account Security Architecture
Infrastructure as Code is not a one-time migration project. It is the foundation for operating AWS with discipline: small changes, clear ownership, repeatable deployment, and evidence that the system behaves the way the code says it should.