AWS CDK: Infrastructure That Actually Scales

Raw CloudFormation is YAML at scale. It works, but it doesn't compose. When two stacks need the same Lambda configuration, you copy and paste. When a security baseline changes, you update it in a dozen places. CDK solves the composition problem by treating infrastructure like code.

How CDK Works

CDK lets you define AWS infrastructure in a real programming language. TypeScript is the most common choice. You write classes, objects, and functions. CDK compiles that to CloudFormation templates.

The hierarchy is three levels deep:

App is the root. One CDK app can contain multiple stacks.

Stack maps directly to a CloudFormation stack. It's the unit of deployment. Each stack gets its own template, its own state, and its own deployment lifecycle.

Construct is the building block. Every resource in CDK is a construct. Constructs compose into stacks, and stacks compose into apps.

import * as cdk from 'aws-cdk-lib'
import { ApiStack } from './stacks/api-stack'
import { DataStack } from './stacks/data-stack'

const app = new cdk.App()
const env = { account: process.env.CDK_ACCOUNT, region: 'eu-central-1' }

const data = new DataStack(app, 'DataStack', { env })
new ApiStack(app, 'ApiStack', { env, table: data.table })

Running cdk synth writes the CloudFormation templates to the cdk.out directory. Running cdk deploy sends them to AWS. The templates are generated; you never write them by hand.

Three Levels of Constructs

CDK ships with constructs at three abstraction levels.

L1 constructs (Cfn*) are direct mappings to CloudFormation resources. CfnBucket, CfnFunction, CfnTable. Every property maps 1:1 to a CloudFormation property. They're verbose and give you no defaults.

L2 constructs are the useful ones. s3.Bucket, lambda.Function, dynamodb.Table. They have sensible defaults, handle IAM permissions with methods like grantRead and grantWrite, and expose typed props instead of raw CloudFormation strings.

L3 constructs (called "patterns") combine multiple resources into a complete solution. ApplicationLoadBalancedFargateService from aws-cdk-lib/aws-ecs-patterns creates an ECS cluster, a Fargate service, a load balancer, and the wiring between them in one shot.

Most day-to-day CDK code lives at L2. L3 patterns are useful when they match your use case exactly.
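To make the difference between L1 and L2 concrete, here is a minimal sketch that defines a versioned bucket at both levels and grants read access through the L2 helper. The stack name, construct IDs, and the reader role are placeholders chosen for illustration:

```typescript
import * as cdk from 'aws-cdk-lib'
import * as iam from 'aws-cdk-lib/aws-iam'
import * as s3 from 'aws-cdk-lib/aws-s3'

const app = new cdk.App()
const stack = new cdk.Stack(app, 'LevelsDemoStack') // hypothetical stack name

// L1: properties mirror the CloudFormation resource one-to-one.
new s3.CfnBucket(stack, 'RawBucket', {
  versioningConfiguration: { status: 'Enabled' },
})

// L2: typed props and defaults, plus grant* helpers for IAM.
const bucket = new s3.Bucket(stack, 'Bucket', { versioned: true })
const reader = new iam.Role(stack, 'ReaderRole', {
  assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'),
})
bucket.grantRead(reader) // adds read-only S3 statements to the role's policy

// Synthesize in memory to inspect what each level produced.
const levelsTemplate = app.synth().getStackByName('LevelsDemoStack').template
```

Both buckets end up as AWS::S3::Bucket resources in the template. The difference is how much of the surrounding IAM wiring you had to write yourself.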

Constructs Scale Teams

This is where CDK earns its value over Terraform or raw CloudFormation.

A construct is just a class. You can write your own, pass configuration through the constructor, and reuse it across stacks and projects. That composition model is what YAML cannot provide.

Consider a Lambda function that needs a dead-letter queue for failed invocations and a CloudWatch alarm when that queue receives messages. In CloudFormation, you write that configuration in full wherever you need it. In CDK, you write it once:

import * as cdk from 'aws-cdk-lib'
import * as lambda from 'aws-cdk-lib/aws-lambda'
import * as sqs from 'aws-cdk-lib/aws-sqs'
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch'
import { Construct } from 'constructs'

interface ReliableFunctionProps {
  handler: string
  code: lambda.Code
  environment?: Record<string, string>
}

export class ReliableFunction extends Construct {
  readonly fn: lambda.Function

  constructor(scope: Construct, id: string, props: ReliableFunctionProps) {
    super(scope, id)

    const dlq = new sqs.Queue(this, 'DLQ', {
      retentionPeriod: cdk.Duration.days(14),
    })

    this.fn = new lambda.Function(this, 'Function', {
      runtime: lambda.Runtime.NODEJS_22_X,
      handler: props.handler,
      code: props.code,
      deadLetterQueue: dlq,
      environment: props.environment,
    })

    new cloudwatch.Alarm(this, 'DLQAlarm', {
      metric: dlq.metricNumberOfMessagesSent(),
      threshold: 1,
      evaluationPeriods: 1,
    })
  }
}

Every Lambda in the system can use this construct. The DLQ and alarm come for free. When the retention period or alarm threshold needs to change, it changes in one place.

Shared constructs can live in an internal npm package. Multiple teams consume it. The platform team publishes updates; product teams get them on the next install. Infrastructure conventions stop being documentation that drifts and become code that enforces itself.

Stack Boundaries Matter

Splitting infrastructure into multiple stacks is not just organization. It affects deployment speed, blast radius, and access control.

A single monolithic stack becomes slow to deploy as it grows. CloudFormation diffs the entire template on every change. With separate stacks, a change to the API stack doesn't touch the data stack. Deployments are faster and only affect what actually changed.

Blast radius shrinks. A failed deployment in the messaging stack doesn't roll back database configuration.

CDK handles cross-stack references cleanly. When a stack needs a resource from another stack, you pass it through the constructor:

const data = new DataStack(app, 'DataStack', { env })

// data.table is a dynamodb.Table construct
// CDK generates the CloudFormation export/import automatically
new ApiStack(app, 'ApiStack', {
  env,
  table: data.table,
})

CDK generates the Outputs and Fn::ImportValue references in the synthesized templates. You work with typed objects; the CloudFormation wiring happens behind the scenes.
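The receiving side is not shown above, so here is a minimal sketch of what the two stacks could look like. The key name pk, the inline handler, and the ApiStackProps interface are assumptions made for this example, not the actual project code:

```typescript
import * as cdk from 'aws-cdk-lib'
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb'
import * as lambda from 'aws-cdk-lib/aws-lambda'
import { Construct } from 'constructs'

class DataStack extends cdk.Stack {
  readonly table: dynamodb.Table

  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props)
    this.table = new dynamodb.Table(this, 'Table', {
      // 'pk' is a placeholder key name for this sketch
      partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
    })
  }
}

interface ApiStackProps extends cdk.StackProps {
  table: dynamodb.ITable // the typed cross-stack dependency
}

class ApiStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props: ApiStackProps) {
    super(scope, id, props)
    const handler = new lambda.Function(this, 'Handler', {
      runtime: lambda.Runtime.NODEJS_22_X,
      handler: 'index.handler',
      code: lambda.Code.fromInline('exports.handler = async () => ({ statusCode: 200 })'),
      // Using the table's name here is what triggers the export/import.
      environment: { TABLE_NAME: props.table.tableName },
    })
    props.table.grantReadData(handler)
  }
}

const app = new cdk.App()
const data = new DataStack(app, 'DataStack')
new ApiStack(app, 'ApiStack', { table: data.table })

// Synthesize in memory and look at the generated wiring.
const assembly = app.synth()
const dataTemplate = assembly.getStackByName('DataStack').template
const apiTemplate = assembly.getStackByName('ApiStack').template
```

After synthesis, dataTemplate carries an Outputs section with exports and apiTemplate references them through Fn::ImportValue. Typing the prop as ITable rather than Table keeps the API stack open to imported tables as well.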

Testing Infrastructure

CDK stacks are code, so they're testable.

aws-cdk-lib/assertions lets you assert properties on the synthesized CloudFormation template without deploying anything:

import * as cdk from 'aws-cdk-lib'
import { Template } from 'aws-cdk-lib/assertions'
import { DataStack } from '../lib/stacks/data-stack'

test('DynamoDB table has point-in-time recovery enabled', () => {
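  const app = new cdk.App()
  // Placeholder account ID; nothing is deployed in this test
  const env = { account: '123456789012', region: 'eu-central-1' }
  const stack = new DataStack(app, 'TestStack', { env })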
  const template = Template.fromStack(stack)

  template.hasResourceProperties('AWS::DynamoDB::Table', {
    PointInTimeRecoverySpecification: {
      PointInTimeRecoveryEnabled: true,
    },
  })
})

This catches configuration mistakes before they reach AWS. No API calls, no deployment. The test runs against the synthesized JSON in milliseconds.

You can assert that IAM policies don't grant * actions, that S3 buckets have versioning enabled, that Lambda functions have reserved concurrency set. Compliance rules become assertions in a test suite that runs in CI.
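As one possible shape for such a compliance rule, the sketch below scans a template's JSON for IAM policies that allow the action "*". The function and type names are hypothetical, and the check only covers standalone policy resources, not inline policies declared on roles:

```typescript
// Minimal shape of the parts of a CloudFormation template this check reads.
interface TemplateLike {
  Resources?: Record<string, { Type: string; Properties?: any }>
}

// CloudFormation allows a single value or a list in several places.
function asArray<T>(value: T | T[] | undefined): T[] {
  if (value === undefined) return []
  return Array.isArray(value) ? value : [value]
}

// Returns the logical IDs of IAM policies that allow the action "*".
function findWildcardPolicies(template: TemplateLike): string[] {
  const offenders: string[] = []
  for (const [logicalId, resource] of Object.entries(template.Resources ?? {})) {
    if (resource.Type !== 'AWS::IAM::Policy' && resource.Type !== 'AWS::IAM::ManagedPolicy') continue
    const statements = asArray<any>(resource.Properties?.PolicyDocument?.Statement)
    const hasWildcard = statements.some(
      (s) => s.Effect === 'Allow' && asArray<string>(s.Action).includes('*'),
    )
    if (hasWildcard) offenders.push(logicalId)
  }
  return offenders
}

// Hand-written sample template, for illustration only.
const sampleTemplate: TemplateLike = {
  Resources: {
    GoodPolicy: {
      Type: 'AWS::IAM::Policy',
      Properties: {
        PolicyDocument: { Statement: [{ Effect: 'Allow', Action: 's3:GetObject', Resource: '*' }] },
      },
    },
    BadPolicy: {
      Type: 'AWS::IAM::Policy',
      Properties: {
        PolicyDocument: { Statement: [{ Effect: 'Allow', Action: '*', Resource: '*' }] },
      },
    },
  },
}

const wildcardOffenders = findWildcardPolicies(sampleTemplate)
```

In a real test you would pass Template.fromStack(stack).toJSON() instead of the sample object and expect the result to be an empty array.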

Multiple Environments

Spinning up a new environment in CloudFormation means duplicating templates or wrestling with parameter files. In CDK, an environment is a function call.

The standard pattern is to pass an environment identifier through stack props and let each stack adjust its configuration accordingly:

type Stage = 'int' | 'stage' | 'prod'

interface StackProps extends cdk.StackProps {
  stage: Stage
}

const stage = (app.node.tryGetContext('stage') ?? 'int') as Stage
const env = { account: process.env.CDK_ACCOUNT, region: 'eu-central-1' }

const data = new DataStack(app, `DataStack-${stage}`, { env, stage })
new ApiStack(app, `ApiStack-${stage}`, { env, stage, table: data.table })

Inside each stack, the stage prop drives environment-specific decisions. Integration environments can skip expensive features; production gets the full configuration:

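new dynamodb.Table(this, 'Table', {
  // partitionKey is required; 'pk' is a placeholder name
  partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
  pointInTimeRecovery: props.stage === 'prod',
  removalPolicy: props.stage === 'prod'
    ? cdk.RemovalPolicy.RETAIN
    : cdk.RemovalPolicy.DESTROY,
})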
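When the number of stage-dependent settings grows, inline ternaries get hard to audit. One option, sketched here with hypothetical names and values, is to collect the decisions in a single typed lookup table that stacks read from:

```typescript
type Stage = 'int' | 'stage' | 'prod'

// Settings that differ per stage. The fields and values are illustrative assumptions.
interface StageConfig {
  pointInTimeRecovery: boolean
  retainData: boolean
  logRetentionDays: number
}

// Record<Stage, ...> makes the compiler reject a missing or misspelled stage.
const STAGE_CONFIG: Record<Stage, StageConfig> = {
  int: { pointInTimeRecovery: false, retainData: false, logRetentionDays: 7 },
  stage: { pointInTimeRecovery: false, retainData: false, logRetentionDays: 30 },
  prod: { pointInTimeRecovery: true, retainData: true, logRetentionDays: 365 },
}

function configFor(stage: Stage): StageConfig {
  return STAGE_CONFIG[stage]
}
```

A stack would then call configFor(props.stage) once and map retainData to RemovalPolicy.RETAIN or RemovalPolicy.DESTROY. Adding a fourth stage to the Stage type fails compilation until its configuration is filled in.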

Deploying to a new environment is one command:

cdk deploy --all --context stage=stage

The stack names include the stage suffix, so all environments coexist in the same AWS account without colliding. Tearing down an integration environment is equally simple:

cdk destroy --all --context stage=int

This model is cheap to operate. Integration environments can be created for a feature branch, verified, and destroyed when the branch merges. Resources that use RemovalPolicy.DESTROY are removed along with the stack, so there is no manual cleanup and nothing is left orphaned.

Multi-Region Deployments

Cross-regional high availability follows the same pattern as multiple environments. The CDK env object takes both an account and a region. Instantiating the same stack twice with different regions deploys identical infrastructure in both.

const regions = ['eu-central-1', 'us-east-1']

for (const region of regions) {
  const env = { account: process.env.CDK_ACCOUNT, region }

  // Keep the stage in the ID so environments still coexist per region
  const data = new DataStack(app, `DataStack-${stage}-${region}`, { env, stage })
  new ApiStack(app, `ApiStack-${stage}-${region}`, { env, stage, table: data.table })
}

One cdk deploy --all rolls out to every region in the list, provided each region has been bootstrapped with cdk bootstrap. Adding a region means adding one entry to the array.

The split between stateless and stateful stacks matters here. Stateless stacks (API Gateway, Lambda, compute) replicate cleanly across regions because they carry no data. Stateful stacks need a replication strategy.

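For DynamoDB, the answer is Global Tables. Instead of creating an independent table in every region, as the loop above would, you define the table once in a primary region and declare the other regions as replicas:

// Assumes the containing stack is deployed to eu-central-1.
// replicationRegions must not include the stack's own region.
new dynamodb.Table(this, 'Table', {
  // partitionKey is required; 'pk' is a placeholder name
  partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
  replicationRegions: ['us-east-1'],
})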

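Global Tables handle bi-directional replication automatically. A write in Frankfurt typically becomes readable in Virginia within about a second. Replication is eventually consistent, and concurrent writes to the same item are resolved with last-writer-wins.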
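Newer aws-cdk-lib releases also ship a TableV2 construct that models a global table as a single resource. The following is a minimal sketch under that assumption; the stack name, the placeholder account ID, and the key name pk are illustrative:

```typescript
import * as cdk from 'aws-cdk-lib'
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb'

const app = new cdk.App()

// Replicas need a concrete region on the stack. The account ID is a placeholder.
const stack = new cdk.Stack(app, 'GlobalDataStack', {
  env: { account: '123456789012', region: 'eu-central-1' },
})

new dynamodb.TableV2(stack, 'GlobalTable', {
  partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
  billing: dynamodb.Billing.onDemand(),
  // List only the additional regions; the stack's own region is the primary.
  replicas: [{ region: 'us-east-1' }],
})

// Synthesize in memory to see which resource type is produced.
const globalTemplate = app.synth().getStackByName('GlobalDataStack').template
```

This synthesizes to an AWS::DynamoDB::GlobalTable resource rather than a regular table plus a custom resource. Whether to use it depends on the aws-cdk-lib version the project is pinned to.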

Traffic routing sits outside the regional stacks but completes the picture. Route 53 latency-based routing or health check failover directs users to the nearest healthy region. The CDK stacks provision the infrastructure in each region; Route 53 decides which one handles each request.

The result is a system designed so that a full regional outage has little user-visible impact. The other region absorbs traffic, Global Tables keep the data replicated, and the only manual step, if any, is triggering the failover.

CDK and CI/CD

CDK on its own is powerful. Paired with a CI/CD system, it becomes the backbone of a reliable deployment process.

The workflow is straightforward: a push to a branch triggers the pipeline, the pipeline runs cdk synth to validate the templates, runs the test suite against the synthesized output, and then deploys to the target environment.

A GitHub Actions setup looks like this:

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      STAGE: int # assumed default; derive from the branch or a workflow input in practice
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with:
          version: 10
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: pnpm

      - run: pnpm install --frozen-lockfile
      - run: pnpm exec cdk synth --context stage=${{ env.STAGE }}
      - run: pnpm test
      - run: pnpm exec cdk deploy --all --require-approval never --context stage=${{ env.STAGE }}
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

--frozen-lockfile refuses to install if the lockfile doesn't match package.json, so CI always runs against exactly what was committed. Combined with pnpm's content-addressable store, installs on warm runners are fast. --require-approval never disables the interactive confirmation prompt, which is appropriate in CI where no human is watching. The safety net is instead the synth step and the test suite, which catch problems before deploy runs.

If you're not using pnpm yet, it's worth the switch. The lockfile integrity and supply chain protections pair well with infrastructure pipelines where reproducibility matters. See Why pnpm Is the Better Package Manager.

A common pattern is to map branches to environments: merges to develop deploy to int, merges to main deploy to stage, and production requires a manual trigger or a tag. Every environment goes through the same pipeline. There's no "deploy to prod differently" path that bypasses validation.
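The branch mapping can be expressed as a small pure function that the pipeline calls to pick the stage. This is only a sketch: the function name and the tag format for production releases (v1.2.3) are assumptions, not part of the setup described here:

```typescript
type Stage = 'int' | 'stage' | 'prod'

// Maps a Git ref to the stage it should deploy to.
// Returns undefined for refs that should not trigger a deployment.
function stageForRef(ref: string): Stage | undefined {
  if (ref === 'refs/heads/develop') return 'int'
  if (ref === 'refs/heads/main') return 'stage'
  // Assumed convention: production is released from semver-style tags.
  if (/^refs\/tags\/v\d+\.\d+\.\d+$/.test(ref)) return 'prod'
  return undefined
}
```

Keeping the mapping in one function makes it easy to unit test and keeps the "which branch goes where" rule out of scattered workflow conditionals.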

At HDNET, the majority of AWS projects are defined and deployed this way. CDK handles the infrastructure definition; the CI/CD pipeline handles promotion across environments. Changes to infrastructure go through the same review and validation process as application code changes. Nothing reaches production that hasn't been synthesized, tested, and deployed to a lower environment first.

At HDNET

Most of the AWS work at HDNET is built on CDK. Multiple stacks per project, shared constructs for cross-cutting concerns like observability and security configuration, and environments managed through context variables.

The pattern holds across projects regardless of architecture. An event-driven system with SNS and SQS uses the same CDK model as a more straightforward API and database setup. The stack boundary decisions change based on the service boundaries of each system, but the composition model stays the same.

Onboarding a developer to a CDK project takes less time than explaining a set of CloudFormation templates. The infrastructure is navigable. Each stack owns a clear slice of the system, and the TypeScript types make the relationships between stacks explicit without reading documentation.

CloudFormation is not the problem. The problem is maintaining CloudFormation at scale without a composition model. CDK provides that model. Infrastructure becomes code in the same sense that application code is code: typed, testable, reusable, and version-controlled.