Terraform CI/CD

July 29, 2019

Terraform helps in creating cross-cloud immutable infrastructure with code. As with most code there is an ideal of having your codebase automated with tests and deploys. How can this work with terraform? Does it work in a sensible way to give you confidence in your changes and how they ultimately get applied? Trying to answer these questions led to the following approach that I’ve used to attempt to try having a CI/CD solution for a terrorm codebase.

Setup

Installing terraform can be found in the Terraform install guide
An AWS account (can be replaced with other providers)
Some prior knowledge or use of Terraform is helpful since everything won’t be covered

Initializing

Terraform is built with the notion of modules that define your resources. Each directory is a module to Terraform. Start out by creating a directory for your Terraform module

mkdir terraform-ci
cd terraform-ci
git init

Next define some resources and the AWS provider in the main.tf file of our module.

provider "aws" {
  profile    = "default"
  region     = "us-east-1"
}

resource "aws_instance" "test-instance" {
  ami           = "ami-2757f631"
  instance_type = "t2.micro"
}

Now you can run terraform init which will set up the state and providers for the module we will be working with.

Workspaces

Workspaces are a concept in Terraform that help you manage state across multiple namespaces. The namespaces could be a way to separate your state by production and staging and qa for example. Since each directory is a module in Terraform the state for your production and staging infrastructure can’t use the same state. Rather than making a module per stage and duplicating all your resources per module directory you can use workspaces to manage things more cleanly.

There is a default workspace that is used without needing to do anything. In this approach we want a staging and production workspace. To do that we can use the CLI to create them

# Create staging workspace
terraform workspace new staging

# Create production workspace
terraform workspace new production

Once they are created you can view all the workspaces with terraform workspace list. Last step is to select the workspace you want to be working with you have to explicitly do so with the CLI:

# Selecting the staging workspace
terraform workspace select staging

Splitting up our state by staging and production is useful for getting parity across environments. It is also a pattern we will want to mimic with our git branches and development flow to make the process match more semantically.

Git Setup

By default when you use git for version control the default branch is the master branch. Similar to the Terraform workspaces, we want to create a non-default flow and have branches for staging and production.

git co -b staging
git push -u origin staging

git co -b production
git push -u origin production

If you use Github you can change your default branch to be that of the staging branch. This will be where we work on all the features and new development that gets deployed and tested against first. This is to ensure that the production code and branch are always in a known and well vetted state. When you branch it will be from the staging branch and when you want to merge new changes in it will be into the staging branch rather than the default master branch.

Continuous Integration (CI)

A continuous integration approach for Terraform could entail a handful of different things. On one hand you could go a fully integrated route and use tools like Terratest to spin up all your defined resources. While this offers you a lot of confidence in your Terraform infrastructure being applied as you expect, you may find this a cost prohibitive approach. A more streamlined way to achieve some confidence in what ends up being applied is to leverage these workspaces to test them in your non-production environments first.

Terraform has two main actions that you will want to ensure run cleanly and those are plan and apply. The plan step goes through your defined resources and builds up the state of everything to ensure it can attempt to executed. Apply does just that, it will apply the known state at the point it is ran. For our CI approach we want to be able to verify these will run.

If you use a CI provider like CircleCI, or similar, you can set up the integration to run your CI pipeline when you open a pull request for new changes into our staging branch. The main concerns of the pipeline will be to validate the terraform files, plan the terraform state out, and apply that state. An example workflow for that might look like so:

base_image: &base_image
              hashicorp/terraform:latest

working_directory: &working_directory
                     ~/terraform

default_config: &default_config
  docker:
  - image: *base_image
  working_directory: *working_directory

terraform_init: &terraform_init
  run:
    name: initialize
    command: terraform init

set_terraform_workspace: &set_terraform_workspace
  run:
    name: set terraform environment
    command: |
      if [ "${CIRCLE_BRANCH}" == "production" ]; then
        terraform workspace select production
      else
        terraform workspace select staging
      fi      

version: 2.1

#
# CI Jobs
#
jobs:
  build:
    <<: *default_config
    steps:
    - checkout
    - *terraform_init
    - *set_terraform_workspace
    - persist_to_workspace:
        root: *working_directory
        paths:
        - .terraform

  verify:
    <<: *default_config
    steps:
    - checkout
    - attach_workspace:
        at: *working_directory
    - attach_workspace:
        at: ~/
    - run:
        name: verify
        command: terraform verify

  plan:
    <<: *default_config
    steps:
    - checkout
    - attach_workspace:
        at: *working_directory
    - attach_workspace:
        at: ~/
    - run:
        name: plan
        command: terraform plan -out=terraform.plan
    - persist_to_workspace:
        root: *working_directory
        paths:
        - terraform.plan

  apply:
    <<: *default_config
    steps:
    - checkout
    - attach_workspace:
        at: *working_directory
    - attach_workspace:
        at: ~/
    - run:
        name: apply
        command: terraform apply -auto-approve terraform.plan

#
# CI Workflows
#
workflows:
  version: 2
  update_infrastructure:
    jobs:
      - build:
      - verify:
          requires:
            - build
      - plan:
          requires:
            - verify
            - build
      - apply:
          requires:
            - plan
          filters:
            branches:
              only:
                - staging
                - production

This is a lot to un-pack but lets go over it with the continuous delivery part of the CI pipeline.

Continuous Delivery (CD)

In the previous CI section there is an example CI workflow that walks through a lot of things. First it is using the [Hashicorp Terraform Docker image][terrform-image] to execute all the commands and jobs in. The first job is called build and this job is making sure that the terraform init command can run without issue and it saves the .terraform directory to be used with subsequent jobs so that the terraform state and providers is all set up to run commands.

Once Terraform is initialized we make sure the verify command runs successfully. Each of the jobs will need to make sure the correct workspace is selected. If you look at the *set_terraform_workspace anchor in YAML you can see that for any non-production branch you run through CI it will be running against the staging workspace. From the production branch it will apply everything against your production infrastructure.

After verify is successfull we finally run our terraform plan command. One thing of note here is we are utilizing the -out flag to print out our plan at the time it is ran. This allows us to knowingly plan the changes that are at the current point of the code and time the plan command was ran. This plan will be saved in the CI workflow to be used by the next job, apply.

The last job to be ran will require that we have a valid plan generated so that we can run apply against that plan. The plan generated from the plan job is used as an argument to terraform apply. One other thing here is that we are passing the -auto-approve flag so that there is no prompt in CI to say yes to applying the changes.

Feature Branches

If you are running feature branches and creating pull requests to merge into the mainline branch, staging in this case, then you will want to make sure that the apply job doesn’t run in CI. To avoid multiple changesets in feature branches wiping out the state on competing branches it makes sense to verify the plan step runs and the code can be initialized and is verified. CircleCI uses filters to achieve that on the workflow jobs.

Production

Once you have run applied your changes against staging, verified they work as expected, then you can start the process to deploy your changes out to production. For those who want to use pull requests in Github you can make a new pull request from your staging branch into your production branch. This will trigger a new CI build that will run all the previously explained jobs but with the production workspace selected.

The plan output will show all the new changes to apply in your production infrastructure and if your CI pipeline passes then it has gone out and been applied.

Caveats

Some things to look out for:

Using providers like AWS will need credentials in your CI pipeline and terraform vars
State should be saved remotely ([See docs][remote-state]). Which means your terraform version in CI will need to be consistent. If it changes because someone ran it locally with a different version it will have conflicts.
Resources need to be named with the workspace in mind to avoid naming conflicts

e.g name = "my-${terraform.workspace}-ec2-instance"

Permissions for your providers, like AWS, will likely need to have full access which can be a security concern to run from CI instances you don’t have full control over.
Terraform can apply things and fail and leave your infrastructure in a half applied state.
Extra branches for each environment/stage can be cumbersome at times.

Conclusion

Overall I’ve been running something very similar the last year or so to achieve a CI/CD approach with Terraform. It has some drawbacks but all in all it has helped give some clarity to what state our Terraform workspaces are in. Also, we have been able to iterate as a team with deployments happening continuously for staging while holding extra care and caution before things get to production.

Extra Credit

Use Terratest with multiple test accounts hooked up to provision cleanly from CI
If using AWS, or other providers that allow MFA, hook up MFA to be required for your permissions
- Will require extra CI steps
Add a manual approval step in CI to ensure someone looks at the plan that is generated before it applies
- Can be easy to forget CI jobs are waiting
Build your own CI docker image to use with all the extra tooling you need