Organization Terraform State Buckets
Terraform requires a backend to store state files. These are crucial because they let Terraform map real-world resources to the definitions in your code.
Over the years I've seen this done an awful lot of ways, and most of them tend to be less than ideal at some point or another. In this post I'll share my preferred way of doing it and the rationale behind it.
The Problem
When you initialize a Terraform project, it needs to create the state file. When you're first learning and haven't provided a backend config, Terraform will just create a local file called terraform.tfstate in the same directory as your code. This is fine for learning, but it won't be long until something goes wrong: you accidentally delete the file, and Terraform starts trying to deploy resources that conflict with things that already exist.
Often the next problem is that multiple people start working on the code. It's just a text file, so teams sometimes think storing the state file in Git might be a good idea, except that it isn't. Terraform state files can contain sensitive information, and they also change frequently, which can lead to merge conflicts and other issues when stored in version control.
That's when teams learn about remote backends.
There are a few different backend types, but I work almost exclusively with AWS, so the natural choice is S3. Create a bucket in your account, and then tell Terraform to use it as a backend. For a lot of cases this is fine, but you have to remember to configure everything correctly: make sure it's not public, and make sure versioning is enabled. It's not a huge issue, but it's one more thing to remember, and it's something sitting right in an account that should only be hosting the application.
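To point Terraform at an S3 backend you add a backend block to your configuration. Here's a minimal sketch; the bucket name, key and region are placeholders for whatever you use:

terraform {
  backend "s3" {
    # Which bucket and object key this stack's state is stored under
    bucket  = "your-state-bucket"
    key     = "myapp/terraform.tfstate"
    region  = "eu-west-1"
    encrypt = true
  }
}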
When things get a little bigger, you create an AWS Organization. Now you have multiple accounts, each with a state bucket, and you have to remember to configure each one correctly. It's not hard to create a simple CloudFormation StackSet that will create this in every account by default.
But I try my absolute best to stay away from CloudFormation unless I have no other choice. There is also the possibility that an account won't need a state bucket, so now I need exclusion logic in my StackSet, and we're going down another messy, unmaintainable road.
The Solution
In a well-built organization you essentially divide all accounts into a small number of top-level Organizational Units (OUs). You might have Core for centralized components, Shared for shared services, Workloads for applications, and so on.
I tend to create an account that contains deployment-related components, so I imaginatively call it Deployment. In here I create the One True State Bucket. This bucket is used by all accounts in the organization to store their Terraform state files. This way, I only have to configure one bucket, and I can ensure that it is configured correctly with versioning and encryption.
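As a rough sketch, the bucket itself (with versioning, default encryption and public access blocked) might be defined in Terraform something like this; the names are placeholders:

resource "aws_s3_bucket" "tf_state" {
  bucket = "your-org-terraform-state"
}

# Keep old versions of state files so we can recover from bad writes
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest by default
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# State files should never be public
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}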
But this raises a small problem. If all accounts are using the same bucket, how do we prevent conflicts? The answer is to use a bucket policy that controls how the bucket can be accessed.
The Rules
Org Only Access
The first rule is that only principals from our organization can access the bucket. This is done with a Deny statement that blocks all actions where the Principal (the requester) does not belong to the Org ID we expect. For this we use the global condition context key aws:PrincipalOrgID.
This gives us:
{
  "Sid": "DenyNonOrgPrincipals",
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:*",
  "Resource": [
    "arn:aws:s3:::YOUR-BUCKET-NAME",
    "arn:aws:s3:::YOUR-BUCKET-NAME/*"
  ],
  "Condition": {
    "StringNotEquals": {
      "aws:PrincipalOrgID": "o-1234567890"
    }
  }
}
Deployment Role Access Only
Next, we don't want just anyone in the organization to access this bucket. You only want your deployment roles to be able to reach it. You are only allowing CI/CD to deploy, aren't you?
{
  "Sid": "DenyNonDeploymentRoles",
  "Effect": "Deny",
  "Principal": "*",
  "Action": ["s3:*"],
  "Resource": ["arn:aws:s3:::YOUR-BUCKET-NAME/*"],
  "Condition": {
    "StringNotLike": {
      "aws:PrincipalArn": "arn:aws:iam::*:role/Deploy-*"
    }
  }
}
This policy says that only roles whose names start with Deploy- can access the bucket. This way we ensure that only our deployment roles can touch the state files, and we prevent accidental access from other roles. You can change that role name to whatever you want; the point is that you should have a naming convention for your deployment roles and then use that convention to control access to the bucket. Deployment roles are one of the few good use cases for a CloudFormation StackSet, in my opinion, so you can use one to create the roles in each account and then use this policy to control access to the bucket.
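To make the convention concrete, your pipeline's provider configuration would assume one of those roles in the target account. A sketch might look like this; the account ID and role name are made up, but the role matches the Deploy-* pattern:

provider "aws" {
  region = "eu-west-1"

  # CI/CD assumes the per-account deployment role
  assume_role {
    role_arn = "arn:aws:iam::123456789012:role/Deploy-Terraform"
  }
}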
Deployment Roles Can Read Anything
Terraform has the concept of the terraform_remote_state data source. This lets you read the outputs of one stack and use them in another, and it's incredibly useful. For example, you might have a network stack that creates your VPC and subnets, and you want to use the outputs of that stack in your application stack to create your EC2 instances. With terraform_remote_state, you can read the outputs of the network stack and use them in your application stack without having to hardcode any values.
Because of this, we want to be able to read from any state file, but again only if it's a deployment role, so we can add this statement:
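As an illustrative sketch (the key path and output name here are hypothetical), the application stack could pull the network stack's outputs like this:

# Read the network stack's state from the shared bucket
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "your-state-bucket"
    key    = "123456789012/network/prod/network.tfstate"
    region = "eu-west-1"
  }
}

resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI
  instance_type = "t3.micro"
  # Use a subnet ID exported by the network stack
  subnet_id     = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
}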
{
  "Sid": "AllowDeploymentRolesRead",
  "Effect": "Allow",
  "Principal": "*",
  "Action": ["s3:GetObject"],
  "Resource": ["arn:aws:s3:::YOUR-BUCKET-NAME/*"],
  "Condition": {
    "StringLike": {
      "aws:PrincipalArn": "arn:aws:iam::*:role/Deploy-*"
    }
  }
}
This just ensures GetObject is allowed for deployment roles, while no other actions are permitted. This way our deployment roles can read from any state file, but they can't write to a state file they don't own.
Formatted Keys for Writing State
This final statement lets our roles write to the bucket, but to ensure one team can't accidentally overwrite another team's state, we add some restrictions.
- The key must identify the account that is writing the state file. We can get this from the IAM policy variable aws:PrincipalAccount.
- The key must end with .tfstate to encourage teams to use the standard naming convention.
- The key must contain the project name, since sometimes there are multiple projects or stacks deployed to one account.
- The key must identify the environment, e.g. dev, test, prod.
So our key must look like:
<account id>/<project name>/<dev|test|prod>/<stack name>.tfstate
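In a backend configuration, a key following that convention might look like this (the account ID, project and stack names are hypothetical):

terraform {
  backend "s3" {
    bucket  = "your-state-bucket"
    # <account id>/<project>/<env>/<stack>.tfstate
    key     = "123456789012/payments/dev/app.tfstate"
    region  = "eu-west-1"
    encrypt = true
  }
}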
This leaves us with this policy:
{
  "Sid": "AllowDeploymentRolesWrite",
  "Effect": "Allow",
  "Principal": "*",
  "Action": ["s3:PutObject"],
  "Resource": [
    "arn:aws:s3:::YOUR-BUCKET-NAME/${aws:PrincipalAccount}/*/dev/*.tfstate",
    "arn:aws:s3:::YOUR-BUCKET-NAME/${aws:PrincipalAccount}/*/test/*.tfstate",
    "arn:aws:s3:::YOUR-BUCKET-NAME/${aws:PrincipalAccount}/*/prod/*.tfstate"
  ],
  "Condition": {
    "StringLike": {
      "aws:PrincipalArn": "arn:aws:iam::*:role/Deploy-*"
    }
  }
}
The Full Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnforceOrgOwnership",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::YOUR-BUCKET-NAME",
        "arn:aws:s3:::YOUR-BUCKET-NAME/*"
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalOrgID": "o-xxxxxxxxxx"
        }
      }
    },
    {
      "Sid": "RestrictToDeployRoles",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME/*",
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/Deploy-*"
        }
      }
    },
    {
      "Sid": "AllowScopedWriteAccess",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": [
        "arn:aws:s3:::YOUR-BUCKET-NAME/${aws:PrincipalAccount}/*/dev/*.tfstate",
        "arn:aws:s3:::YOUR-BUCKET-NAME/${aws:PrincipalAccount}/*/test/*.tfstate",
        "arn:aws:s3:::YOUR-BUCKET-NAME/${aws:PrincipalAccount}/*/prod/*.tfstate"
      ],
      "Condition": {
        "StringLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/Deploy-*"
        }
      }
    },
    {
      "Sid": "AllowGlobalReadAccessForDeployRoles",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME/*",
      "Condition": {
        "StringLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/Deploy-*"
        }
      }
    }
  ]
}
Conclusion
In this post, we have discussed how to organize Terraform state buckets in a way that is secure and maintainable. By using a single bucket for all accounts in the organization and controlling access with bucket policies, we can ensure that our state files are protected and that only authorized roles can access them. This approach simplifies management and reduces the risk of misconfiguration while still allowing for the necessary flexibility in accessing state files across different projects and environments.
This bucket can be backed up and managed by a limited group of people, and you can be confident that your state files are safe and secure.