Terraform State Management at Scale
Best practices for remote state, workspaces, and state locking when managing infrastructure across multiple environments.
Terraform's state file is the source of truth for your infrastructure. When you're managing a single project, the default local state works fine. But as soon as you have multiple environments, team members, or CI/CD pipelines, state management becomes the most critical aspect of your Terraform workflow.
Remote State with Azure Storage
For Azure-based projects, I store Terraform state in Azure Blob Storage with versioning and soft delete enabled:
```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "stterraformstate"
    container_name       = "tfstate"
    key                  = "prod/networking.tfstate"
  }
}
```

The storage account should be provisioned outside of Terraform (via a bootstrap script) and configured with:
```bash
#!/bin/bash
az storage account create \
  --name stterraformstate \
  --resource-group rg-terraform-state \
  --sku Standard_GRS \
  --encryption-services blob \
  --allow-blob-public-access false

az storage account blob-service-properties update \
  --account-name stterraformstate \
  --enable-versioning true \
  --enable-delete-retention true \
  --delete-retention-days 30
```
Geo-redundant storage (GRS) ensures your state survives regional outages. Versioning lets you recover from accidental state corruption.
State Locking
Azure Blob Storage provides native lease-based locking. When one terraform apply is running, another attempt will fail with a lock error rather than corrupting state. This is automatic with the azurerm backend — no additional configuration needed.
If a lock gets stuck (e.g., a CI runner crashed mid-apply), you can force-unlock:
```bash
terraform force-unlock LOCK_ID
```

But use this carefully: make sure no other operation is genuinely running before you break the lock.
Structuring State Files
One large state file for everything is an anti-pattern. I structure state by layer and environment:
```text
environments/
  prod/
    networking/   # VNets, subnets, NSGs, peerings
    compute/      # VMs, VMSS, AKS clusters
    data/         # Databases, storage, caches
    monitoring/   # Log Analytics, alerts, dashboards
  staging/
    networking/
    compute/
    data/
```
Each directory has its own state file. This provides:
- Blast radius reduction: a bad apply in `compute/` can't destroy your network
- Faster operations: `terraform plan` runs in seconds instead of minutes
- Independent team workflows: the network team and the app team don't block each other
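Each layer directory carries its own backend configuration pointing at a distinct state key. A minimal sketch for the prod compute layer, reusing the storage account and container names from the bootstrap script above (the file path is illustrative):

```hcl
# environments/prod/compute/backend.tf
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "stterraformstate"
    container_name       = "tfstate"
    key                  = "prod/compute.tfstate" # one key per layer and environment
  }
}
```

Keeping the key convention (`<environment>/<layer>.tfstate`) consistent across directories makes it easy to locate any layer's state in the container.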
Cross-State References with Data Sources
When state files are split, you need to share outputs between them. Use terraform_remote_state data sources:
```hcl
data "terraform_remote_state" "networking" {
  backend = "azurerm"
  config = {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "stterraformstate"
    container_name       = "tfstate"
    key                  = "prod/networking.tfstate"
  }
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "aks-prod"
  location            = data.terraform_remote_state.networking.outputs.location
  resource_group_name = data.terraform_remote_state.networking.outputs.resource_group_name

  default_node_pool {
    vnet_subnet_id = data.terraform_remote_state.networking.outputs.aks_subnet_id
  }
}
```

State File Hygiene
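For a remote state reference to resolve, the producing configuration must explicitly declare the corresponding outputs. A sketch of what the networking layer might export; the resource names here are assumptions chosen to match the consumer side:

```hcl
# environments/prod/networking/outputs.tf
output "location" {
  value = azurerm_resource_group.networking.location # hypothetical resource name
}

output "resource_group_name" {
  value = azurerm_resource_group.networking.name
}

output "aks_subnet_id" {
  value = azurerm_subnet.aks.id # hypothetical subnet resource
}
```

Only declared outputs are visible across state boundaries, so these `output` blocks are effectively the layer's public API.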
Regularly audit your state with terraform state list and clean up orphaned resources. Use terraform import to bring manually-created resources under management, and terraform state rm to remove resources that are now managed elsewhere.
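Since Terraform 1.5, imports can also be expressed declaratively with an `import` block, which makes them reviewable as part of a plan rather than a one-off CLI action. A sketch with placeholder values:

```hcl
import {
  # Placeholder address and resource ID -- substitute the real Azure resource ID
  to = azurerm_storage_account.legacy
  id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-legacy/providers/Microsoft.Storage/storageAccounts/stlegacy"
}
```

A subsequent `terraform plan` then shows the pending import alongside any other changes.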
A well-managed state file is the foundation of reliable infrastructure-as-code. Invest the time upfront to get it right.