
Advanced Terraform for GCP

3,600 words · ≈ 18 min read

Professional Cloud Architect guide to Infrastructure as Code (IaC) using Terraform, focusing on the Google Cloud provider, state management, and best practices.


Introduction to Terraform on GCP

For a Professional Cloud Architect, Terraform is the industry-standard tool for Infrastructure as Code (IaC). It allows you to define your Google Cloud resources in a declarative language (HCL) and manage them through a version-controlled workflow.

The Google Provider is the bridge between Terraform and the Google Cloud APIs.

For PCA scenarios that require repeatable Landing Zone provisioning across hundreds of GCP projects, the canonical answer is Terraform with the hashicorp/google provider, a remote GCS backend for state, and Cloud Foundation Toolkit (CFT) modules. Console-based or gcloud-script answers are wrong for enterprise-scale, auditable provisioning.


Plain English Explanation

Analogy 1 — IKEA Instructions vs Hand-Built Furniture

The google provider is like an IKEA assembly manual for GCP. The manual (HCL) declares the final shape of the bookshelf (google_compute_network, google_storage_bucket); the Allen wrench (provider plugin) translates each step into the right action against GCP APIs. If you hand-build furniture with gcloud scripts, every chair comes out slightly different — IaC guarantees that chair #500 looks identical to chair #1.

Analogy 2 — Two-Key Safe Deposit Box (State + Locking)

The Terraform state file on a GCS backend plus state locking is like a safe deposit box at a bank that needs two keys: the bank's key (the GCS object lock) and your key (your credentials). Two clerks (two terraform apply runs) cannot open the box at the same time — one must wait. Without locking, both clerks try to update the ledger simultaneously and you end up with two contradictory records of who owns which google_compute_instance.

Analogy 3 — Lego Master Sets (CFT Modules)

The Cloud Foundation Toolkit (CFT) Project Factory and IAM modules are like certified Lego master sets published by Google. Instead of you designing the spaceship from scratch (writing google_project + google_billing_account_iam_member + API enablement by hand), you snap together a Project Factory module that already includes the wings, cockpit, and safety stickers — the Shared VPC attachment, budget alerts, and default IAM bindings are built in.


Core Terraform Concepts

  1. HCL (HashiCorp Configuration Language): The human-readable language used to write Terraform files (.tf).
  2. State: A file (terraform.tfstate) that maps your code to the real-world resources in GCP.
  3. Plan: A preview of the changes Terraform will make to your infrastructure.
  4. Apply: The process of executing the plan and making the changes in GCP.

Plain-Language Analogies for Terraform

Analogy 1 — The Architectural Blueprint (IaC)

Imagine building a skyscraper. Terraform is the Blueprint. Instead of telling workers, "Put a brick here, then put a window there" (which is what a script/CLI does), you hand them a drawing that says "This building should have 50 floors and 200 windows." The workers (Terraform) look at the drawing, see what’s already there, and add exactly what is missing to match the drawing.

Analogy 2 — The Receipt and the Inventory (State Management)

The Terraform State file is like a Detailed Receipt from a grocery store combined with an Inventory List. If you lose the receipt, the store (GCP) still has the items, but you don't know exactly what you bought or where you put it. The state file tells Terraform: "Last time we checked, this specific ID in GCP belonged to this specific resource in your code."

Analogy 3 — Lego Blocks (Modules)

Terraform Modules are like Pre-built Lego Sets. Instead of building a "Car" from 500 tiny individual pieces every time, you create a "Car Module." Now, whenever you need a car, you just say module "my_car" { color = "red" }. This makes your infrastructure consistent and easy to scale.


The Google Cloud Provider

The provider handles the authentication and API calls to GCP.

provider "google" {
  project = "my-project-id"
  region  = "us-central1"
}
  • google provider: For general availability (GA) resources.
  • google-beta provider: For resources or features that are still in Beta. You can use both providers in the same configuration.

State Management Best Practices

This is a critical topic for the PCA exam.

  1. Remote State: Never keep your state file on your local laptop. Store it in a Cloud Storage Bucket.
    • Enable Object Versioning on the bucket to recover from accidental state corruption.
  2. State Locking: Use a backend that supports locking (like GCS) to prevent two people from running terraform apply at the same time and corrupting the state.
  3. Sensitive Data: Remember that the state file contains sensitive data (like database passwords) in plain text. Secure the GCS bucket with IAM and encryption.
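
The three practices above can be sketched as a single bucket definition. This is a minimal illustration, not a hardened baseline: the bucket name, project ID, and the retention count of 10 are placeholders chosen for the example.

```hcl
# Sketch of a state bucket. Name and project are hypothetical placeholders.
resource "google_storage_bucket" "tf_state" {
  name     = "tf-state-example-bucket" # bucket names must be globally unique
  project  = "my-state-project"
  location = "US"

  # IAM-only access control, and no public exposure of state.
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"

  # Keep previous generations so a corrupted state can be rolled back.
  versioning {
    enabled = true
  }

  # Bound the history: delete archived generations once 10 newer ones exist.
  lifecycle_rule {
    condition {
      num_newer_versions = 10
      with_state         = "ARCHIVED"
    }
    action {
      type = "Delete"
    }
  }
}
```

The bucket that holds state is usually created once by a small bootstrap configuration, because the backend cannot store state in a bucket that does not exist yet.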

Advanced Terraform Patterns

1. Workspaces

Use workspaces to manage different environments (Dev, Prod) using the same code. However, for large enterprise GCP environments, many architects prefer separate directories per environment for better isolation.
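
As a rough sketch of the workspace approach, the built-in terraform.workspace value can drive names and sizing. The machine types and resource names below are illustrative assumptions.

```hcl
locals {
  # Hypothetical per-environment sizing, keyed by workspace name.
  machine_types = {
    dev  = "e2-small"
    prod = "e2-standard-4"
  }

  # Fall back to the small size for any workspace not listed above.
  machine_type = lookup(local.machine_types, terraform.workspace, "e2-small")
}

resource "google_compute_instance" "app" {
  name         = "app-${terraform.workspace}"
  machine_type = local.machine_type
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }
}
```

Running terraform workspace select prod before plan switches both the state file and the values above, which is also why a mistyped workspace name is a real risk with this pattern.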

2. Output Variables

Use output to export information (like an IP address or a Service Account email) that can be used by other parts of your infrastructure or shared with other teams.
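
A minimal sketch of exporting those two kinds of values, assuming a service account and a regional static address that are named here only for illustration:

```hcl
resource "google_service_account" "app" {
  account_id   = "app-runner" # hypothetical account ID
  display_name = "Application runner"
}

resource "google_compute_address" "ingress" {
  name   = "ingress-ip" # hypothetical address name
  region = "us-central1"
}

output "app_service_account_email" {
  description = "Email of the service account the application runs as."
  value       = google_service_account.app.email
}

output "ingress_ip" {
  description = "Reserved external IP for the ingress load balancer."
  value       = google_compute_address.ingress.address
}
```

After apply, terraform output ingress_ip prints the value, and a parent module can read it as module.<name>.ingress_ip.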

3. Resource Targeting

If you need to fix a specific resource without touching the rest of the stack: terraform apply -target=google_compute_instance.my_vm. Use this sparingly as it can lead to inconsistent state.

Architect's Insight: Always run terraform plan and review the output before running apply. This is your "Seatbelt" that prevents accidental deletions of production databases.


Provider — the Terraform plugin that translates HCL resource declarations into authenticated API calls against a target platform. For GCP, the two relevant providers are hashicorp/google (GA) and hashicorp/google-beta (preview surfaces). A single root module can declare both and route specific resources via the provider = google-beta meta-argument.

google vs google-beta Provider

The HashiCorp registry publishes two distinct providers for GCP, and on the PCA exam you must know when to reach for each:

  • hashicorp/google (GA provider): Maps to GA REST endpoints (e.g. compute.googleapis.com/v1, container.googleapis.com/v1). It is an official provider and ships on a roughly weekly release cadence.
  • hashicorp/google-beta: Exposes resources and arguments that live on Beta GCP APIs (e.g. container.googleapis.com/v1beta1). New features like GKE Autopilot DNS-based endpoints, certain Cloud Run preview flags, and AlloyDB preview features land here first.

You declare both in the same root module and pass them per-resource:

terraform {
  required_providers {
    google      = { source = "hashicorp/google",      version = "~> 5.0" }
    google-beta = { source = "hashicorp/google-beta", version = "~> 5.0" }
  }
}

resource "google_container_cluster" "preview" {
  provider = google-beta   # opt this cluster into Beta features
  name     = "edge-cluster"
  # ...
}

Why this matters for architects

  1. Resource drift on upgrade. A Beta argument that graduates to GA may be renamed; flipping provider = google-beta to provider = google can force replacement. Pin versions explicitly.
  2. Mixing in one project. It is supported and common to manage 95% of resources with google and a handful of cutting-edge resources with google-beta.
  3. Authentication is shared. Both providers consume the same Application Default Credentials and the same project / region arguments.

Common trap: copying a Beta-only argument (e.g. enable_l4_ilb_subsetting when it was still Beta) into a resource block that uses the GA google provider. The GA provider's schema does not define the argument, so Terraform rejects the configuration with an unsupported-argument error before anything is applied. Always check the registry doc page, which flags Beta-only resources and arguments.


Resource Lifecycle Meta-Arguments

Every Terraform resource accepts a lifecycle block that overrides default plan/apply behaviour. PCA scenarios about zero-downtime cutovers and protected production resources hinge on these.

create_before_destroy

Default order is destroy-then-create. For a google_compute_instance_template referenced by a google_compute_instance_group_manager, that order breaks the update: the old template is still in use by the MIG, so deleting it first is rejected. Setting:

lifecycle {
  create_before_destroy = true
}

instructs Terraform to provision the new template first, swap the MIG to point at it, then destroy the old template — preserving the rolling update.
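
A fuller sketch of the template-plus-MIG pairing is shown here. The names, image, and sizes are assumptions for the example. Note the use of name_prefix instead of name: the old and new templates exist at the same time during the swap, so they cannot share a fixed name.

```hcl
resource "google_compute_instance_template" "web" {
  # name_prefix lets Terraform generate a unique name for each new template.
  name_prefix  = "web-"
  machine_type = "e2-medium"

  disk {
    source_image = "debian-cloud/debian-12"
    boot         = true
    auto_delete  = true
  }

  network_interface {
    network = "default"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_compute_instance_group_manager" "web" {
  name               = "web-mig" # hypothetical MIG name
  zone               = "us-central1-a"
  base_instance_name = "web"
  target_size        = 2

  version {
    # Changing the template updates this reference in place.
    instance_template = google_compute_instance_template.web.id
  }
}
```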

prevent_destroy

A guardrail for stateful resources:

resource "google_sql_database_instance" "prod" {
  # ...
  lifecycle { prevent_destroy = true }
}

Any plan that would destroy this google_sql_database_instance (including a terraform destroy or an upstream change forcing replacement) fails at plan time. You must remove the flag in a separate commit before the resource can be deleted — perfect for production Cloud SQL and GCS buckets holding compliance data.

ignore_changes

Use when an external system mutates a resource attribute that Terraform should not "correct":

resource "google_compute_instance" "vm" {
  lifecycle {
    ignore_changes = [metadata["ssh-keys"], labels["last-deployed-at"]]
  }
}

Common GCP examples: OS Login key rotation, autoscaler-managed target_size on a MIG, or labels patched by Cloud Asset Inventory automation.

replace_triggered_by (Terraform 1.2+)

Forces replacement of resource A when resource B changes — useful for tying a google_compute_instance to the hash of its startup-script google_storage_bucket_object.
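
One possible shape for that pairing, assuming a scripts bucket and a local startup.sh file that are hypothetical here. Referencing the whole object resource means the VM is replaced whenever the object is updated or replaced.

```hcl
resource "google_storage_bucket_object" "startup" {
  name   = "startup.sh"
  bucket = "my-scripts-bucket"           # hypothetical bucket
  source = "${path.module}/startup.sh"   # hypothetical local file
}

resource "google_compute_instance" "app" {
  name         = "app-vm"
  machine_type = "e2-small"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }

  metadata = {
    startup-script-url = "gs://${google_storage_bucket_object.startup.bucket}/${google_storage_bucket_object.startup.name}"
  }

  lifecycle {
    # Recreate the VM when the script object changes, so the new script runs on a fresh boot.
    replace_triggered_by = [google_storage_bucket_object.startup]
  }
}
```

Without the trigger, the metadata value (a URL) stays identical when only the script content changes, so Terraform would see nothing to do on the instance.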


Remote State on GCS with Locking

The recommended backend for GCP Terraform is gcs:

terraform {
  backend "gcs" {
    bucket  = "tf-state-prod-acme-co"
    prefix  = "platform/network"
  }
}

What you get out of the box

  • Strong consistency: Cloud Storage provides read-after-write consistency, so plan/apply always reads the latest state.
  • Native state locking: the gcs backend is part of Terraform itself, not the Google provider, and it implements locking natively by writing a lock object next to the state file. No separate lock table is needed (the S3 backend has historically relied on DynamoDB for this).
  • Object Versioning: enable it on the state bucket so a corrupted state can be rolled back to a previous generation, for example with gsutil cp gs://bucket/path#<generation> . (the trailing dot is the local destination).
  • CMEK encryption: set a default Cloud KMS key on the bucket (encryption.default_kms_key_name on google_storage_bucket), or set kms_encryption_key on the backend itself, to encrypt state with Cloud KMS and satisfy compliance regimes that disallow Google-managed keys.

Hardening checklist

  1. Dedicated state project separate from workload projects.
  2. Uniform bucket-level access + IAM granting roles/storage.objectAdmin only to the CI/CD service account.
  3. VPC Service Controls perimeter around the state bucket to prevent exfiltration of state (which contains secrets).
  4. Audit logs (Data Access) enabled on the bucket — every terraform plan shows up as an objects.get.
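
Items 2 and 4 of the checklist might look roughly like this in HCL. The variable names are assumptions for the sketch, and the bucket name reuses the one from the backend example. Note that the audit config is scoped to the Cloud Storage service for the whole state project rather than to one bucket, which is one more reason to keep that project dedicated.

```hcl
variable "state_project_id" {
  type        = string
  description = "Project that holds the state bucket (hypothetical variable)."
}

variable "ci_service_account_email" {
  type        = string
  description = "CI/CD service account that runs Terraform (hypothetical variable)."
}

# Only the CI/CD identity may read and write state objects.
resource "google_storage_bucket_iam_member" "ci_state_access" {
  bucket = "tf-state-prod-acme-co"
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${var.ci_service_account_email}"
}

# Turn on Data Access audit logs for Cloud Storage in the state project.
resource "google_project_iam_audit_config" "state_access" {
  project = var.state_project_id
  service = "storage.googleapis.com"

  audit_log_config {
    log_type = "DATA_READ"
  }

  audit_log_config {
    log_type = "DATA_WRITE"
  }
}
```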

The Terraform state file contains plaintext secrets — database passwords from google_sql_user, service account keys, Secret Manager versions. Treating the state bucket as a Tier-0 asset (CMEK + VPC SC + restrictive IAM) is non-negotiable for regulated workloads.


Cloud Foundation Toolkit (CFT) Modules

The Cloud Foundation Toolkit is Google's open-source collection of Terraform modules published under github.com/terraform-google-modules. Using CFT is the difference between a hand-rolled Landing Zone that drifts in 6 months and an opinionated, supported baseline.

Modules you must know for PCA

  • terraform-google-modules/project-factory — creates a google_project, links billing, enables APIs, attaches the project to Shared VPC, creates a default service account, and applies budget alerts in a single module call.
  • terraform-google-modules/iam — manages IAM bindings at org / folder / project / bucket / service-account scope with safe additive semantics (avoiding the iam_binding authoritative pitfall).
  • terraform-google-modules/network — Shared VPC with subnets, secondary ranges for GKE, Cloud NAT, and firewall rules.
  • terraform-google-modules/kubernetes-engine — GKE clusters with safe defaults (Workload Identity, Shielded Nodes, private cluster, release channel).
  • terraform-google-modules/log-export — aggregated log sink to a logging project, with BigQuery/Pub/Sub/GCS destinations.

Composition pattern

module "host_project" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 14.0"

  name                           = "vpc-host-prod"
  org_id                         = var.org_id
  billing_account                = var.billing
  enable_shared_vpc_host_project = true # make this project a Shared VPC host
}

module "service_project_app1" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 14.0"

  name                 = "app1-prod"
  org_id               = var.org_id
  billing_account      = var.billing
  svpc_host_project_id = module.host_project.project_id # attach as a service project
  shared_vpc_subnets   = ["projects/${module.host_project.project_id}/regions/us-central1/subnetworks/app1-prod"]
}

CFT is closely aligned with — but distinct from — the Cloud Foundation Fabric (a separate Google-Cloud-led blueprint repo). Both target Landing Zones; CFT modules are more granular building blocks while Fabric ships full reference architectures.


Module Composition Patterns

Beyond grabbing CFT modules off the shelf, an architect must know how to compose modules to keep blast radius small and code DRY.

1. Root → Stack → Module

  • Root: one directory per environment (envs/prod, envs/staging). Holds the backend config and provider blocks.
  • Stack: a logical grouping like network, data-platform, gke-runtime. Each stack has its own state file (different GCS prefix).
  • Module: reusable abstraction with input variables and outputs. Lives in modules/ or a separate Git repo with semver tags.

2. Thin wrapper modules

Wrap a CFT module in a tiny internal module that injects your org's defaults (labels, log sinks, mandatory firewall rules). Consumers call the wrapper, not CFT directly — so you can upgrade CFT versions centrally.
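
A thin wrapper could look something like the following. The module path (modules/org-project), the variable names, and the mandatory label values are all hypothetical; only the labels default is injected here to keep the sketch short.

```hcl
# modules/org-project/main.tf (hypothetical internal wrapper)

variable "name" {
  type = string
}

variable "org_id" {
  type = string
}

variable "billing_account" {
  type = string
}

variable "labels" {
  type    = map(string)
  default = {}
}

locals {
  # Org-wide defaults; listed last in merge() so callers cannot override them.
  mandatory_labels = {
    managed-by = "terraform"
    owner      = "platform-team"
  }
}

module "project" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 14.0" # the only place the CFT version is pinned

  name            = var.name
  org_id          = var.org_id
  billing_account = var.billing_account
  labels          = merge(var.labels, local.mandatory_labels)
}

output "project_id" {
  value = module.project.project_id
}
```

Consumers then call source = "../../modules/org-project" and never reference the CFT version directly, so a CFT upgrade is a one-line change in the wrapper.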

3. Cross-stack data sharing

Avoid terraform_remote_state data sources that tightly couple stacks. Prefer:

  • google_compute_network data source by name — looser coupling.
  • Outputs published to Secret Manager or to a config object in GCS — read by the consuming stack at apply time.
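
The first option can be sketched as a lookup by name instead of a read of another stack's state. The network and project names are assumptions; the source range is the documented Identity-Aware Proxy TCP forwarding range.

```hcl
# Look the network up by name. No access to the network stack's state is needed.
data "google_compute_network" "shared" {
  name    = "vpc-shared-prod" # hypothetical network name
  project = "vpc-host-prod"   # hypothetical host project
}

resource "google_compute_firewall" "allow_iap_ssh" {
  name      = "allow-iap-ssh"
  project   = data.google_compute_network.shared.project
  network   = data.google_compute_network.shared.self_link
  direction = "INGRESS"

  # IAP TCP forwarding source range.
  source_ranges = ["35.235.240.0/20"]

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }
}
```

The consuming stack only needs read permission on the network itself, which is a much smaller grant than read access to the producing stack's entire state file.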

4. for_each over count

count causes destructive reorders when you remove an element from the middle of a list. for_each keys resources by map key, so removing one entry only destroys that one. Always prefer for_each for GCP collections like project IAM members.
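
A small sketch of the for_each form for project IAM members, with placeholder principals:

```hcl
variable "project_id" {
  type = string
}

variable "viewers" {
  type = set(string)
  default = [
    "user:alice@example.com",     # placeholder principals
    "group:platform@example.com",
  ]
}

resource "google_project_iam_member" "viewer" {
  for_each = var.viewers

  project = var.project_id
  role    = "roles/viewer"
  member  = each.value
}
```

Each instance is addressed by its key, e.g. google_project_iam_member.viewer["user:alice@example.com"], so dropping one principal from the set plans exactly one destroy and leaves the others untouched.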

For multi-team monorepos, run Atlantis or HCP Terraform in workspace-per-stack mode and gate apply on PR approval from a CODEOWNERS-listed reviewer. This gives you terraform plan output as a PR comment without exposing the state bucket credentials to every engineer.


Importing Existing GCP Resources

ClickOps happens. When you inherit an environment built by hand, terraform import brings those resources under management without recreating them.

The basic workflow

  1. Write the resource block in HCL with the same arguments the live resource has (you can use gcloud ... describe to read them).
  2. Run terraform import <address> <gcp-id>. Example:
terraform import google_compute_instance.legacy \
  projects/my-proj/zones/us-central1-a/instances/legacy-vm
  3. Run terraform plan. If the plan shows changes, your HCL doesn't match reality; fix the HCL until the plan is empty ("no changes").
  4. Commit.

import blocks (Terraform 1.5+)

The newer declarative form makes imports reviewable in PRs:

import {
  to = google_storage_bucket.legacy_data
  id = "legacy-data-bucket"
}

Pair with terraform plan -generate-config-out=generated.tf to scaffold HCL automatically — invaluable when importing dozens of google_project_iam_member entries from a hand-managed project.

Resource ID formats to memorise

| Resource | Import ID |
| --- | --- |
| google_compute_instance | projects/{project}/zones/{zone}/instances/{name} |
| google_storage_bucket | {bucket-name} |
| google_project | {project-id} |
| google_project_iam_member | "{project} {role} {member}" (space-separated) |
| google_sql_database_instance | projects/{project}/instances/{name} |
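
As an illustration of the space-separated IAM format, an import block for a single hand-created binding might look like this. The project ID and principal are placeholders.

```hcl
# Terraform 1.5+ declarative import; the ID is "{project} {role} {member}".
import {
  to = google_project_iam_member.legacy_viewer
  id = "my-proj roles/viewer user:alice@example.com"
}

# The resource block must describe the same binding that already exists.
resource "google_project_iam_member" "legacy_viewer" {
  project = "my-proj"
  role    = "roles/viewer"
  member  = "user:alice@example.com"
}
```

Once the import has been applied and the plan is clean, the import block can be deleted; it has no effect after the resource is in state.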

Bulk import tools

  • terraformer by Google Cloud — reverse-engineers HCL from live GCP state.
  • gcloud beta resource-config bulk-export — emits Terraform-compatible HCL for a project, folder, or organisation.

Terraform Cloud / HCP Terraform with GCP

For teams beyond ~5 engineers, running terraform apply from laptops or a single CI runner stops scaling. HCP Terraform (formerly Terraform Cloud) provides hosted state, run pipelines, policy-as-code (Sentinel / OPA), and a private module registry.

Integration shape

  • One HCP Terraform workspace per stack-environment combination (e.g. network-prod, data-platform-staging).
  • VCS-driven runs: HCP Terraform watches the Git repo; opening a PR triggers a speculative plan, merging triggers apply.
  • Remote state lives in HCP — you no longer need a GCS backend bucket, but many teams keep GCS for disaster recovery exports.
  • Run tasks call out to external systems (Wiz, Snyk, Bridgecrew) for policy checks before apply.
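
On the configuration side, pointing a root module at an HCP Terraform workspace is done with a cloud block. The organization and workspace names below follow the examples used in this guide and are otherwise assumptions.

```hcl
terraform {
  required_version = ">= 1.1.0" # the cloud block was introduced in Terraform 1.1

  cloud {
    organization = "acme-co"

    workspaces {
      name = "network-prod"
    }
  }
}
```

A cloud block takes the place of a backend block in the same configuration; the two cannot be declared together, so moving from the gcs backend involves a state migration during terraform init.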

Authentication options to GCP

| Option | When to use | Downsides |
| --- | --- | --- |
| Long-lived service account JSON key stored as a sensitive variable | Quick POC | Key rotation burden; key exfiltration risk |
| Workload Identity Federation (WIF): exchange HCP Terraform OIDC token for short-lived GCP credentials | Production | Slightly more setup; strongly recommended |
| Local agent (tfc-agent) inside your VPC | Air-gapped or VPC-SC-protected projects | You manage the agent VM |

Sentinel / OPA policy examples

  • Deny any google_storage_bucket with uniform_bucket_level_access = false.
  • Require all google_compute_instance resources to have a cost-center label.
  • Block google_project_iam_binding (authoritative) in favour of google_project_iam_member.

For the PCA exam: when a question mentions "SaaS-hosted Terraform runs, policy-as-code, and a private module registry," the answer is HCP Terraform (Terraform Cloud). If it adds "without storing long-lived service account keys," pair it with Workload Identity Federation.


Workload Identity Federation for Terraform Cloud → GCP

WIF lets HCP Terraform impersonate a GCP service account using its OIDC token — eliminating the need to ship a JSON key into HCP Terraform's variable store.

One-time GCP setup

resource "google_iam_workload_identity_pool" "tfc" {
  workload_identity_pool_id = "hcp-terraform-pool"
}

resource "google_iam_workload_identity_pool_provider" "tfc" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.tfc.workload_identity_pool_id
  workload_identity_pool_provider_id = "hcp-terraform-provider"
  oidc {
    issuer_uri = "https://app.terraform.io"
  }
  attribute_mapping = {
    "google.subject"                        = "assertion.sub"
    "attribute.terraform_organization_name" = "assertion.terraform_organization_name"
    "attribute.terraform_workspace_name"    = "assertion.terraform_workspace_name"
    "attribute.terraform_run_phase"         = "assertion.terraform_run_phase"
  }
  attribute_condition = "assertion.terraform_organization_name == \"acme-co\""
}

resource "google_service_account_iam_member" "tfc_impersonate" {
  service_account_id = google_service_account.tf_runner.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/projects/${var.project_number}/locations/global/workloadIdentityPools/hcp-terraform-pool/attribute.terraform_workspace_name/network-prod"
}

HCP Terraform workspace variables

Set these as environment variables on the workspace:

TFC_GCP_PROVIDER_AUTH       = true
TFC_GCP_RUN_SERVICE_ACCOUNT_EMAIL = <service-account-email>
TFC_GCP_WORKLOAD_POOL_ID    = hcp-terraform-pool
TFC_GCP_WORKLOAD_PROVIDER_ID = hcp-terraform-provider
TFC_GCP_PROJECT_NUMBER      = 123456789012

HCP Terraform's runtime automatically calls sts.googleapis.com to swap its OIDC token for a short-lived (≤1 hour) access token bound to your service account.

Why this matters

  • No long-lived secrets in HCP — satisfies SOC 2 / ISO 27001 controls.
  • Per-workspace least privilege — the attribute_condition and the principalSet URI can scope which workspace can impersonate which service account.
  • Phase-aware policy — bind the apply phase to a higher-privilege SA than the plan phase using attribute.terraform_run_phase.
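
The phase-aware idea could be sketched with two service accounts and two bindings keyed on the mapped run-phase attribute. The account IDs are hypothetical, and the pool ID matches the one-time setup shown earlier. As written, the bindings admit any workspace that passes the provider's attribute_condition, so a real setup would likely narrow them further.

```hcl
variable "project_number" {
  type = string
}

locals {
  pool = "projects/${var.project_number}/locations/global/workloadIdentityPools/hcp-terraform-pool"
}

resource "google_service_account" "tf_plan" {
  account_id   = "tf-plan" # hypothetical; grant read-only roles
  display_name = "Terraform plan phase"
}

resource "google_service_account" "tf_apply" {
  account_id   = "tf-apply" # hypothetical; grant write roles
  display_name = "Terraform apply phase"
}

# Only tokens issued during the plan phase may impersonate the plan account.
resource "google_service_account_iam_member" "plan_phase" {
  service_account_id = google_service_account.tf_plan.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${local.pool}/attribute.terraform_run_phase/plan"
}

# Only tokens issued during the apply phase may impersonate the apply account.
resource "google_service_account_iam_member" "apply_phase" {
  service_account_id = google_service_account.tf_apply.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${local.pool}/attribute.terraform_run_phase/apply"
}
```

On the workspace, the two accounts would then be supplied through the phase-specific variables TFC_GCP_PLAN_SERVICE_ACCOUNT_EMAIL and TFC_GCP_APPLY_SERVICE_ACCOUNT_EMAIL in place of the single run-level variable.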

FAQ — Terraform and the Google Provider

Q1. How do I authenticate Terraform to GCP?

The best way is to use Application Default Credentials (ADC). Run gcloud auth application-default login on your machine, or let Terraform use the Service Account of the VM/CI-CD runner it is running on.

Q2. What is terraform import?

Use this to bring existing GCP resources (created manually in the console) under Terraform management. You provide the GCP resource ID, and Terraform creates the state entry for it.

Q3. Can I use Terraform to manage IAM?

Yes. It is highly recommended to manage IAM via Terraform to ensure that permissions are auditable and follow the "Least Privilege" principle.

Q4. Difference between google_project_iam_member and google_project_iam_binding?

  • Member: Adds a single user/service account to a role. Safe to use.
  • Binding: Manages the entire list of users for a role. Be careful—it will remove anyone not defined in your code!
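
Side by side, the two resources look like this. The project ID and principals are placeholders for the sketch.

```hcl
# Additive: grants one principal the role and leaves other holders of the role alone.
resource "google_project_iam_member" "alice_viewer" {
  project = "my-project-id"
  role    = "roles/viewer"
  member  = "user:alice@example.com"
}

# Authoritative for this role: after apply, ONLY these members hold
# roles/bigquery.admin on the project. Anyone granted it elsewhere is removed.
resource "google_project_iam_binding" "bq_admins" {
  project = "my-project-id"
  role    = "roles/bigquery.admin"

  members = [
    "group:data-platform@example.com",
  ]
}
```

Avoid managing the same role on the same project with both resource types, because the binding will keep removing what the member resources add.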

Q5. What is the "Google Cloud Foundations" Fabric/Blueprint?

Google provides pre-made Terraform blueprints (like the "Cloud Foundation Fabric") that implement best practices for Landing Zones, Networking, and Security. Use these to jumpstart a new organization.


Final Architect Tip

On the PCA exam, if a question mentions "Infrastructure Consistency," "Auditable Changes," or "Managing complex environments," the answer involves Terraform. Focus on State Management (GCS backend with versioning) and Modularization. Always prefer the declarative approach over imperative scripts.
