Introduction to Terraform on GCP
For a Professional Cloud Architect, Terraform is the industry-standard tool for Infrastructure as Code (IaC). It allows you to define your Google Cloud resources in a declarative language (HCL) and manage them through a version-controlled workflow.
The Google Provider is the bridge between Terraform and the Google Cloud APIs.
For PCA scenarios that require repeatable Landing Zone provisioning across hundreds of GCP projects, the canonical answer is Terraform with the hashicorp/google provider, a remote GCS backend for state, and Cloud Foundation Toolkit (CFT) modules. Console-based or gcloud-script answers are wrong for enterprise-scale, auditable provisioning.
Plain English Explanation
Analogy 1 — IKEA Instructions vs Hand-Built Furniture
The google provider is like an IKEA assembly manual for GCP. The manual (HCL) declares the final shape of the bookshelf (google_compute_network, google_storage_bucket); the Allen wrench (provider plugin) translates each step into the right action against GCP APIs. If you hand-build furniture with gcloud scripts, every chair comes out slightly different — IaC guarantees that chair #500 looks identical to chair #1.
Analogy 2 — Two-Key Safe Deposit Box (State + Locking)
The Terraform state file on a GCS backend plus state locking is like a safe deposit box at a bank that needs two keys: the bank's key (the lock object the gcs backend writes next to the state) and your key (your credentials). Two clerks (two terraform apply runs) cannot open the box at the same time; one must wait. Without locking, both clerks try to update the ledger simultaneously and you end up with two contradictory records of who owns which google_compute_instance.
Analogy 3 — Lego Master Sets (CFT Modules)
The Cloud Foundation Toolkit (CFT) Project Factory and IAM modules are like certified Lego master sets published by Google. Instead of you designing the spaceship from scratch (writing google_project + google_billing_account_iam_member + API enablement by hand), you snap together a Project Factory module that already includes the wings, cockpit, and safety stickers — the Shared VPC attachment, budget alerts, and default IAM bindings are built in.
Core Terraform Concepts
- HCL (HashiCorp Configuration Language): The human-readable language used to write Terraform files (.tf).
- State: A file (terraform.tfstate) that maps your code to the real-world resources in GCP.
- Plan: A preview of the changes Terraform will make to your infrastructure.
- Apply: The process of executing the plan and making the changes in GCP.
Plain-Language Analogies for Terraform
Analogy 1 — The Architectural Blueprint (IaC)
Imagine building a skyscraper. Terraform is the Blueprint. Instead of telling workers, "Put a brick here, then put a window there" (which is what a script/CLI does), you hand them a drawing that says "This building should have 50 floors and 200 windows." The workers (Terraform) look at the drawing, see what’s already there, and add exactly what is missing to match the drawing.
Analogy 2 — The Receipt and the Inventory (State Management)
The Terraform State file is like a Detailed Receipt from a grocery store combined with an Inventory List. If you lose the receipt, the store (GCP) still has the items, but you don't know exactly what you bought or where you put it. The state file tells Terraform: "Last time we checked, this specific ID in GCP belonged to this specific resource in your code."
Analogy 3 — Lego Blocks (Modules)
Terraform Modules are like Pre-built Lego Sets. Instead of building a "Car" from 500 tiny individual pieces every time, you create a "Car Module." Now, whenever you need a car, you just say module "my_car" { color = "red" }. This makes your infrastructure consistent and easy to scale.
The Google Cloud Provider
The provider handles the authentication and API calls to GCP.
provider "google" {
  project = "my-project-id"
  region  = "us-central1"
}
- google provider: For general availability (GA) resources.
- google-beta provider: For resources or features that are still in Beta. You can use both in the same project.
State Management Best Practices
This is a critical topic for the PCA exam.
- Remote State: Never keep your state file on your local laptop. Store it in a Cloud Storage Bucket.
- Enable Object Versioning on the bucket to recover from accidental state corruption.
- State Locking: Use a backend that supports locking (like GCS) to prevent two people from running terraform apply at the same time and corrupting the state.
- Sensitive Data: Remember that the state file contains sensitive data (like database passwords) in plain text. Secure the GCS bucket with IAM and encryption.
Advanced Terraform Patterns
1. Workspaces
Use workspaces to manage different environments (Dev, Prod) using the same code. However, for large enterprise GCP environments, many architects prefer separate directories per environment for better isolation.
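As a minimal sketch of the workspace approach, the built-in terraform.workspace value can select per-environment settings from one codebase. The workspace names (dev, prod), machine types, and resource names below are assumptions made up for illustration.

```hcl
# Hypothetical per-environment settings, keyed by workspace name.
locals {
  env_settings = {
    dev  = { machine_type = "e2-small" }
    prod = { machine_type = "e2-standard-4" }
  }

  # terraform.workspace is "default" until a named workspace is selected.
  # Any workspace not listed above falls back to the dev settings.
  settings = lookup(local.env_settings, terraform.workspace, local.env_settings["dev"])
}

resource "google_compute_instance" "app" {
  name         = "app-${terraform.workspace}"
  machine_type = local.settings.machine_type
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }
}
```

Switching environments is then a matter of `terraform workspace select prod` before planning. All workspaces share the same backend bucket and credentials, which is one reason separate directories tend to give stronger isolation.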
2. Output Variables
Use output to export information (like an IP address or a Service Account email) that can be used by other parts of your infrastructure or shared with other teams.
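A small sketch of the two outputs mentioned above, assuming a reserved IP address and a service account whose names are purely illustrative:

```hcl
resource "google_compute_address" "ingress" {
  name   = "ingress-ip" # hypothetical name
  region = "us-central1"
}

resource "google_service_account" "app" {
  account_id   = "app-runner" # hypothetical account ID
  display_name = "Application runtime"
}

# Outputs are stored in state and printed after apply.
output "ingress_ip" {
  description = "Static external IP reserved for the ingress"
  value       = google_compute_address.ingress.address
}

output "app_service_account_email" {
  description = "Email of the application service account"
  value       = google_service_account.app.email
}
```

After an apply, `terraform output ingress_ip` prints the value, so another team or a pipeline step can read it without inspecting the state file directly.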
3. Resource Targeting
If you need to fix a specific resource without touching the rest of the stack: terraform apply -target=google_compute_instance.my_vm. Use this sparingly as it can lead to inconsistent state.
Architect's Insight: Always run terraform plan and review the output before running apply. This is your "Seatbelt" that prevents accidental deletions of production databases.
Provider — the Terraform plugin that translates HCL resource declarations into authenticated API calls against a target platform. For GCP, the two relevant providers are hashicorp/google (GA) and hashicorp/google-beta (preview surfaces). A single root module can declare both and route specific resources via the provider = google-beta meta-argument.
google vs google-beta Provider
The HashiCorp registry publishes two distinct providers for GCP, and on the PCA exam you must know when to reach for each:
- hashicorp/google (GA provider): Maps to GA REST endpoints (e.g. compute.googleapis.com/v1, container.googleapis.com/v1). Jointly maintained by Google and HashiCorp and shipped on a roughly weekly release cadence.
- hashicorp/google-beta: Exposes resources and arguments that live on Beta GCP APIs (e.g. container.googleapis.com/v1beta1). New features like GKE Autopilot DNS-based endpoints, certain Cloud Run preview flags, and AlloyDB preview features land here first.
You declare both in the same root module and pass them per-resource:
terraform {
  required_providers {
    google      = { source = "hashicorp/google", version = "~> 5.0" }
    google-beta = { source = "hashicorp/google-beta", version = "~> 5.0" }
  }
}

resource "google_container_cluster" "preview" {
  provider = google-beta # opt this cluster into Beta features
  name     = "edge-cluster"
  # ...
}
Why this matters for architects
- Resource drift on upgrade. A Beta argument that graduates to GA may be renamed; flipping provider = google-beta → provider = google can force replacement. Pin versions explicitly.
- Mixing in one project. It is supported and common to manage 95% of resources with google and a handful of cutting-edge resources with google-beta.
- Authentication is shared. Both providers consume the same Application Default Credentials and the same project/region arguments.
Common trap: copying a Beta-only argument (e.g. enable_l4_ilb_subsetting when it was still Beta) into a provider = google resource block. The GA provider's schema does not define the field, so terraform validate and terraform plan fail with an "Unsupported argument" error before any API call is made. Always check the registry doc page, which marks Beta-only resources and arguments.
Resource Lifecycle Meta-Arguments
Every Terraform resource accepts a lifecycle block that overrides default plan/apply behaviour. PCA scenarios about zero-downtime cutovers and protected production resources hinge on these.
create_before_destroy
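Default order is destroy-then-create. For a google_compute_instance_template attached to a google_compute_instance_group_manager, that order breaks the apply: the old template cannot be deleted while the MIG still references it. Setting:
lifecycle {
  create_before_destroy = true
}
instructs Terraform to provision the new template first, swap the MIG to point at it, then destroy the old template, preserving the rolling update. Because both templates briefly coexist, use name_prefix rather than a fixed name so the replacement does not collide with the original.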
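The pattern can be sketched end to end as follows. The resource names, image, and sizes are illustrative assumptions; the important parts are the lifecycle block and a generated template name, so the old and new templates can coexist during the swap.

```hcl
resource "google_compute_instance_template" "web" {
  # name_prefix makes Terraform generate a unique name for each replacement,
  # so the new template can be created while the old one still exists.
  name_prefix  = "web-template-"
  machine_type = "e2-medium"

  disk {
    source_image = "debian-cloud/debian-12"
    boot         = true
    auto_delete  = true
  }

  network_interface {
    network = "default"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_compute_instance_group_manager" "web" {
  name               = "web-mig" # hypothetical name
  base_instance_name = "web"
  zone               = "us-central1-a"
  target_size        = 2

  # The MIG is updated in place to reference the new template
  # before the old template is destroyed.
  version {
    instance_template = google_compute_instance_template.web.id
  }
}
```

How existing VMs are moved onto the new template depends on the MIG's update policy, which this sketch leaves at its defaults.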
prevent_destroy
A guardrail for stateful resources:
resource "google_sql_database_instance" "prod" {
  # ...
  lifecycle { prevent_destroy = true }
}
Any plan that would destroy this google_sql_database_instance (including a terraform destroy or an upstream change forcing replacement) fails at plan time. You must remove the flag in a separate commit before the resource can be deleted — perfect for production Cloud SQL and GCS buckets holding compliance data.
ignore_changes
Use when an external system mutates a resource attribute that Terraform should not "correct":
resource "google_compute_instance" "vm" {
  lifecycle {
    ignore_changes = [metadata["ssh-keys"], labels["last-deployed-at"]]
  }
}
Common GCP examples: OS Login key rotation, autoscaler-managed target_size on a MIG, or labels patched by Cloud Asset Inventory automation.
replace_triggered_by (Terraform 1.2+)
Forces replacement of resource A when resource B changes — useful for tying a google_compute_instance to the hash of its startup-script google_storage_bucket_object.
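A hedged sketch of that startup-script case: the bucket name, file path, and instance settings are assumptions, and the VM's access to the bucket is left out for brevity. The metadata value below never changes when the script content does, which is why the explicit trigger is needed.

```hcl
resource "google_storage_bucket_object" "startup_script" {
  name   = "startup.sh"
  bucket = "acme-bootstrap-scripts"    # hypothetical, pre-existing bucket
  source = "${path.module}/startup.sh" # hypothetical local file
}

resource "google_compute_instance" "worker" {
  name         = "worker-1"
  machine_type = "e2-medium"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }

  # The URL stays the same when the script content changes.
  metadata = {
    startup-script-url = "gs://${google_storage_bucket_object.startup_script.bucket}/${google_storage_bucket_object.startup_script.name}"
  }

  lifecycle {
    # Replace the VM whenever the script object is updated or replaced.
    replace_triggered_by = [google_storage_bucket_object.startup_script]
  }
}
```

Referencing the whole resource triggers on any planned change to the object; referencing a single attribute narrows the trigger to changes in that attribute.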
Remote State on GCS with Locking
The recommended backend for GCP Terraform is gcs:
terraform {
  backend "gcs" {
    bucket = "tf-state-prod-acme-co"
    prefix = "platform/network"
  }
}
What you get out of the box
- Strong consistency: Cloud Storage provides read-after-write consistency, so plan/apply always reads the latest state.
- Native state locking: the gcs backend is part of Terraform itself (not the provider) and locks state by writing a lock object next to the state file, using Cloud Storage object generation preconditions so only one writer can create it. No separate lock table is needed (unlike the S3 backend, which has traditionally relied on DynamoDB).
- Object Versioning: enable on the state bucket so a corrupted state can be rolled back to a previous generation via gsutil cp gs://bucket/path#<generation> .
- CMEK encryption: set a default Cloud KMS key on the bucket (encryption.default_kms_key_name on google_storage_bucket) or kms_encryption_key in the backend block to encrypt state with Cloud KMS, satisfying compliance regimes that disallow Google-managed keys.
Hardening checklist
- Dedicated state project separate from workload projects.
- Uniform bucket-level access + IAM granting roles/storage.objectAdmin only to the CI/CD service account.
- VPC Service Controls perimeter around the state project to prevent exfiltration of state (which contains secrets).
- Audit logs (Data Access) enabled on the bucket, so every terraform plan shows up as a storage.objects.get entry.
The Terraform state file contains plaintext secrets — database passwords from google_sql_user, service account keys, Secret Manager versions. Treating the state bucket as a Tier-0 asset (CMEK + VPC SC + restrictive IAM) is non-negotiable for regulated workloads.
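Most of that checklist can be expressed in Terraform itself. The sketch below is one possible shape, not a reference implementation: the variable names are hypothetical, the KMS key is assumed to exist already, and the VPC Service Controls perimeter and audit log configuration are out of scope here.

```hcl
variable "state_project_id" {
  type        = string
  description = "Dedicated project that holds Terraform state (hypothetical)"
}

variable "state_kms_key_id" {
  type        = string
  description = "Full resource ID of an existing Cloud KMS key (hypothetical)"
}

variable "ci_service_account_email" {
  type        = string
  description = "CI/CD service account that runs Terraform (hypothetical)"
}

resource "google_storage_bucket" "tf_state" {
  name     = "tf-state-prod-acme-co"
  project  = var.state_project_id
  location = "US"

  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"

  # Keep prior generations so a corrupted state can be rolled back.
  versioning {
    enabled = true
  }

  # CMEK: the Cloud Storage service agent needs encrypt/decrypt
  # permission on this key, which is not granted in this sketch.
  encryption {
    default_kms_key_name = var.state_kms_key_id
  }
}

# Additive grant: only the CI/CD identity can read and write state objects.
resource "google_storage_bucket_iam_member" "ci_state_access" {
  bucket = google_storage_bucket.tf_state.name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${var.ci_service_account_email}"
}
```

The bucket that stores state is usually bootstrapped once, from a separate small configuration, since a configuration cannot create the bucket its own backend depends on.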
Cloud Foundation Toolkit (CFT) Modules
The Cloud Foundation Toolkit is Google's open-source collection of Terraform modules published under github.com/terraform-google-modules. Using CFT is the difference between a hand-rolled Landing Zone that drifts in 6 months and an opinionated, supported baseline.
Modules you must know for PCA
- terraform-google-modules/project-factory: creates a google_project, links billing, enables APIs, attaches the project to Shared VPC, creates a default service account, and applies budget alerts in a single module call.
- terraform-google-modules/iam: manages IAM bindings at org / folder / project / bucket / service-account scope with safe additive semantics (avoiding the iam_binding authoritative pitfall).
- terraform-google-modules/network: Shared VPC with subnets, secondary ranges for GKE, Cloud NAT, and firewall rules.
- terraform-google-modules/kubernetes-engine: GKE clusters with safe defaults (Workload Identity, Shielded Nodes, private cluster, release channel).
- terraform-google-modules/log-export: aggregated log sink to a logging project, with BigQuery/Pub/Sub/GCS destinations.
Composition pattern
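module "host_project" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 14.0"

  name                           = "vpc-host-prod"
  org_id                         = var.org_id
  billing_account                = var.billing
  enable_shared_vpc_host_project = true # mark this project as the Shared VPC host
}

module "service_project_app1" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 14.0"

  name            = "app1-prod"
  org_id          = var.org_id # required for every project, not just the host
  billing_account = var.billing

  # Attach to the host project's Shared VPC and grant access to one subnet.
  # Input names follow recent project-factory releases; verify against the pinned version.
  svpc_host_project_id = module.host_project.project_id
  shared_vpc_subnets   = ["projects/${module.host_project.project_id}/regions/us-central1/subnetworks/app1-prod"]
}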
CFT is closely aligned with — but distinct from — the Cloud Foundation Fabric (a separate Google-Cloud-led blueprint repo). Both target Landing Zones; CFT modules are more granular building blocks while Fabric ships full reference architectures.
Module Composition Patterns
Beyond grabbing CFT modules off the shelf, an architect must know how to compose modules to keep blast radius small and code DRY.
1. Root → Stack → Module
- Root: one directory per environment (envs/prod, envs/staging). Holds the backend config and provider blocks.
- Stack: a logical grouping like network, data-platform, gke-runtime. Each stack has its own state file (different GCS prefix).
- Module: reusable abstraction with input variables and outputs. Lives in modules/ or a separate Git repo with semver tags.
2. Thin wrapper modules
Wrap a CFT module in a tiny internal module that injects your org's defaults (labels, log sinks, mandatory firewall rules). Consumers call the wrapper, not CFT directly — so you can upgrade CFT versions centrally.
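A thin wrapper might look roughly like this, assuming a hypothetical internal module at modules/project with organisation defaults chosen purely for illustration (label keys, API list):

```hcl
# modules/project/main.tf (hypothetical internal wrapper)

variable "name" {
  type = string
}

variable "org_id" {
  type = string
}

variable "billing_account" {
  type = string
}

variable "extra_labels" {
  type    = map(string)
  default = {}
}

module "project" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 14.0" # bumped here once, for every consumer

  name            = var.name
  org_id          = var.org_id
  billing_account = var.billing_account

  # Organisation-wide defaults injected by the wrapper; callers can add
  # labels, and on a key collision the caller's value wins.
  labels = merge(
    { managed-by = "terraform", owner = "platform-team" },
    var.extra_labels,
  )

  activate_apis = [
    "compute.googleapis.com",
    "logging.googleapis.com",
  ]
}

output "project_id" {
  value = module.project.project_id
}
```

Consumers then write `module "app1" { source = "../modules/project" ... }` and never reference the upstream module or its version directly.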
3. Cross-stack data sharing
Avoid terraform_remote_state data sources that tightly couple stacks. Prefer:
- google_compute_network data source by name, for looser coupling.
- Outputs published to Secret Manager or to a config object in GCS, read by the consuming stack at apply time.
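The first option can be sketched as a lookup by name in the consuming stack. The network name, project variable, and firewall rule are assumptions for illustration; the consuming stack only needs to know what the network is called, not where the network stack keeps its state.

```hcl
variable "host_project_id" {
  type        = string
  description = "Project that owns the shared network (hypothetical)"
}

# Look the network up by name at plan time instead of reading
# another stack's state file.
data "google_compute_network" "shared" {
  name    = "shared-vpc-prod" # hypothetical network name
  project = var.host_project_id
}

resource "google_compute_firewall" "allow_iap_ssh" {
  name    = "allow-iap-ssh"
  project = var.host_project_id
  network = data.google_compute_network.shared.self_link

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  # Source range used by Identity-Aware Proxy TCP forwarding.
  source_ranges = ["35.235.240.0/20"]
}
```

The trade-off is that the name becomes an implicit contract between stacks, so renaming the network needs coordination.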
4. for_each over count
count causes destructive reorders when you remove an element from the middle of a list. for_each keys resources by map key, so removing one entry only destroys that one. Always prefer for_each for GCP collections like project IAM members.
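A brief illustration with project IAM members, using made-up principals. Each resource instance is addressed by its member string, so deleting one entry from the set removes exactly one grant.

```hcl
variable "project_id" {
  type = string
}

locals {
  # Hypothetical principals; for_each needs a set or a map, not a list.
  viewers = toset([
    "user:alice@example.com",
    "group:sre@example.com",
    "serviceAccount:ci-runner@example-project.iam.gserviceaccount.com",
  ])
}

resource "google_project_iam_member" "viewer" {
  for_each = local.viewers

  project = var.project_id
  role    = "roles/viewer"
  member  = each.value
}
```

With count, the same three grants would be addressed as viewer[0], viewer[1], and viewer[2], and removing the first entry would shift the other two to new indexes, so Terraform would plan changes to grants that were never meant to be touched.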
For multi-team monorepos, run Atlantis or HCP Terraform in workspace-per-stack mode and gate apply on PR approval from a CODEOWNERS-listed reviewer. This gives you terraform plan output as a PR comment without exposing the state bucket credentials to every engineer.
Importing Existing GCP Resources
ClickOps happens. When you inherit an environment built by hand, terraform import brings those resources under management without recreating them.
The basic workflow
- Write the resource block in HCL with the same arguments the live resource has (you can use gcloud ... describe to read them).
- Run terraform import <address> <gcp-id>. Example:
terraform import google_compute_instance.legacy \
  projects/my-proj/zones/us-central1-a/instances/legacy-vm
- Run terraform plan. If the plan shows changes, your HCL doesn't match reality; fix the HCL until the plan is empty ("no changes").
- Commit.
import blocks (Terraform 1.5+)
The newer declarative form makes imports reviewable in PRs:
import {
  to = google_storage_bucket.legacy_data
  id = "legacy-data-bucket"
}
Pair with terraform plan -generate-config-out=generated.tf to scaffold HCL automatically — invaluable when importing dozens of google_project_iam_member entries from a hand-managed project.
Resource ID formats to memorise
| Resource | Import ID |
|---|---|
| google_compute_instance | projects/{project}/zones/{zone}/instances/{name} |
| google_storage_bucket | {bucket-name} |
| google_project | {project-id} |
| google_project_iam_member | "{project} {role} {member}" (space-separated) |
| google_sql_database_instance | projects/{project}/instances/{name} |
Bulk import tools
- terraformer by Google Cloud: reverse-engineers HCL from live GCP state.
- gcloud beta resource-config bulk-export: emits Terraform-compatible HCL for a project, folder, or organisation when run with --resource-format=terraform.
Terraform Cloud / HCP Terraform with GCP
For teams beyond ~5 engineers, running terraform apply from laptops or a single CI runner stops scaling. HCP Terraform (formerly Terraform Cloud) provides hosted state, run pipelines, policy-as-code (Sentinel / OPA), and a private module registry.
Integration shape
- One HCP Terraform workspace per stack-environment combination (e.g. network-prod, data-platform-staging).
- VCS-driven runs: HCP Terraform watches the Git repo; opening a PR triggers a speculative plan, merging triggers apply.
- Remote state lives in HCP, so you no longer need a GCS backend bucket, but many teams keep GCS for disaster recovery exports.
- Run tasks call out to external systems (Wiz, Snyk, Bridgecrew) for policy checks before apply.
Authentication options to GCP
| Option | When to use | Downsides |
|---|---|---|
| Long-lived service account JSON key stored as a sensitive variable | Quick POC | Key rotation burden; key exfiltration risk |
| Workload Identity Federation (WIF) — exchange HCP Terraform OIDC token for short-lived GCP credentials | Production | Slightly more setup; strongly recommended |
| Local agent (tfc-agent) inside your VPC | Air-gapped or VPC-SC-protected projects | You manage the agent VM |
Sentinel / OPA policy examples
- Deny any google_storage_bucket with uniform_bucket_level_access = false.
- Require all google_compute_instance resources to have a cost-center label.
- Block google_project_iam_binding (authoritative) in favour of google_project_iam_member.
For the PCA exam: when a question mentions "SaaS-hosted Terraform runs, policy-as-code, and a private module registry," the answer is HCP Terraform (Terraform Cloud). If it adds "without storing long-lived service account keys," pair it with Workload Identity Federation.
Workload Identity Federation for Terraform Cloud → GCP
WIF lets HCP Terraform impersonate a GCP service account using its OIDC token — eliminating the need to ship a JSON key into HCP Terraform's variable store.
One-time GCP setup
resource "google_iam_workload_identity_pool" "tfc" {
  workload_identity_pool_id = "hcp-terraform-pool"
}

resource "google_iam_workload_identity_pool_provider" "tfc" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.tfc.workload_identity_pool_id
  workload_identity_pool_provider_id = "hcp-terraform-provider"

  oidc {
    issuer_uri = "https://app.terraform.io"
  }

  attribute_mapping = {
    "google.subject"                        = "assertion.sub"
    "attribute.terraform_organization_name" = "assertion.terraform_organization_name"
    "attribute.terraform_workspace_name"    = "assertion.terraform_workspace_name"
    "attribute.terraform_run_phase"         = "assertion.terraform_run_phase"
  }

  attribute_condition = "assertion.terraform_organization_name == \"acme-co\""
}

resource "google_service_account_iam_member" "tfc_impersonate" {
  service_account_id = google_service_account.tf_runner.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/projects/${var.project_number}/locations/global/workloadIdentityPools/hcp-terraform-pool/attribute.terraform_workspace_name/network-prod"
}
HCP Terraform workspace variables
Set these as environment variables on the workspace:
TFC_GCP_PROVIDER_AUTH = true
TFC_GCP_RUN_SERVICE_ACCOUNT_EMAIL = <email of the tf_runner service account>
TFC_GCP_WORKLOAD_POOL_ID = hcp-terraform-pool
TFC_GCP_WORKLOAD_PROVIDER_ID = hcp-terraform-provider
TFC_GCP_PROJECT_NUMBER = 123456789012
HCP Terraform's runtime automatically calls sts.googleapis.com to swap its OIDC token for a short-lived (≤1 hour) access token bound to your service account.
Why this matters
- No long-lived secrets in HCP, which satisfies SOC 2 / ISO 27001 controls.
- Per-workspace least privilege: the attribute_condition and the principalSet URI can scope which workspace can impersonate which service account.
- Phase-aware policy: bind the apply phase to a higher-privilege SA than the plan phase using attribute.terraform_run_phase.
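One way the phase-aware idea could be wired up is sketched below. It assumes the pool and the attribute.terraform_run_phase mapping from the setup shown earlier; the two service account IDs are hypothetical. Matching on run phase alone is deliberately coarse: it admits any workspace that already passes the pool's attribute_condition.

```hcl
variable "project_number" {
  type = string
}

locals {
  pool = "projects/${var.project_number}/locations/global/workloadIdentityPools/hcp-terraform-pool"
}

# Hypothetical read-mostly identity used while planning.
resource "google_service_account" "tf_plan" {
  account_id   = "tf-plan"
  display_name = "Terraform plan phase"
}

# Hypothetical higher-privilege identity used while applying.
resource "google_service_account" "tf_apply" {
  account_id   = "tf-apply"
  display_name = "Terraform apply phase"
}

# Only tokens issued for the plan phase may impersonate the plan SA.
resource "google_service_account_iam_member" "plan_phase" {
  service_account_id = google_service_account.tf_plan.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${local.pool}/attribute.terraform_run_phase/plan"
}

# Only tokens issued for the apply phase may impersonate the apply SA.
resource "google_service_account_iam_member" "apply_phase" {
  service_account_id = google_service_account.tf_apply.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${local.pool}/attribute.terraform_run_phase/apply"
}
```

The workspace still has to be told which service account to use in each phase; the HCP Terraform dynamic credentials documentation lists the workspace variables for that.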
FAQ — Terraform and the Google Provider
Q1. How do I authenticate Terraform to GCP?
The best way is to use Application Default Credentials (ADC). Run gcloud auth application-default login on your machine, or let Terraform use the Service Account of the VM/CI-CD runner it is running on.
Q2. What is terraform import?
Use this to bring existing GCP resources (created manually in the console) under Terraform management. You provide the GCP resource ID, and Terraform creates the state entry for it.
Q3. Can I use Terraform to manage IAM?
Yes. It is highly recommended to manage IAM via Terraform to ensure that permissions are auditable and follow the "Least Privilege" principle.
Q4. Difference between google_project_iam_member and google_project_iam_binding?
- Member: Adds a single user/service account to a role. Safe to use.
- Binding: Manages the entire list of users for a role. Be careful—it will remove anyone not defined in your code!
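The difference is easiest to see side by side. The principals and roles here are placeholders chosen for illustration:

```hcl
variable "project_id" {
  type = string
}

# Additive: grants one principal the role and leaves every other
# holder of roles/viewer untouched.
resource "google_project_iam_member" "alice_viewer" {
  project = var.project_id
  role    = "roles/viewer"
  member  = "user:alice@example.com"
}

# Authoritative for this one role: on apply, any principal holding
# roles/editor that is not in this list loses the role.
resource "google_project_iam_binding" "editors" {
  project = var.project_id
  role    = "roles/editor"

  members = [
    "group:platform-admins@example.com",
  ]
}
```

Avoid managing the same role with both resource types in one project, since the binding will keep removing what the member resource adds.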
Q5. What is the "Google Cloud Foundations" Fabric/Blueprint?
Google provides pre-made Terraform blueprints (like the "Cloud Foundation Fabric") that implement best practices for Landing Zones, Networking, and Security. Use these to jumpstart a new organization.
Final Architect Tip
On the PCA exam, if a question mentions "Infrastructure Consistency," "Auditable Changes," or "Managing complex environments," the answer involves Terraform. Focus on State Management (GCS backend with versioning) and Modularization. Always prefer the declarative approach over imperative scripts.