Environment Management and Testing

Introduction to Environment Management

Effective environment management is a cornerstone of operational excellence for a Professional Cloud Architect. On Google Cloud, environments (Dev, Test, Staging, Prod) must be architected to ensure isolation, reproducibility, and security. The goal is to minimize "works on my machine" syndrome by maintaining high environment parity while controlling costs and securing sensitive production data.

Plain-Language Explanation: Environment Management

Analogy 1 — The Michelin Star Kitchen

Think of Development as the Chef's home kitchen, where they experiment with new flavors. Testing is the private tasting room, where they perfect the dish. Staging is the soft opening, where everything is exactly like the real restaurant but for invited guests. Production is the Main Dining Room on a Saturday night. You never try a brand-new recipe for the first time in the main dining room!

Analogy 2 — The Movie Set

Development is the rehearsal room. Testing is the CGI and sound editing studio. Staging is the private screening for the producers. Production is the worldwide theatrical release. If you find a boom mic in the shot during the rehearsal, it's fine; if it's in the theatrical release, it's a disaster.

Analogy 3 — The Pilot Plant

In chemical engineering, you don't build a massive factory immediately. You build a lab scale (Dev), then a pilot plant (Staging) to see if the process scales up. Only when the pilot plant works do you build the full-scale refinery (Production).

Environment parity is the practice of keeping development, staging, and production environments as similar as possible so that code behaves consistently across the entire delivery pipeline.

Designing the Resource Hierarchy for Environments

The most robust way to isolate environments on GCP is at the Project level within a Folder structure.

Project Isolation: Each environment should have its own project (e.g., app-dev, app-staging, app-prod). This provides the strongest security boundary.
Shared VPC: Use a Shared VPC to manage networking centrally while allowing individual environment projects to use subnets dedicated to them.
Folder-Level Policies: Apply Organization Policies at the folder level (e.g., a "Development" folder) to enforce different rules (like allowing external IPs in Dev but not in Prod).

Environment Promotion Workflows

Moving code and infrastructure from one environment to the next should be automated via CI/CD.

Artifact Promotion: Never rebuild your container or binary for different environments. Build once in Dev/Build phase, store in Artifact Registry, and deploy the same image to Staging and then Prod.
Configuration as Code: Use environment-specific configuration files or secret managers to inject environment-specific values (like DB connection strings) into the immutable artifact.
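
The build-once, configure-per-environment idea can be sketched as follows. This is a minimal illustration, not a deployment tool: the ENV_CONFIG values, the Deployment type, and the plan_deployment helper are hypothetical names introduced here, and the image path in the usage note is an assumed example. The point is that only configuration varies between environments while the artifact reference stays fixed.

```python
from dataclasses import dataclass

# Hypothetical per-environment settings; in practice these would come from
# configuration files or Secret Manager references rather than from code.
ENV_CONFIG = {
    "staging": {
        "project": "app-staging",
        "db_secret": "projects/secrets-staging/secrets/db-password",
    },
    "prod": {
        "project": "app-prod",
        "db_secret": "projects/secrets-prod/secrets/db-password",
    },
}


@dataclass(frozen=True)
class Deployment:
    env: str
    project: str
    image: str      # immutable artifact reference (digest, not a mutable tag)
    db_secret: str  # environment-specific value injected at deploy time


def plan_deployment(image_ref: str, env: str) -> Deployment:
    """Combine one immutable image with the configuration of one environment."""
    if "@sha256:" not in image_ref:
        raise ValueError("promote by digest so every environment runs the same bytes")
    cfg = ENV_CONFIG[env]
    return Deployment(env=env, project=cfg["project"], image=image_ref,
                      db_secret=cfg["db_secret"])
```

Calling plan_deployment twice with the same digest, once for "staging" and once for "prod", yields two plans that share the image field and differ only in project and secret reference.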

Architect's Insight: For the PCA exam, always advocate for Project-level isolation over simple labeling or shared projects for different environments. This ensures that a misconfiguration in Dev (like a broad firewall rule) cannot affect Prod.

Managing Test Data and Sandboxes

Data Masking: When using production data for testing, always use Sensitive Data Protection (DLP) to mask PII.
Synthetic Data: Prefer generating synthetic data that mimics production volume and variety without the security risk.
Sandboxes: Provide "Playground" projects for developers that have strict budget alerts and auto-cleanup scripts to encourage innovation without runaway costs.

Dev/Staging/Prod Project Separation Patterns

The PCA exam consistently rewards designs that map each environment to a dedicated GCP project under a folder structure rooted at the Organization node. The canonical hierarchy is Organization → Environments folder → {dev, nonprod, prod} sub-folders → workload projects. Each environment folder gets its own Cloud Billing budgets (or a separate billing account where chargeback requires it), its own break-glass groups in Cloud Identity, and its own set of Organization Policy bindings.

Why project-level (not label-level) separation

Quotas are per-project. A runaway autoscaling workload in dev cannot exhaust the prod project's Compute Engine CPU quota when each lives in its own project.
IAM blast radius shrinks. Granting roles/owner on a dev project never leaks into prod data in BigQuery or Cloud SQL.
Logs and audit trails stay clean. Cloud Audit Logs are written per project, so a security review of app-prod is not polluted by developer experiments.

Folder-level guardrails to apply

Policy	dev folder	prod folder
`compute.vmExternalIpAccess`	Allow (subset)	Deny
`iam.allowedPolicyMemberDomains`	Allow contractors	Restrict to corp domain
`storage.uniformBucketLevelAccess`	Enforce	Enforce
`compute.requireOsLogin`	Enforce	Enforce
`sql.restrictPublicIp`	Allow	Enforce

Use Terraform with a google_folder resource and google_org_policy_policy resources so the guardrails are version-controlled. Pair this with folder-level aggregated sinks in Cloud Logging that route prod audit events to a dedicated BigQuery dataset for compliance retention (often 7 years), while dev logs can expire in 30 days to control cost.

For PCA scenarios that mention "a developer accidentally deleted a database" or "non-prod IAM granted to prod", the correct answer is almost always dedicated projects under environment folders with Organization Policy guardrails — never tags, labels, or namespaces inside a shared project.

Shared VPC for Environment Network Isolation

A common architecture is one host project per environment (e.g., net-host-prod, net-host-nonprod) with workload projects attached as service projects. This centralizes network admin in the network team while letting application teams retain IAM on compute and data services.

Reference layout

net-host-prod (host project) owns the vpc-prod VPC with subnets subnet-prod-asia-east1, subnet-prod-us-central1.
app-prod-* service projects consume subnets via roles/compute.networkUser granted on specific subnets only — never on the whole VPC.
net-host-nonprod hosts the vpc-nonprod VPC; VPC Peering between prod and nonprod is deliberately not configured, so a misconfigured nonprod workload cannot reach the prod database tier.

Firewall policy hierarchy

Hierarchical firewall policies at the folder level deny all traffic from 0.0.0.0/0 except for ingress on tcp:443 behind the global Load Balancer.
Network firewall policies at the VPC level allow east-west traffic only between explicitly named source and target service accounts (for example, from the frontend service account to the backend service account).
Cross-environment traffic must traverse a documented egress path — typically through Cloud NAT plus Private Service Connect endpoints, never via direct VPC peering between dev and prod.

Why not a single Shared VPC for all environments

A single VPC means a single CIDR plan, single firewall surface, and a single set of routes. One misconfigured route in dev can blackhole prod traffic. Separate Shared VPCs per environment add 5 minutes of Terraform but remove an entire class of cross-env incidents.

IAM Scoping Per Environment

Identity sprawl is the most common source of environment leakage. The pattern that scores on PCA: groups bound at the lowest necessary scope, service accounts isolated per environment, no human gets standing prod access.

Group + scope matrix

gcp-app-developers@corp → bound to roles/editor on the dev folder only.
gcp-app-sre@corp → bound to roles/run.developer + roles/logging.viewer on the prod folder, with break-glass access via Privileged Access Manager (PAM) for roles/run.admin.
gcp-app-deployer@corp (CI/CD only, no humans) → bound to deploy-specific roles on each environment.

Service account hygiene

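Each environment has its own service account namespace: the prod application's service account lives in the prod project and is a distinct identity from its dev counterpart, even when both share the same short name.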
Disable service account key creation via the iam.disableServiceAccountKeyCreation Organization Policy on the prod folder.
Use Workload Identity Federation for CI/CD instead of long-lived JSON keys — GitHub Actions or GitLab CI assume the deployer SA via OIDC.
Cross-project SA impersonation must be explicit: a dev SA cannot impersonate a prod SA because roles/iam.serviceAccountTokenCreator is never granted across the env boundary.
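
The impersonation rule above can be checked mechanically. The sketch below assumes that project IDs end in the environment name (app-dev, app-prod) and uses a simplified, hypothetical binding shape with target_sa, role, and members keys rather than the raw IAM policy format; the helper names are illustrative.

```python
import re

# Roles that let a principal act as a service account.
IMPERSONATION_ROLES = {
    "roles/iam.serviceAccountTokenCreator",
    "roles/iam.serviceAccountUser",
}

# Assumption: project IDs end in the environment name, e.g. app-dev or app-prod.
SA_EMAIL = re.compile(r"^[^@]+@(?P<project>[a-z0-9-]+)\.iam\.gserviceaccount\.com$")


def env_of(sa_email: str) -> str:
    """Return the environment suffix of the project that owns a service account."""
    match = SA_EMAIL.match(sa_email)
    if match is None:
        raise ValueError(f"not a user-managed service account email: {sa_email}")
    return match.group("project").rsplit("-", 1)[-1]


def cross_env_grants(bindings):
    """Yield (member, role, target) where a service account can act as one in another env."""
    for binding in bindings:
        if binding["role"] not in IMPERSONATION_ROLES:
            continue
        target_env = env_of(binding["target_sa"])
        for member in binding["members"]:
            if not member.startswith("serviceAccount:"):
                continue  # human and group members are out of scope for this check
            if env_of(member.split(":", 1)[1]) != target_env:
                yield member, binding["role"], binding["target_sa"]
```

An empty result means no service account holds an impersonation role on a service account in a different environment.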

Audit checklist for PCA

Run gcloud asset search-all-iam-policies --scope=folders/PROD_FOLDER_ID --query="memberTypes:user" quarterly to find any direct user bindings that should be group bindings.
Use IAM Recommender (part of Policy Intelligence) to surface unused permissions, and accept the recommendations on dev first to validate the pattern before applying to prod.
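
The quarterly review can be scripted once the search results are exported as JSON. The sketch below assumes each result is an object with a resource field and a policy.bindings list of role and members entries; that shape is an assumption to verify against the actual export, and direct_user_bindings is a hypothetical helper name.

```python
import json


def direct_user_bindings(search_results_json: str):
    """Return (resource, role, member) for every binding granted directly to a user."""
    findings = []
    for result in json.loads(search_results_json):
        for binding in result.get("policy", {}).get("bindings", []):
            for member in binding.get("members", []):
                # Group and service account members are expected; users are not.
                if member.startswith("user:"):
                    findings.append((result.get("resource", ""), binding["role"], member))
    return findings
```

Each finding is a candidate for replacement with a group binding.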

Granting roles/iam.serviceAccountUser on a prod service account to the dev-developers group is a silent prod breach — the developer can launch a Compute Engine VM as the prod SA and read prod data. Always scope roles/iam.serviceAccountUser to the specific project where the SA is allowed to be attached.

Sandbox Folders for Experimentation

Sandboxes are the safety valve that prevents shadow IT. The PCA-recommended pattern is an Organization → Sandbox folder, isolated from the main Environments folder, with aggressive guardrails.

Sandbox folder configuration

Auto-cleanup: A scheduled Cloud Function triggered by Cloud Scheduler runs gcloud projects list --filter="parent.id=SANDBOX_FOLDER AND createTime<-P30D" and deletes any project older than 30 days unless tagged keep=true.
Budget cap: Each sandbox project is created from a Project Factory Terraform module that auto-attaches a Cloud Billing budget of $50/month with a Pub/Sub trigger that disables billing at 100%.
No prod data: An Organization Policy gcp.restrictServiceUsage denies bigquerydatatransfer.googleapis.com and datastream.googleapis.com in the Sandbox folder, preventing accidental pipelines from prod.
Public IP allowed, but logged: Unlike prod, sandboxes can have external IPs for quick demos, but VPC Flow Logs stream to a security SIEM so security can spot exfil patterns.
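
The auto-cleanup selection logic might look like the sketch below. It assumes project metadata has already been fetched as dictionaries with projectId, createTime (an RFC 3339 string), and labels keys, and it treats the keep=true marker as a project label; the function name and input shape are illustrative. Deleting the selected projects is left to the caller.

```python
from datetime import datetime, timedelta, timezone


def expired_sandboxes(projects, now=None, max_age_days=30):
    """Return IDs of sandbox projects older than max_age_days and not labeled keep=true."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    expired = []
    for project in projects:
        if project.get("labels", {}).get("keep") == "true":
            continue  # explicitly exempted from cleanup
        # Normalise the trailing Z so datetime.fromisoformat accepts the timestamp.
        created = datetime.fromisoformat(project["createTime"].replace("Z", "+00:00"))
        if created < cutoff:
            expired.append(project["projectId"])
    return expired
```

Passing now explicitly keeps the function deterministic in tests; the scheduled job would rely on the default.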

Self-service workflow

Developers request a sandbox via a Backstage or internal portal that calls a Cloud Build trigger. The build runs the project-factory Terraform, assigns the requester as roles/owner on that one project (never on the folder), and posts the project ID to Slack. This shifts experimentation out of the dev environment (where it pollutes shared state) into an ephemeral, capped space.

Terraform Workspaces vs Directory-per-Environment

Terraform offers two patterns for multi-environment IaC and the PCA exam tests when each is appropriate.

Workspaces (CLI workspaces or Terraform Cloud workspaces)

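# main.tf - shared by every workspace
variable "project_ids" {
  description = "Map of workspace name to GCP project ID"
  type        = map(string)
}

resource "google_compute_network" "vpc" {
  name    = "vpc-${terraform.workspace}"          # e.g. vpc-dev, vpc-prod
  project = var.project_ids[terraform.workspace]  # project chosen by the active workspace
}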

Single codebase, state file per workspace.
Pro: Minimal duplication when environments are truly identical in shape.
Con: Easy to apply to the wrong workspace — terraform workspace select prod is one typo away from disaster.

Directory-per-environment (recommended for PCA)

infra/
  modules/
    network/
    gke-cluster/
  envs/
    dev/main.tf       (calls modules with dev vars)
    staging/main.tf   (calls modules with staging vars)
    prod/main.tf      (calls modules with prod vars, separate backend)

Each env has its own backend bucket: gs://tf-state-prod, gs://tf-state-dev, with bucket-level IAM that prevents the dev pipeline SA from touching prod state.
Promotion = PR that bumps a module version pin (source = "git::...//modules/gke-cluster?ref=v1.4.2") in envs/staging/main.tf, then a follow-up PR in envs/prod/main.tf.
This pattern aligns with Cloud Build triggers filtered by changed paths: only changes under envs/prod/ trigger the prod plan/apply pipeline, gated on manual approval.
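
The path-filtering behaviour described above can be sketched as a small function. It assumes the infra/envs/<env>/ layout shown earlier; pipelines_to_run is a hypothetical helper, not a Cloud Build feature. Changes under infra/modules/ deliberately trigger nothing, because a module reaches an environment only when that environment's version pin is bumped.

```python
from pathlib import PurePosixPath

ENVIRONMENTS = ("dev", "staging", "prod")


def pipelines_to_run(changed_files):
    """Map changed file paths to the environment pipelines they should trigger."""
    triggered = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Only files under infra/envs/<env>/ trigger that environment's plan/apply.
        if len(parts) >= 4 and parts[:2] == ("infra", "envs") and parts[2] in ENVIRONMENTS:
            triggered.add(parts[2])
    return sorted(triggered)
```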

For mixed teams, Terraform Cloud workspaces backed by VCS combine the best of both: directory-per-env in Git, with workspace-level run gates and audit logs.

Environment-Specific Secret Manager Layout

Secrets must never be shared across environments — a leaked dev secret should not unlock prod. Secret Manager in GCP supports this through per-project secrets plus IAM at the secret level.

Naming and project layout

Secret name stays stable across environments: db-password, stripe-api-key, oauth-client-secret.
Each environment has its own project hosting those secrets: secrets-dev, secrets-staging, secrets-prod.
The application reads projects/secrets-${ENV}/secrets/db-password/versions/latest — the only thing that changes is the project prefix injected at deploy time.
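
A minimal sketch of how an application could build that resource path from the environment name. The secrets-<env> project naming follows the layout above; the helper name and the set of allowed environments are assumptions for illustration.

```python
import re

ALLOWED_ENVS = {"dev", "staging", "prod"}
# Secret IDs may contain letters, digits, hyphens, and underscores (max 255 chars).
SECRET_ID = re.compile(r"^[A-Za-z0-9_-]{1,255}$")


def secret_version_path(env: str, secret_id: str, version: str = "latest") -> str:
    """Build the Secret Manager resource name for one environment."""
    if env not in ALLOWED_ENVS:
        raise ValueError(f"unknown environment: {env}")
    if not SECRET_ID.match(secret_id):
        raise ValueError(f"invalid secret id: {secret_id}")
    return f"projects/secrets-{env}/secrets/{secret_id}/versions/{version}"
```

Rejecting unknown environment names up front prevents a typo from silently resolving to a path in the wrong project.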

Access controls per environment

The prod application's runtime service account gets roles/secretmanager.secretAccessor on projects/secrets-prod only.
roles/secretmanager.admin is never granted to humans on the prod secrets project — rotation happens via a Cloud Scheduler + Cloud Function that calls Secret Manager and the upstream provider API.
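CMEK encryption is enabled on prod secrets using a Cloud KMS key in a separate kms-prod project. Callers with secret access do not need KMS permissions themselves, so CMEK does not narrow who can read a secret; its value is separation of duties over the key: disabling or destroying the key in kms-prod makes the prod secret payloads unreadable regardless of Secret Manager IAM.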

Rotation and versioning

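Rotation schedules are configured via the rotation field on the secret resource. Secret Manager does not rotate the value itself; on schedule it publishes a SECRET_ROTATE notification to a Pub/Sub topic, and a Cloud Function subscribed to that topic performs the actual rotation.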
Old versions are disabled (not destroyed) for 30 days to support rollback.
For PCA: when a question describes "a secret was leaked in dev logs and we need to ensure prod is unaffected", the answer leverages the per-environment project separation — no rotation needed in prod because the dev secret never had prod access.

Inject the env-specific secret project ID at deploy time via Cloud Build substitution variables (_SECRET_PROJECT=secrets-$_ENV, matching the secrets-dev, secrets-staging, and secrets-prod projects above). Your application code stays environment-agnostic and reads from projects/${SECRET_PROJECT}/secrets/..., which leaves one less branch to maintain.

Cloud Build Substitutions for Per-Environment Pipelines

A single cloudbuild.yaml parameterized with substitutions lets you maintain one pipeline definition across dev/staging/prod, with the per-env values supplied by the trigger.

Substitution-driven pipeline

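# cloudbuild.yaml
# Defaults target dev; each trigger overrides them for its environment.
substitutions:
  _ENV: dev
  _PROJECT_ID: app-dev
  _REGION: us-central1
  _MIN_INSTANCES: '0'
  _CMEK_KEY: ''

steps:
  # Build the image, tagged with the short commit SHA.
  - id: build
    name: gcr.io/cloud-builders/docker
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$_PROJECT_ID/app/web:$SHORT_SHA', '.']

  # Push before deploying: Cloud Run pulls the image from Artifact Registry.
  - id: push
    name: gcr.io/cloud-builders/docker
    args: ['push', 'us-central1-docker.pkg.dev/$_PROJECT_ID/app/web:$SHORT_SHA']

  # Deploy the pushed image to this environment's Cloud Run service.
  - id: deploy
    name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: gcloud
    args:
      - run
      - deploy
      - web-$_ENV
      - --image=us-central1-docker.pkg.dev/$_PROJECT_ID/app/web:$SHORT_SHA
      - --region=$_REGION
      - --min-instances=$_MIN_INSTANCES
      - --project=$_PROJECT_ID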

Trigger configuration per environment

dev trigger: Fires on push to main. Substitutions: _ENV=dev, _PROJECT_ID=app-dev, _MIN_INSTANCES=0.
staging trigger: Fires on tag matching ^staging-.*$ (trigger tag filters are regular expressions, not globs). Substitutions: _ENV=staging, _PROJECT_ID=app-staging, _MIN_INSTANCES=1.
prod trigger: Fires on tag matching ^v[0-9]+\.[0-9]+\.[0-9]+$ (dots escaped so they match literally). Substitutions: _ENV=prod, _PROJECT_ID=app-prod, _MIN_INSTANCES=2, _CMEK_KEY=projects/kms-prod/.... Requires manual approval (approval_config.approval_required: true).
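
Tag patterns are easy to get subtly wrong, so it can help to test them outside the pipeline. The sketch below uses Python's re module as a stand-in for the trigger's matcher (an approximation, not the Cloud Build implementation), with anchored patterns and escaped dots; TRIGGER_RULES and environment_for_tag are hypothetical names.

```python
import re

# Ordered rules: the first matching pattern decides the target environment.
TRIGGER_RULES = [
    ("prod", re.compile(r"^v[0-9]+\.[0-9]+\.[0-9]+$")),
    ("staging", re.compile(r"^staging-.+$")),
]


def environment_for_tag(tag: str):
    """Return the environment a Git tag would deploy to, or None if no rule matches."""
    for env, pattern in TRIGGER_RULES:
        if pattern.match(tag):
            return env
    return None
```

An unescaped, unanchored pattern such as v[0-9]+.[0-9]+.[0-9]+ also matches a tag like v1x4x2, because each dot matches any character; the anchored form above rejects it.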

Each Cloud Build trigger runs as its own service account. The prod trigger SA gets roles/run.admin only on app-prod; the dev trigger SA gets the same role only on app-dev. Never reuse one builder SA across environments — that single account becomes the highest-value target in the entire org.

Build-time vs runtime substitution

Only image tags, region, and resource names should differ at build time. Application config (feature flags, log levels) should come from runtime sources (Secret Manager, Firestore, or a configuration object in Cloud Storage) so a config change does not require a redeploy. The image SHA promoted from dev to prod must be byte-identical.

GKE Config Sync for Per-Environment Policies

For organizations running GKE across environments, Config Sync (part of Anthos Config Management) provides GitOps-driven policy and config delivery per cluster.

Repository structure

config-sync/
  base/                    (common Kustomize bases — namespaces, network policies)
  overlays/
    dev/
      patches/
        deny-internet.yaml   (less restrictive)
        resource-quotas.yaml (smaller quotas)
    staging/
    prod/
      patches/
        deny-internet.yaml   (deny all egress except allowlist)
        resource-quotas.yaml (production-sized)
        pod-security.yaml    (restricted PSS)

Each GKE cluster (gke-dev, gke-staging, gke-prod) has its own RootSync resource pointing to a different overlay path in the same Git repo, so all clusters track a single source of truth while applying environment-appropriate constraints.

Policy Controller (Gatekeeper) per env

Policy Controller enforces constraints via OPA Gatekeeper templates:

dev: warn mode on most constraints, so developers see violations but deploys aren't blocked.
staging: deny mode on critical constraints (no privileged containers, no hostNetwork) but warn-only on image registry restrictions.
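prod: deny mode on all constraints, including K8sAllowedRepos, which restricts images to Artifact Registry repositories in the prod project. The constraint matches image-path prefixes rather than wildcards, so list each allowed prefix explicitly (for example us-central1-docker.pkg.dev/PROD_PROJECT/).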

Drift detection and remediation

Config Sync continuously reconciles cluster state against Git. If a developer manually edits a prod NetworkPolicy via kubectl edit, Config Sync reverts it within seconds and emits a metric to Cloud Monitoring that triggers a PagerDuty alert. This makes prod effectively immutable outside the Git workflow — the same property you want from your IaC layer, applied at the Kubernetes API layer.

Ephemeral Preview Environments via Cloud Run

Per-PR preview environments dramatically improve developer feedback loops. Cloud Run is the GCP service of choice because of its fast cold starts, scale-to-zero billing, and a stable HTTPS URL for every service (plus optional tagged-revision URLs).

Implementation pattern

A GitHub Actions workflow triggered on pull_request builds the container, tags it pr-${PR_NUMBER}-${SHA}, and pushes to Artifact Registry.
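The workflow runs gcloud run deploy preview-pr-${PR_NUMBER} --image=... --region=us-central1, creating a new Cloud Run service per PR. The --no-traffic flag is omitted because it applies only when updating an existing service, not when creating one.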
Cloud Run returns a deterministic URL https://preview-pr-123-xyz-uc.a.run.app, which the workflow posts as a PR comment.
When the PR closes (merged or abandoned), a follow-up workflow runs gcloud run services delete preview-pr-${PR_NUMBER}.
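
Deriving the service name and image tag in one place keeps the deploy and cleanup workflows in agreement. The sketch below follows the preview-pr-<number> and pr-<number>-<sha> conventions from the steps above; preview_names is a hypothetical helper, and the tag check reflects the Docker tag grammar (at most 128 characters drawn from letters, digits, underscore, period, and hyphen, not starting with a period or hyphen).

```python
import re

# Docker tag grammar: first char is a word character, at most 128 chars in total.
TAG_PATTERN = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$")
HEX_SHA = re.compile(r"^[0-9a-f]{7,40}$")


def preview_names(pr_number: int, commit_sha: str):
    """Return (service_name, image_tag) for a pull request preview."""
    if pr_number <= 0:
        raise ValueError("PR number must be positive")
    sha = commit_sha.lower()
    if not HEX_SHA.match(sha):
        raise ValueError("commit SHA must be 7 to 40 hexadecimal characters")
    service_name = f"preview-pr-{pr_number}"
    image_tag = f"pr-{pr_number}-{sha}"
    if not TAG_PATTERN.match(image_tag):
        raise ValueError(f"invalid image tag: {image_tag}")
    return service_name, image_tag
```

The cleanup workflow only needs the PR number to recompute the same service name for deletion.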

Cost and security guardrails

All preview services live in a dedicated app-preview project (not dev, not staging).
--min-instances=0 and --max-instances=3 keep idle cost at zero and active cost capped.
The preview SA has read-only access to a sanitized copy of staging data in a separate BigQuery dataset — never to production.
Cloud Run authentication is enabled (--no-allow-unauthenticated) and developers authenticate via gcloud auth print-identity-token or via Identity-Aware Proxy bound to the corp domain.

Why not GKE for previews

A GKE namespace per PR works but adds scheduling latency (pulling images, scheduling pods) and namespace cleanup overhead. Cloud Run gives you a fresh URL in under 30 seconds, scales to zero between developer interactions, and removes the need to manage cluster resources. Reserve GKE-based previews for workloads with sticky state (databases, stateful sets) that Cloud Run cannot host.

Data Anonymization for Staging Environments

Staging must mirror production realistically without becoming a compliance liability. The combination of Sensitive Data Protection (DLP), Dataflow, and Cloud KMS provides a repeatable anonymization pipeline.

Pipeline architecture

A scheduled query (run by BigQuery Data Transfer Service) uses EXPORT DATA to snapshot prod tables nightly to a staging-bound Cloud Storage bucket, running as a service account that has read-only access to prod.
A Dataflow job based on the Google-provided Stream_DLP_GCS_Text_to_BigQuery template reads the snapshot, calls the DLP API with a de-identification template (which wraps the DeidentifyConfig), and writes to the staging BigQuery dataset.
The DLP config uses:
- Format-preserving encryption (FPE) with FFX on email and phone fields so referential integrity is preserved (the same prod email always maps to the same staging email).
- Cryptographic hashing on SSN/national-ID fields, with the key in Cloud KMS in a separate kms-deid project.
- Bucketing on numeric fields like salary (< 50000, 50000-100000, > 100000) to retain statistical properties without exact values.
- Date shift on timestamps with a per-record consistent offset so time-series analysis still works.
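
To make two of these transformations concrete, the sketch below reimplements bucketing and consistent date shifting locally with the standard library. It illustrates the ideas only and does not call the DLP API: the bucket boundaries come from the list above, while the HMAC-derived offset, the 30-day range, and the function names are assumptions for the example.

```python
import hashlib
import hmac
from datetime import date, timedelta


def salary_bucket(salary: float) -> str:
    """Replace an exact salary with one of three coarse ranges."""
    if salary < 50_000:
        return "< 50000"
    if salary <= 100_000:
        return "50000-100000"
    return "> 100000"


def shifted_date(original: date, entity_id: str, key: bytes, max_days: int = 30) -> date:
    """Shift a date by an offset in [-max_days, max_days] derived from entity_id.

    The same entity always gets the same offset, so intervals between that
    entity's events are preserved while absolute dates are hidden.
    """
    digest = hmac.new(key, entity_id.encode("utf-8"), hashlib.sha256).digest()
    offset = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return original + timedelta(days=offset)
```

Because the offset depends only on the key and the entity ID, two events belonging to the same customer stay the same number of days apart after shifting, which is what keeps time-series analysis usable.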

Re-identification controls

The data key used for FPE is wrapped by a Cloud KMS key that only the deidentification SA can use. Re-identification requires explicit IAM grants that trigger a Cloud Audit Log alert.
Staging access is gated by a separate group (gcp-app-staging-readers@corp) that is not granted to product engineers by default — even though the data is anonymized, the principle of least privilege still applies.

Validation before promotion

Before staging data goes live, a Cloud Run job runs k-anonymity and l-diversity checks (using DLP's RiskAnalysisJob) to confirm the anonymized dataset meets the org's privacy thresholds (e.g., k≥5). If the job fails, the staging refresh aborts and an alert fires — preventing accidentally weak anonymization from reaching engineering eyes.
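
The k-anonymity threshold can be illustrated with a small local check. This is a sketch of the definition, not a replacement for a DLP risk analysis job: rows are assumed to be dictionaries, and the quasi-identifier column names in the usage note are hypothetical.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    if not rows:
        raise ValueError("cannot assess an empty dataset")
    groups = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return min(groups.values())


def meets_threshold(rows, quasi_identifiers, k=5):
    """True when every quasi-identifier combination appears at least k times."""
    return k_anonymity(rows, quasi_identifiers) >= k
```

For example, meets_threshold(rows, ["zip_prefix", "age_band"], k=5) fails as soon as any combination of ZIP prefix and age band describes fewer than five records, which is the condition that should abort the staging refresh.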

For PCA scenarios on test data: DLP + Dataflow is the GCP-native answer for anonymization at scale, with format-preserving encryption for fields that need referential integrity, cryptographic hashing for one-way pseudonymization, and bucketing/date shift for statistical fields. Keep the deidentification KMS key in a separate project to enforce separation of duties.

FAQ — Environment Management

Q1. How do we ensure parity between Dev and Prod?

Use Infrastructure as Code (Terraform). Define your infrastructure once and use variables to deploy it to multiple projects. This ensures the network, firewall, and service configurations are identical.

Q2. Should Dev and Prod be in the same VPC?

No. Keep them in separate VPCs, ideally one Shared VPC host project per environment, with no VPC peering between prod and nonprod so that isolation does not depend on firewall rules alone.

Q3. How do we control costs in Dev/Test?

Use Cloud Scheduler to shut down non-production VMs during off-hours, use Preemptible/Spot VMs, and set aggressive Cloud Billing Budgets.

Q4. What is the role of the Staging environment?

Staging is for Final Validation. It should be a 1:1 replica of Production, including data volume (masked) and traffic patterns, to catch performance issues and integration bugs.

Q5. Can we use the same Service Accounts across environments?

Absolutely not. Each environment must have its own unique Service Accounts with permissions scoped only to that environment's resources.
