examlab .net The most efficient path to the most valuable certifications.

Deployment and Automation Scripts

5,800 words · ≈ 29 min read

Advanced guide on using Python, Shell, and other scripting tools to automate Google Cloud infrastructure and application deployments.


Introduction to Deployment Automation

Infrastructure as Code (IaC) tools such as Terraform handle the "what" (the desired state), while automation scripts often handle the "how" (the operational procedures). A Professional Cloud Architect must be proficient in writing robust scripts with the gcloud CLI, Python, or Bash to orchestrate complex deployments, perform data migrations, and automate repetitive operational tasks.

Plain-Language Explanation: Deployment Automation

Analogy 1 — The Automatic Sprinkler System

Manual deployment is like watering your garden with a hose by hand. You might miss a spot or forget a day. Automation scripts are like an automatic sprinkler system. You set the schedule and the zones, and it performs the task perfectly every time, even if you are sleeping.

Analogy 2 — The IKEA Instruction Manual vs. a Robot Assembly Arm

Terraform is like the IKEA manual—it tells you what the finished bookshelf should look like. Automation scripts are like the robotic arm that actually picks up the pieces, applies the glue, and screws the bolts in the correct order. The arm follows the manual, but it adds the "action."

Analogy 3 — The Pilot's Checklist

Scripts are like an automated cockpit checklist. Instead of the pilot manually checking every switch, the computer runs a script that verifies every system is "Go" for takeoff. If one switch is wrong, the script stops the process immediately.

Idempotency: The property of a script or operation where it can be applied multiple times without changing the result beyond the initial application. This is critical for reliable automation.
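To make the definition concrete, here is a minimal sketch of an idempotent operation. The `ensure_label` helper and the in-memory `vm` dictionary are hypothetical stand-ins for a real resource and API call.

```python
def ensure_label(resource: dict, key: str, value: str) -> bool:
    """Set a label only if it differs; return True when a change was made."""
    labels = resource.setdefault("labels", {})
    if labels.get(key) == value:
        return False  # already in the desired state, nothing to do
    labels[key] = value
    return True


vm = {"name": "web-1"}  # hypothetical in-memory stand-in for a cloud resource
first = ensure_label(vm, "env", "prod")   # changes the resource
second = ensure_label(vm, "env", "prod")  # finds nothing left to change
print(first, second, vm)
```

Running the call a second time leaves the resource exactly as the first call left it, which is the behaviour a re-run of a deployment script should have.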


Scripting Tools on GCP

  1. gcloud CLI: Best for quick administrative tasks and shell scripts. Use --format and --filter to make scripts robust and machine-readable.
  2. Python Client Libraries: The gold standard for complex logic. Use the google-cloud-* libraries for rich error handling and asynchronous operations.
  3. Cloud Build: Not just for CI/CD, but a powerful engine for running arbitrary scripts in a secure, serverless environment.
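As a rough illustration of combining the first two tools, the sketch below builds a gcloud command with --filter and --format=json and parses the result in Python. The helper names are hypothetical, and the actual call is left commented out because it needs an installed, authenticated gcloud.

```python
import json
import subprocess


def gcloud_list_cmd(resource: list[str], filter_expr: str) -> list[str]:
    """Build a gcloud list command that prints JSON instead of a human table."""
    return ["gcloud", *resource, "list", f"--filter={filter_expr}", "--format=json"]


def run_json(cmd: list[str]) -> list[dict]:
    """Run the command, fail loudly on a non-zero exit, and parse stdout."""
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(done.stdout)


cmd = gcloud_list_cmd(["compute", "instances"], "status=RUNNING")
print(cmd)
# instances = run_json(cmd)  # needs an installed, authenticated gcloud
```

Passing the command as a list avoids shell quoting problems, and `check=True` turns a failed gcloud call into a Python exception instead of silently empty output.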

Best Practices for Automation Scripts

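  • Error Handling: Never assume a command succeeds. In Bash, start scripts with set -euo pipefail so a failed command aborts the run, and check exit codes explicitly where you need to recover; in Python, wrap API calls in try-except blocks.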
  • Secret Management: NEVER hardcode keys or passwords. Use Secret Manager to fetch credentials at runtime.
  • Logging: Output structured logs (JSON) so they can be easily parsed by Cloud Logging.
  • Service Account Impersonation: Instead of downloading JSON keys, use --impersonate-service-account for better security.
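The error-handling and logging practices above can be sketched together as follows. The `log` and `run_step` helpers are hypothetical; the `severity`, `message`, and `time` keys are the JSON fields that Cloud Logging treats specially when it ingests structured output.

```python
import json
import sys
from datetime import datetime, timezone


def log(severity: str, message: str, **fields) -> str:
    """Emit one JSON object per line so a log agent can parse it."""
    entry = {"severity": severity, "message": message,
             "time": datetime.now(timezone.utc).isoformat(), **fields}
    line = json.dumps(entry)
    print(line, file=sys.stdout, flush=True)
    return line


def run_step(name, action):
    """Run one step, log the outcome, and re-raise so the script still fails."""
    try:
        result = action()
        log("INFO", "step succeeded", step=name)
        return result
    except Exception as exc:
        log("ERROR", "step failed", step=name, error=str(exc))
        raise
```

Re-raising after logging keeps the non-zero exit status, so a calling pipeline still sees the failure.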

Automating Database Migrations

Database migrations are one of the riskiest automation tasks.

  • Pre-flight Checks: Verify connection, permissions, and available disk space.
  • Backup: Trigger a snapshot via script before the migration starts.
  • Rollback Strategy: If the migration script fails, it must be able to revert the changes or notify a human immediately.
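A minimal sketch of how these three safeguards might be wired together is shown below. Every callable is a hypothetical placeholder; in a real script they would wrap calls such as a connectivity check, a backup request, and the migration tool itself.

```python
from typing import Callable


def run_migration(preflight: Callable[[], bool],
                  backup: Callable[[], str],
                  migrate: Callable[[], None],
                  restore: Callable[[str], None],
                  notify: Callable[[str], None]) -> str:
    """Run pre-flight checks, back up, migrate, and roll back on failure."""
    if not preflight():
        notify("pre-flight checks failed; nothing was changed")
        return "aborted"
    backup_id = backup()          # take the snapshot before touching data
    try:
        migrate()
    except Exception as exc:
        restore(backup_id)        # revert to the pre-migration snapshot
        notify(f"migration failed and was rolled back: {exc}")
        return "rolled_back"
    return "migrated"
```

Injecting the steps as functions keeps the control flow testable without a live database.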

Architect's Insight: On the PCA exam, if a question asks how to scale an automation task to thousands of resources, the answer is often to move from simple shell scripts to Cloud Build or a managed workflow engine like Google Cloud Workflows.


Cloud Build Triggers and Private Worker Pools

Cloud Build is the serverless workhorse for automation on GCP. A cloudbuild.yaml file describes ordered build steps where each step runs a container image (e.g., gcr.io/cloud-builders/gcloud, gcr.io/cloud-builders/docker). Steps communicate through the shared /workspace volume and can be parallelised with the waitFor directive.
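To illustrate how waitFor shapes execution order, the sketch below defines three steps as Python data and derives what each one waits for. The step ids and args are hypothetical. The rules it encodes are that a step with no waitFor waits for all earlier steps, and that "-" means start immediately.

```python
import json

steps = [
    {"id": "lint", "name": "gcr.io/cloud-builders/gcloud",
     "args": ["version"], "waitFor": ["-"]},
    {"id": "build", "name": "gcr.io/cloud-builders/docker",
     "args": ["build", "-t", "IMAGE_PLACEHOLDER", "."], "waitFor": ["-"]},
    {"id": "deploy", "name": "gcr.io/cloud-builders/gcloud",
     "args": ["version"], "waitFor": ["lint", "build"]},
]


def dependencies(steps):
    """Map each step id to the ids it must wait for."""
    deps, seen = {}, []
    for step in steps:
        wait = step.get("waitFor")
        if wait is None:
            deps[step["id"]] = list(seen)      # default: wait for every earlier step
        else:
            deps[step["id"]] = [w for w in wait if w != "-"]  # "-" starts immediately
        seen.append(step["id"])
    return deps


print(json.dumps({"steps": steps}, indent=2))  # JSON form of a build config
print(dependencies(steps))
```

Here lint and build have no dependencies and can run in parallel, while deploy waits for both.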

Trigger Types

  • Push to branch / tag: The most common pattern — main triggers production deployments, while feature branches trigger preview environments.
  • Pull request triggers: Run linting, unit tests, and security scans before code merges. The _PR_NUMBER substitution lets you build ephemeral preview URLs.
  • Manual triggers: Useful for "break-glass" operations like database migrations gated by approval.
  • Pub/Sub triggers: Kick off builds from upstream events such as a new artifact landing in Artifact Registry.

Private Worker Pools

By default, Cloud Build runs on shared, ephemeral Google-managed VMs with public egress. For enterprise workloads this is often unacceptable. Private Pools are dedicated worker fleets that:

  • Connect to your VPC through VPC Network Peering (a private services access connection), allowing builds to reach Cloud SQL private IP addresses, GKE private clusters, or on-prem systems through Cloud VPN / Interconnect.
  • Honour VPC Service Controls perimeters so build steps cannot exfiltrate data to public registries.
  • Allow custom machine types (e.g., e2-highcpu-32) for large monolith builds.
  • Are billed per build-minute at a rate that depends on the machine type, so size workers deliberately.

A typical hardened configuration uses a private pool with no external IP, a user-managed service account scoped via --service-account, and secrets stored in Secret Manager and referenced through availableSecrets.secretManager.

For PCA scenarios requiring builds against private GKE clusters or Cloud SQL with no public IP, the correct answer is almost always Cloud Build private worker pools peered into the VPC. Default shared pools cannot reach RFC1918 addresses.


Build Provenance and SLSA Compliance

Software supply chain attacks (SolarWinds, Codecov) made provenance a board-level concern. GCP addresses this through SLSA (Supply-chain Levels for Software Artifacts) integration in Cloud Build.

SLSA Levels on GCP

  • SLSA Level 1: Build process is scripted (any cloudbuild.yaml qualifies).
  • SLSA Level 2: Cloud Build generates signed provenance metadata for every build.
  • SLSA Level 3: Builds run on hardened, isolated Cloud Build infrastructure with non-falsifiable provenance — this is the default for managed Cloud Build.

Provenance Metadata

Each build emits a JSON document describing:

  • The trigger (commit SHA, branch, repository URL).
  • The exact builder image digests used.
  • Input materials and produced artifacts with their SHA-256 digests.
  • A signature from Cloud Build's signing key, verifiable through Sigstore / Cosign.

Provenance is stored in Artifact Analysis (formerly Container Analysis) and queryable via gcloud artifacts docker images describe --show-provenance. Downstream tools — Binary Authorization, GKE admission controllers, or Kritis — can require valid SLSA-3 provenance before allowing deployment.
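As a rough sketch of what a downstream check does with the digests in provenance, the code below compares an artifact's SHA-256 against the subjects of a statement. The `statement` dictionary is a hand-written stand-in shaped like an in-toto statement, not real Cloud Build output, and the sketch deliberately does not verify the signature.

```python
import hashlib


def digest_matches(artifact: bytes, statement: dict) -> bool:
    """Return True if the artifact's SHA-256 appears among the statement's subjects."""
    actual = hashlib.sha256(artifact).hexdigest()
    return any(s.get("digest", {}).get("sha256") == actual
               for s in statement.get("subject", []))


artifact = b"example artifact bytes"
statement = {  # hypothetical, simplified provenance document
    "subject": [{"name": "app.tar",
                 "digest": {"sha256": hashlib.sha256(artifact).hexdigest()}}],
    "predicateType": "https://slsa.dev/provenance/v1",
}
print(digest_matches(artifact, statement))
print(digest_matches(b"tampered", statement))
```

A digest match only proves that the provenance talks about this artifact. Trust in the provenance itself still depends on checking its signature.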

For PCA, remember: SLSA is about who built this and how. If a scenario mentions "verify the build came from our trusted CI pipeline before deploying," the answer is SLSA-backed provenance from Cloud Build plus Binary Authorization attestations.


Cloud Deploy: Targets, Custom Actions, and Progressive Rollouts

While Cloud Build handles building, Cloud Deploy handles delivering — it is GCP's opinionated managed continuous delivery service for GKE, Cloud Run, and Anthos.

Delivery Pipelines and Targets

A clouddeploy.yaml defines a Delivery Pipeline with ordered Targets, typically dev → staging → prod. Each target points at a specific GKE cluster, Cloud Run service, or Anthos config. Promotions between targets are explicit gcloud deploy releases promote calls, optionally gated by approvals (requireApproval: true).

Render and Deploy Phases

Cloud Deploy invokes Skaffold to render manifests with target-specific values (image tags, replica counts, namespaces), then applies them via the target's deploy mechanism (kubectl apply, gcloud run services replace).

Custom Target Types and Custom Actions

For non-native targets (Helm releases, Terraform stacks, Spinnaker pipelines), define a Custom Target Type backed by your own container that implements the render and deploy actions. This is how teams plug Cloud Deploy into bespoke environments without abandoning the audit trail.

Canary and Progressive Rollouts

For GKE and Cloud Run, Cloud Deploy supports automated canary strategies (e.g., 10% → 50% → 100%) with metric-driven verification using Cloud Monitoring SLOs. Failed verification triggers automatic rollback to the previous release.
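The control flow of a progressive rollout can be sketched in a few lines. This is a conceptual model only, not Cloud Deploy's implementation. `set_traffic` and `verify` are hypothetical callables standing in for the traffic split and the metric check.

```python
from typing import Callable, Sequence


def progressive_rollout(phases: Sequence[int],
                        set_traffic: Callable[[int], None],
                        verify: Callable[[int], bool]) -> bool:
    """Shift traffic phase by phase; send all traffic back to the old release on failure."""
    for percent in phases:
        set_traffic(percent)
        if not verify(percent):
            set_traffic(0)   # roll back: the new release gets no traffic
            return False
    return True


history = []
# Simulated verification that fails once the new release takes all traffic
ok = progressive_rollout([10, 50, 100], history.append, lambda p: p < 100)
print(ok, history)
```

The key property is that each phase is verified before the next one starts, so a bad release is caught while it serves only part of the traffic whenever the failure shows up early.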

A common PCA distractor: rolling your own promotion logic with Cloud Build + kubectl. While it works, the exam favours Cloud Deploy when answers mention "auditable promotion between environments," "approval gates," or "automated rollback on SLO breach." Cloud Build is the engine; Cloud Deploy is the conductor.


GitHub Actions to GCP via OIDC

Many organisations standardise their CI on GitHub Actions but still need to deploy to GCP. The legacy approach — storing a long-lived service account JSON key as a GitHub secret — is now considered an anti-pattern: keys leak, rotate poorly, and grant broad access.

The OIDC Flow

GitHub Actions issues a short-lived OIDC token for every workflow run, signed by GitHub's OIDC provider (token.actions.githubusercontent.com). GCP's Workload Identity Federation validates this token against a configured Workload Identity Pool and Provider, then mints a short-lived GCP access token for an impersonated service account.
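The first half of that exchange is a call to the Security Token Service. The sketch below only builds the request body and sends nothing. The function name is hypothetical, and the provider path reuses the example value from this note. The returned federated token is then used to request a service account access token from the IAM Credentials API, which the auth action handles for you.

```python
import json

STS_URL = "https://sts.googleapis.com/v1/token"


def sts_exchange_body(oidc_token: str, provider: str) -> dict:
    """Build the JSON body that trades an external OIDC token for a federated token."""
    return {
        "audience": f"//iam.googleapis.com/{provider}",
        "grantType": "urn:ietf:params:oauth:grant-type:token-exchange",
        "requestedTokenType": "urn:ietf:params:oauth:token-type:access_token",
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "subjectTokenType": "urn:ietf:params:oauth:token-type:jwt",
        "subjectToken": oidc_token,
    }


body = sts_exchange_body(
    "OIDC_JWT_PLACEHOLDER",  # the token GitHub issues for the workflow run
    "projects/123/locations/global/workloadIdentityPools/github/providers/my-repo",
)
print(json.dumps(body, indent=2))  # would be POSTed to STS_URL
```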

Setup Sketch

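permissions:
  id-token: write   # required so the job can request a GitHub OIDC token
  contents: read

steps:
  # Exchange the GitHub OIDC token for short-lived GCP credentials
  - uses: google-github-actions/auth@v2
    with:
      workload_identity_provider: projects/123/locations/global/workloadIdentityPools/github/providers/my-repo
      service_account: [email protected]
  # Install gcloud, which picks up the credentials from the auth step
  - uses: google-github-actions/setup-gcloud@v2
  - run: gcloud run deploy api --image=us-docker.pkg.dev/proj/repo/api:${{ github.sha }}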

Attribute Conditions

The Workload Identity Provider should pin trust to specific repositories and branches using attribute.repository == 'my-org/my-repo' && attribute.ref == 'refs/heads/main'. This prevents a forked repo or feature branch from impersonating the production deployer.
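To see what that condition evaluates, the sketch below applies the same check to the claims of a token in plain Python. The real check runs inside GCP as a CEL expression. The helpers here are hypothetical, and `jwt_claims` skips signature verification, so it is suitable for local inspection only.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)   # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def is_trusted(claims: dict) -> bool:
    """Mirror the attribute condition: one repository, main branch only."""
    return (claims.get("repository") == "my-org/my-repo"
            and claims.get("ref") == "refs/heads/main")


def fake_token(claims: dict) -> str:
    """Build an unsigned token for local experimentation only."""
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
    return f"e30.{body}.sig"


main = fake_token({"repository": "my-org/my-repo", "ref": "refs/heads/main"})
fork = fake_token({"repository": "someone/my-repo", "ref": "refs/heads/main"})
print(is_trusted(jwt_claims(main)), is_trusted(jwt_claims(fork)))
```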

For PCA, the canonical answer pattern is: "Use Workload Identity Federation with OIDC; never store service account keys in third-party CI."


Workload Identity Federation for CI/CD Deep Dive

Workload Identity Federation (WIF) is broader than just GitHub Actions — it is GCP's keyless authentication framework for any external identity provider that speaks OIDC or SAML 2.0.

Supported Providers

  • OIDC: GitHub Actions, GitLab CI, CircleCI, Bitbucket Pipelines, Microsoft Entra ID (formerly Azure AD), generic Kubernetes service account tokens.
  • SAML 2.0: Okta, Ping, ADFS for human workforce identity (handled by Workforce Identity Federation, a sibling product).
  • AWS: Federated authentication so AWS Lambda functions or EC2 instances can call GCP APIs using their AWS IAM role.

Pool, Provider, Principal Hierarchy

  1. Workload Identity Pool: A namespace for external identities, scoped per project.
  2. Provider: Configuration for one external IdP (issuer URI, allowed audiences, attribute mappings).
  3. Principal: A specific subject inside the pool, addressable as principal://iam.googleapis.com/projects/.../subject/... or principalSet://... for groups.

Service Account Impersonation

The external identity does not become a GCP principal directly; it impersonates a GCP service account via the roles/iam.workloadIdentityUser binding. Permissions are then granted to that service account using normal IAM — this preserves all existing audit, allow/deny policies, and VPC-SC controls.

Audit and Rotation

Every WIF token exchange is logged in Cloud Audit Logs under sts.googleapis.com. Because tokens are short-lived (1 hour by default, configurable down to 10 minutes), there are no keys to rotate — a massive operational win over downloaded JSON keys.


Terraform Cloud / HCP Terraform with gcloud

For teams using HashiCorp's managed Terraform offering (HCP Terraform, formerly Terraform Cloud), the integration pattern with GCP combines IaC plans with imperative gcloud post-steps.

Authentication Pattern

HCP Terraform now supports dynamic credentials via Workload Identity Federation — the same OIDC flow as GitHub Actions. The workspace requests an OIDC token, exchanges it for a short-lived GCP token, and runs terraform plan / apply with no static credentials.

Hybrid IaC + Imperative Steps

Terraform is declarative and excels at provisioning, but some operations are inherently imperative:

  • Triggering a one-off Dataflow job after a BigQuery dataset is created.
  • Loading initial seed data into Cloud SQL via gcloud sql import.
  • Invalidating Cloud CDN caches after a new deployment.

The recommended pattern is terraform apply for infrastructure, followed by a local-exec provisioner or a downstream Cloud Build step that runs gcloud commands. Keep imperative steps idempotent (check-then-act) and emit structured logs for observability.

Sentinel and OPA Policies

HCP Terraform's Sentinel policy engine enforces guardrails like "no public Cloud Storage buckets" or "all VMs must have shielded VM enabled" before apply runs. This complements GCP-side Organization Policies by catching violations at plan time, before any API call is made.
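The idea of a plan-time guardrail can be approximated outside Sentinel as well. The sketch below is plain Python, not Sentinel or OPA syntax. It scans the JSON form of a Terraform plan for bucket IAM grants to public principals. The sample plan is hand-written and covers only the fields the check reads.

```python
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def public_bucket_violations(plan: dict) -> list[str]:
    """Return addresses of planned bucket IAM grants that would expose data publicly."""
    violations = []
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after") or {}
        if (rc.get("type") == "google_storage_bucket_iam_member"
                and after.get("member") in PUBLIC_MEMBERS):
            violations.append(rc["address"])
    return violations


sample_plan = {  # hypothetical excerpt of `terraform show -json` output
    "resource_changes": [
        {"address": "google_storage_bucket_iam_member.public",
         "type": "google_storage_bucket_iam_member",
         "change": {"after": {"member": "allUsers", "role": "roles/storage.objectViewer"}}},
        {"address": "google_storage_bucket_iam_member.team",
         "type": "google_storage_bucket_iam_member",
         "change": {"after": {"member": "group:team@example.com"}}},
    ]
}
print(public_bucket_violations(sample_plan))
```

Because the check reads the plan rather than live resources, it can fail a pipeline before any API call changes infrastructure.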

For PCA scenarios mentioning "multi-cloud teams already using Terraform Cloud," the right answer is rarely "migrate to Config Connector" or "rewrite in Deployment Manager." Instead: keep Terraform for provisioning, use WIF for keyless auth, and trigger gcloud for imperative post-steps via Cloud Build.


Skaffold for GKE Inner-Loop Development

Skaffold is the open-source tool that powers Cloud Deploy's rendering but is equally valuable as a developer's inner-loop tool for GKE workloads.

The Inner Loop

skaffold dev watches local source files, rebuilds container images on save (using Buildpacks, Docker, or Jib), pushes to Artifact Registry, and applies updated manifests to the target cluster — all in seconds. File sync mode skips the rebuild entirely for interpreted languages, copying changed files directly into running pods.

Profiles for Multi-Environment Builds

A single skaffold.yaml declares profiles that override behaviour per environment:

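profiles:
  # dev: build on the local Docker daemon and skip the registry push
  - name: dev
    build:
      local: { push: false }
    deploy:
      kubectl: { manifests: [k8s/dev/*.yaml] }
  # prod: build remotely with Cloud Build in the production project
  - name: prod
    build:
      googleCloudBuild:
        projectId: my-prod-project
    deploy:
      kubectl: { manifests: [k8s/prod/*.yaml] }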

Render-Only Mode for GitOps

skaffold render outputs the fully hydrated manifests without applying them — perfect for GitOps workflows where Config Sync or Argo CD picks up the manifests from a Git repo. This is exactly how Cloud Deploy renders manifests internally before handing them to the target.

Cloud Code IDE Integration

Cloud Code plugins for VS Code and IntelliJ wrap Skaffold with GUI-driven debugging, log streaming, and pod shell access — useful for onboarding developers unfamiliar with Kubernetes CLI tooling.


Artifact Registry and Artifact Analysis

Artifact Registry is the unified successor to Container Registry, Maven repositories, and language-specific package stores. For PCA, the key capabilities are repository design and integrated vulnerability scanning.

Repository Modes

  • Standard: Your team's private artifacts (Docker images, Maven, npm, Python, Apt, Yum, Go modules, Helm charts, generic).
  • Remote: A pull-through cache for public upstreams like Docker Hub, Maven Central, or PyPI. Solves Docker Hub rate limits and improves build latency.
  • Virtual: A unified endpoint that fans out to multiple upstream Standard or Remote repos based on configured priority. Useful when migrating between repos without breaking client configs.

Regional vs Multi-Regional

Regional repos (us-central1-docker.pkg.dev) co-locate with workloads to minimise pull latency and egress costs. Multi-regional (us-docker.pkg.dev) is appropriate for golden images served to many regions.

Artifact Analysis (formerly Container Analysis)

Every image push triggers asynchronous vulnerability scanning against the National Vulnerability Database and Google's curated feeds. Results are stored as Occurrences linked to Notes in Artifact Analysis. Query with gcloud artifacts docker images list-vulnerabilities IMAGE_URL or stream via Pub/Sub for real-time security workflows.
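A pipeline step typically turns those scan results into a pass or fail decision. The sketch below filters occurrence records by severity. The sample records are hand-written and include only the `noteName` and `vulnerability.severity` fields the check reads. The helper name and the default threshold are assumptions.

```python
SEVERITY_RANK = {"MINIMAL": 1, "LOW": 2, "MEDIUM": 3, "HIGH": 4, "CRITICAL": 5}


def blocking_findings(occurrences: list[dict], threshold: str = "HIGH") -> list[str]:
    """Return the note names of findings at or above the severity threshold."""
    limit = SEVERITY_RANK[threshold]
    blocked = []
    for occ in occurrences:
        severity = occ.get("vulnerability", {}).get("severity", "")
        if SEVERITY_RANK.get(severity, 0) >= limit:
            blocked.append(occ.get("noteName", "unknown"))
    return blocked


sample = [  # hypothetical scan results
    {"noteName": "note-a", "vulnerability": {"severity": "CRITICAL"}},
    {"noteName": "note-b", "vulnerability": {"severity": "LOW"}},
]
print(blocking_findings(sample))
```

An empty result would let the build continue to the attestation step, while any entry should fail the build.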

The scanner also produces SBOM (Software Bill of Materials) documents in SPDX and CycloneDX formats, satisfying Executive Order 14028 requirements for federal contractors.

On PCA, "scan images for CVEs before deployment" almost always maps to Artifact Registry + Artifact Analysis + Binary Authorization working together. Running an open-source scanner such as Trivy in Cloud Build is a plausible distractor, but it loses the Cloud Audit Logs integration and the native attestation flow.


Binary Authorization and Signed Attestations

Binary Authorization is GCP's deploy-time gate that ensures only trusted, verified images run on GKE, Cloud Run, and Anthos clusters.

Policy Model

A Binary Authorization policy specifies, per cluster or project:

  • Default rule: What to do for images with no attestations (typically ALWAYS_DENY in production).
  • Cluster-specific rules: Allow exemptions for non-critical clusters.
  • Allowlists: Exempt images by URL pattern (e.g., gcr.io/google-containers/* for system pods).
  • Required attestors: List of attestors whose signatures must all be present.

Attestors and Attestations

An attestor is a named entity tied to a PGP or Cloud KMS-managed asymmetric key. An attestation is a signed claim ("this image SHA-256 digest passed vulnerability scanning") created by a trusted process — typically a Cloud Build step that calls gcloud beta container binauthz attestations sign-and-create.

End-to-End Flow

  1. Cloud Build builds the image, pushes to Artifact Registry.
  2. Artifact Analysis scans for CVEs; if none above threshold, a Cloud Build step signs an attestation with a KMS key reserved for the "qa-scanner" attestor.
  3. An optional manual approval step (via Cloud Deploy approval gate) creates a second attestation from the "release-manager" attestor.
  4. At deploy time, the GKE admission controller queries Binary Authorization, which checks that both attestations exist and are signed by trusted keys. Missing signatures → pod creation denied with a clear audit log entry.
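The decision in step 4 reduces to a set comparison, sketched below as a conceptual model. The `admit` function and the in-memory attestation store are hypothetical. The real check also verifies each signature against the attestor's key.

```python
def admit(image_digest: str, required_attestors: set[str],
          attestations: dict[str, set[str]]) -> tuple[bool, set[str]]:
    """Allow the image only if every required attestor has attested to this digest."""
    signed_by = attestations.get(image_digest, set())
    missing = required_attestors - signed_by
    return (not missing, missing)


required = {"qa-scanner", "release-manager"}
store = {"sha256:aaa": {"qa-scanner", "release-manager"},  # fully attested image
         "sha256:bbb": {"qa-scanner"}}                      # missing manual approval
print(admit("sha256:aaa", required, store))
print(admit("sha256:bbb", required, store))
```

Keying attestations by digest rather than by tag matters: a tag can be moved to a different image, while a digest cannot.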

Continuous Validation

Binary Authorization also runs Continuous Validation (CV) in the background, re-checking running pods at least once every 24 hours. If an image that was trusted at deploy time no longer satisfies the policy (e.g., a new critical CVE is disclosed), CV writes an entry to Cloud Logging; build a log-based metric on it and pair it with a Pub/Sub alert to drain the offending workload.

Binary Authorization "three A's" mnemonic for PCA: Attestor (named identity + key), Attestation (signed claim about an image), Admission (GKE/Cloud Run check at deploy time). All three must align for a pod to start, and Cloud KMS holds the signing keys.


FAQ — Deployment Automation

Q1. Python or Bash?

For simple one-liners or sequential GCP resource creation, Bash + gcloud is fine. For anything involving loops, complex logic, or external API calls, use Python.

Q2. How do I handle authentication in scripts?

In a local environment, use gcloud auth application-default login. In a production environment (VMs, Cloud Build), Application Default Credentials automatically pick up the service account attached to the resource, so the script needs no key file.

Q3. Can I use Ansible with GCP?

Yes. Google provides official Ansible modules to manage GCP resources. This is common in hybrid-cloud scenarios where you are already using Ansible for on-premises config.

Q4. How do I make my scripts idempotent?

Before creating a resource (e.g., a bucket), the script should check whether it already exists. From the shell, run gcloud storage ls (or the older gsutil ls) or gcloud ... describe, and create the resource only when the lookup reports that it was not found.
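A minimal check-then-create sketch in Python might look like the following. The `ensure_bucket` helper and the injectable `run` parameter are hypothetical. Note that this simple version treats any describe failure as "not found", whereas a production script should distinguish a missing bucket from a permission error.

```python
import subprocess
from typing import Callable

Runner = Callable[[list[str]], int]


def shell(cmd: list[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd, capture_output=True).returncode


def ensure_bucket(name: str, run: Runner = shell) -> str:
    """Create the bucket only when the describe call fails."""
    if run(["gcloud", "storage", "buckets", "describe", f"gs://{name}"]) == 0:
        return "exists"
    if run(["gcloud", "storage", "buckets", "create", f"gs://{name}"]) != 0:
        raise RuntimeError(f"could not create gs://{name}")
    return "created"
```

Passing a fake `run` function lets you exercise both branches without touching a real project.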

Q5. Is it safe to store scripts in a Git repository?

Yes, and it is highly recommended. However, ensure that no secrets are committed to the code. Use .gitignore and Secret Manager for sensitive data.
