Introduction to CI/CD in GCP
For a Professional Cloud Architect, a CI/CD pipeline is the "nervous system" of the cloud infrastructure. It automates the transition from code to customer, reducing human error and increasing the velocity of innovation.
Google Cloud provides a fully managed, serverless CI/CD stack that integrates natively with GKE, Cloud Run, and Compute Engine. The core triad is Cloud Build (build/test runner), Artifact Registry (binary/container store with Artifact Analysis), and Cloud Deploy (multi-target progressive delivery). Around them sit Skaffold, Cloud Source Repositories, Binary Authorization, and Workflows.
A CI/CD pipeline is a repeatable, automated path from source commit to a running production workload. On GCP this typically means Cloud Build executing a cloudbuild.yaml, pushing immutable images to Artifact Registry, then Cloud Deploy promoting a Release through ordered Target environments with optional canary phases. Reference: https://cloud.google.com/deploy/docs/overview
Plain-Language Explanation:
Analogy 1 — The Automatic Spelling Checker (CI)
Imagine writing a book. Continuous Integration is like having a magical spelling and grammar checker that scans every single sentence the moment you finish typing it. A Cloud Build trigger watches the repo; the moment you git push, the build steps lint, compile, and unit-test the code, so you never end up with a "book" full of merge errors at release time.
Analogy 2 — The Taste-Tester and the Critic (Staging vs Prod)
Continuous Delivery is like a chef who plates a dish and hands it to a Taste-Tester (Cloud Deploy staging target). If the tester likes it, the tray sits ready behind the pass. Continuous Deployment is when the dish goes straight from the pan to the Customer's Table (the prod target) because automated verify jobs already checked temperature, salt, and presentation — no human in the loop.
Analogy 3 — The Water Purification Plant (The Pipeline)
A CI/CD pipeline is like a Water Purification Plant. Raw water (raw code) enters at one end. It passes through Filters (lint), Chemical Treatment (unit tests), Sediment Tanks (integration tests in Cloud Build), UV Sterilisation (vulnerability scan in Artifact Registry), and finally a Tasting Panel (Cloud Deploy verify and approve). Only 100% pure water reaches the city's pipes (production GKE).
The Three Pillars: CI, CD, and CD
- Continuous Integration (CI): Automating the build and test process. Developers commit code frequently; the system builds it and runs unit tests immediately.
- Continuous Delivery (CD): Automating the path to a testing or staging environment. Code is always in a "deployable" state, but production deployment might still require a human "Push to Deploy" button.
- Continuous Deployment (CD): Automating the full path to production. If the code passes all tests in the pipeline, it is automatically pushed to real customers.
GCP CI/CD Toolstack
1. Cloud Build (The Engine)
A serverless platform that executes your build steps in Docker containers.
- Triggers: Automatically start builds on GitHub/GitLab commits.
- Security: Integrates with Secret Manager to handle API keys safely during the build.
2. Artifact Registry (The Warehouse)
The evolution of Container Registry. It stores Docker images, Maven packages, and npm modules.
- Scanning: Automatically scans for vulnerabilities (CVEs) upon upload.
3. Google Cloud Deploy (The Pilot)
A managed service that automates the delivery of your applications to GKE, Cloud Run, or Anthos.
- Release Management: Handles the promotion from dev to prod with built-in rollback capabilities.
Cloud Build Deep Dive — Triggers, Substitutions, Private Pools, Cache
Cloud Build is more than a YAML runner — it is the execution substrate for the entire pipeline. PCA scenarios usually hinge on choosing the right trigger type, plumbing substitutions for environment-specific values, picking between default pool and private pool, and squeezing build times with the right cache strategy.
Trigger types
- Push to branch: most common; filters by branch regex (^main$, ^release/.*$).
- Push new tag: for release builds; pairs with semver tags.
- Pull request: runs presubmit checks; results post back as GitHub status checks.
- Manual: fires from gcloud builds triggers run; good for hotfixes or scheduled rebuilds via Cloud Scheduler + Pub/Sub.
- Webhook (inbound HTTPS): accepts events from Jira, PagerDuty, or any system that can POST JSON.
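A push-to-branch trigger can be written as a small YAML file and loaded with gcloud builds triggers import. The following is a minimal sketch, assuming the repository is already connected through the Cloud Build GitHub App; the trigger name web-main-push and the repository my-org/my-repo are placeholders, not values from this guide.

```yaml
# trigger.yaml: sketch of a push-to-branch trigger.
# my-org, my-repo, and web-main-push are hypothetical names.
name: web-main-push
description: Build on every push to main
github:
  owner: my-org
  name: my-repo
  push:
    branch: ^main$          # branch regex filter
filename: cloudbuild.yaml   # build config path inside the repo
```

Replacing the push block with a pullRequest block that carries the same branch filter would give the pull-request variant.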
Substitutions
Substitutions inject context into cloudbuild.yaml:
- Built-in: $PROJECT_ID, $BUILD_ID, $COMMIT_SHA, $SHORT_SHA, $BRANCH_NAME, $TAG_NAME, $LOCATION.
- User-defined: prefix with _, e.g. _REGION, _DEPLOY_TARGET. Override per-trigger or per-invocation.
- Dynamic substitutions (allowed with options.dynamic_substitutions: true) enable ${_FOO:-default} style fallbacks.
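To make the two kinds of substitution concrete, here is a minimal cloudbuild.yaml sketch that builds an image and pushes it to Artifact Registry. The user-defined names _REGION and _REPO, their default values, and the image name web are assumptions chosen for illustration.

```yaml
# cloudbuild.yaml: sketch mixing built-in and user-defined substitutions.
# _REGION and _REPO are hypothetical user-defined values with defaults.
steps:
  - name: gcr.io/cloud-builders/docker
    args:
      - build
      - -t
      - ${_REGION}-docker.pkg.dev/$PROJECT_ID/${_REPO}/web:$SHORT_SHA
      - .
images:
  # Pushed to Artifact Registry after all steps succeed.
  - ${_REGION}-docker.pkg.dev/$PROJECT_ID/${_REPO}/web:$SHORT_SHA
substitutions:
  _REGION: us-central1
  _REPO: web
```

A trigger or a gcloud builds submit --substitutions=_REGION=europe-west1 call can override the defaults. Note that $SHORT_SHA is only populated for trigger-invoked builds, so a manual submit would need a different tag.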
Private pools
Default pools run in a Google-managed tenant project with no VPC reach. Private pools are required when build steps must talk to private GKE control planes, on-prem hosts over Cloud VPN/Interconnect, or Cloud SQL via private IP. Private pools also let you pin the machine type (e2-medium → e2-highcpu-32), disk size, and egress (NO_PUBLIC_EGRESS) for VPC-SC compliance.
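Routing a build onto a private pool is a one-field change in the build config. The sketch below assumes a pool named my-private-pool already exists in us-central1 and is peered with the VPC; the pool name and the internal address 10.0.0.5 are placeholders.

```yaml
# cloudbuild.yaml fragment: run the build on a private pool.
# my-private-pool and 10.0.0.5 are hypothetical.
steps:
  - name: gcr.io/cloud-builders/curl
    # Reaches a private (RFC1918) endpoint through the peered VPC.
    args: ['-sf', 'http://10.0.0.5:8080/healthz']
options:
  pool:
    name: projects/$PROJECT_ID/locations/us-central1/workerPools/my-private-pool
```

The same step on the default pool would be expected to fail, since it has no route to the private address.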
Cache strategy
- Kaniko cache (--cache=true --cache-ttl=6h): layer cache in Artifact Registry; massive wins for Dockerfiles with large apt-get install or npm ci layers.
- Cloud Storage cache: manually gsutil rsync node_modules, .m2, .gradle before/after build steps.
- Buildpacks: pack build reuses base image layers automatically.
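As an illustration of the Kaniko option, a build step along these lines replaces the usual docker build step. The repository path web/api is an assumed example.

```yaml
# cloudbuild.yaml: sketch of a Kaniko build with layer caching.
# The web/api repository path is hypothetical.
steps:
  - name: gcr.io/kaniko-project/executor:latest
    args:
      - --destination=us-central1-docker.pkg.dev/$PROJECT_ID/web/api:$SHORT_SHA
      - --cache=true      # reuse previously built layers
      - --cache-ttl=6h    # how long cached layers stay valid
```

Kaniko pushes the image itself, so this config needs no images field.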
For PCA scenarios mentioning private GKE clusters, Cloud SQL private IP, or VPC Service Controls, the answer almost always involves a Cloud Build private pool attached to the relevant VPC peering — default pools cannot reach RFC1918 endpoints.
Cloud Deploy Deep Dive — Targets, Canary, Postdeploy, Verify
Google Cloud Deploy is the opinionated continuous delivery service. Instead of free-form scripts, you declare a DeliveryPipeline and a sequence of Target resources; Cloud Deploy enforces promotion order, rollback, approvals, and audit.
Targets
A Target points at one of: a GKE cluster (gke:), a Cloud Run location (run:), an Anthos cluster (anthosCluster:), or a multi-target (multiTarget:, parallel fan-out across regions). Each target carries its own service account, execution config, and requireApproval flag.
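A pipeline and its targets are usually declared together in one file and registered with gcloud deploy apply. The following is a sketch only, assuming two GKE clusters named staging and prod in a project called my-project; all resource names and the profile names are placeholders.

```yaml
# clouddeploy.yaml: sketch of a two-stage pipeline with GKE targets.
# my-project, the cluster names, and the profile names are hypothetical.
apiVersion: deploy.cloud.google.com/v1
kind: DeliveryPipeline
metadata:
  name: web
serialPipeline:
  stages:
    - targetId: staging
      profiles: [staging]   # Skaffold profile used when rendering for this stage
    - targetId: prod
      profiles: [prod]
---
apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
  name: staging
gke:
  cluster: projects/my-project/locations/us-central1/clusters/staging
---
apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
  name: prod
requireApproval: true       # rollouts to prod wait for an approver
gke:
  cluster: projects/my-project/locations/us-central1/clusters/prod
```

The order of the stages list is the promotion order that Cloud Deploy enforces.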
Canary strategy
Defined under strategy.canary:
    strategy:
      canary:
        runtimeConfig:
          kubernetes:
            serviceNetworking:
              service: web
              deployment: web
        canaryDeployment:
          percentages: [10, 25, 50]
          verify: true
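Cloud Deploy automatically creates one phase per percentage plus a final stable phase, and pauses between phases until the rollout is advanced. With the serviceNetworking configuration shown above, the split is approximated by the ratio of canary to stable Pods behind the Service; weighted routing through Gateway API (for example on an Istio-based mesh) uses the gatewayServiceMesh configuration instead. percentages: [10, 25, 50] means three intermediate stops before 100%. Each stop can run a verify job, while predeploy and postdeploy hooks run only before the first phase and after the last one.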
Predeploy / Verify / Postdeploy
- Predeploy: runs before manifests are applied (e.g. database migration via Cloud Build job).
- Verify: runs after the phase's deploy job succeeds; integration tests, smoke tests, synthetic monitors.
- Postdeploy: runs after deploy and verify have finished (e.g. update PagerDuty, post Slack message, run bq query to log deployment).
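On the pipeline side, the three hooks are switched on per stage. A minimal sketch for a standard (non-canary) strategy follows; the action names migrate-db and notify are hypothetical and would have to match customActions entries in skaffold.yaml, and verify: true likewise assumes a verify stanza exists there.

```yaml
# DeliveryPipeline fragment: sketch of hooks on the prod stage.
# migrate-db and notify are hypothetical custom action names.
serialPipeline:
  stages:
    - targetId: prod
      strategy:
        standard:
          verify: true            # run the verify job after deploy
          predeploy:
            actions: [migrate-db] # runs before manifests are applied
          postdeploy:
            actions: [notify]     # runs after deploy and verify
```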
Rollback
A failed Rollout is automatically marked FAILED; one click (or gcloud deploy targets rollback) redeploys the last successful Release to the target as a new Rollout. Because every Release is immutable (its rendered manifests and image references are fixed when it is created), rollback is deterministic.
Cloud Deploy primitives, in order: DeliveryPipeline → Release → Rollout → Phase → Job. A Release is the immutable artifact set; a Rollout is one delivery of that release to one Target. Memorise this hierarchy — exam questions love to misuse "release" and "rollout" interchangeably.
Artifact Registry + Container Scanning
Artifact Registry is the regional successor to Container Registry. PCA exam questions test three angles: format support, regional placement, and integrated vulnerability scanning via Artifact Analysis.
Format support
- Docker / OCI — for GKE, Cloud Run, Cloud Build base images.
- Maven, npm, Python, Go, Apt, Yum — language and OS package repos in one product.
- Remote repositories — pull-through cache for Docker Hub, Maven Central, PyPI; avoids rate limits and lets VPC-SC clusters fetch external deps.
- Virtual repositories — single endpoint aggregating multiple upstream repos by priority.
Regional placement and replication
Repositories are regional or multi-regional. Pick the same region as the consuming GKE cluster to minimise egress and latency. For multi-region active-active, create a repo per region and use a CI matrix to push to all.
Container scanning
Artifact Analysis runs two scanners:
- OS scan — Debian/Ubuntu/Alpine/CentOS package CVEs.
- Language package scan — Maven, npm, Python, Go dependency CVEs.
Findings are emitted as Occurrences against Notes and integrate with:
- Cloud Build: fail the build if CRITICAL CVEs are found (gcloud artifacts docker images list-vulnerabilities).
- Binary Authorization: block images that lack a "vulnerability-free" attestation.
- Security Command Center: surface findings to SecOps.
Container Registry is deprecated; gcr.io endpoints redirect to Artifact Registry but automatic vulnerability scanning is not enabled by default on the new repo. You must explicitly enable Artifact Analysis (gcloud services enable containerscanning.googleapis.com) — exam wrong-answer traps assume it "just works".
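One way to gate a build on scan results is an on-demand scan inside the build itself. The steps below are a sketch of that pattern, assuming the On-Demand Scanning API (ondemandscanning.googleapis.com) is enabled and that the image path web/api is a placeholder; the CRITICAL threshold is a policy choice, not a requirement.

```yaml
# cloudbuild.yaml: sketch of failing a build on CRITICAL findings.
# The web/api image path is hypothetical.
steps:
  - id: build
    name: gcr.io/cloud-builders/docker
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/web/api:$SHORT_SHA', '.']
  - id: scan
    name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: bash
    args:
      - -c
      - |
        # Scan the locally built image and keep the scan resource name.
        gcloud artifacts docker images scan \
          us-central1-docker.pkg.dev/$PROJECT_ID/web/api:$SHORT_SHA \
          --format='value(response.scan)' > /workspace/scan_id.txt
  - id: severity-check
    name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: bash
    args:
      - -c
      - |
        # $$ escapes the dollar sign so bash, not Cloud Build, expands it.
        if gcloud artifacts docker images list-vulnerabilities "$$(cat /workspace/scan_id.txt)" \
            --format='value(vulnerability.effectiveSeverity)' | grep -Fxq CRITICAL; then
          echo 'Failed vulnerability check'; exit 1
        fi
```

A push step would follow the severity check, so an image with CRITICAL findings never reaches the repository.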
Skaffold for Inner-Loop Development
Skaffold is the open-source CLI that powers Cloud Deploy's render and deploy phases — and it is also the inner-loop developer tool that closes the feedback gap between kubectl apply and git push.
skaffold dev workflow
skaffold dev watches local files, rebuilds the image with the fastest available builder (Docker / Kaniko / Buildpacks / Jib), pushes to a local registry or Artifact Registry, then applies the manifests with kubectl against your minikube / GKE Autopilot dev cluster. Log streaming and port-forwarding happen automatically.
skaffold render and skaffold apply
Cloud Deploy uses skaffold render to materialise Kustomize overlays or Helm templates into a static set of manifests at Release creation time, and that rendered output is the immutable artifact promoted across targets. skaffold apply (or kubectl) then deploys at each Rollout. This is why a Cloud Deploy Release is reproducible: the rendered manifests are stored in a Cloud Storage bucket when the Release is created, and the images they reference live in Artifact Registry.
Profiles for per-target config
    profiles:
      - name: prod
        patches:
          - op: replace
            path: /build/artifacts/0/image
            value: us-docker.pkg.dev/prod-proj/web/api
        deploy:
          kubectl:
            manifests:
              - k8s/*.yaml
Each stage of the DeliveryPipeline can name the Skaffold profiles to activate for its Target, so prod gets a different image repo, replica count, or resource quota without duplicating YAML.
Why this matters for PCA
Scenarios that mention "developers complain about slow feedback when iterating on microservices" or "we want to use the same tool locally and in CI" point to Skaffold as the unifying answer — not Tilt, not Garden, not raw kubectl.
Cloud Source Repositories vs GitHub via Cloud Build
PCA candidates routinely confuse the two — they are not mutually exclusive, and the right answer depends on compliance, mirroring, and trigger semantics.
Cloud Source Repositories (CSR)
- Fully managed private Git, hosted inside your GCP project (since June 2024 the service is closed to customers who had not already used it).
- IAM-controlled (no separate user system) — auditable via Cloud Audit Logs.
- Can mirror from GitHub or Bitbucket: a read-only replica updates on every upstream push, useful for VPC-SC environments where build workers cannot egress to github.com.
- Cloud Build triggers on a CSR repo are configured through the trigger's triggerTemplate field rather than the github block.
GitHub (host repo) via Cloud Build GitHub App
- Most teams keep source on GitHub for PR culture, code review, branch protection.
- Cloud Build connects via the Cloud Build GitHub App (recommended) or webhook trigger.
- Trigger types: push, pull request, tag, manual.
- Status checks post back into the PR (required for branch protection).
When to choose which
- Air-gapped or VPC-SC project — mirror GitHub → CSR, build from CSR with a private pool. Avoids github.com egress.
- Standard SaaS workflow — connect GitHub directly; let Cloud Build GitHub App handle webhooks.
- Regulated industry, evidence trail — CSR's per-commit Cloud Audit Log entries are easier to surface in audits than GitHub API calls.
For "Our security team forbids outbound internet from build workers" scenarios, the chain is: GitHub → mirror to CSR → Cloud Build private pool → Artifact Registry → Cloud Deploy → private GKE. Every hop after the mirror stays inside the VPC perimeter.
GitHub Actions OIDC to GCP
Many shops standardise on GitHub Actions for CI but still need to deploy to GCP. The modern, keyless pattern uses Workload Identity Federation with GitHub's OIDC token — no JSON service account keys to rotate or leak.
Setup steps
- Create a Workload Identity Pool and a Provider of type OIDC, with issuer https://token.actions.githubusercontent.com.
- Add attribute mapping (e.g. attribute.repository = assertion.repository) and an attribute.condition like assertion.repository == 'my-org/my-repo'.
- Create a GCP service account, grant it deploy permissions (roles/run.developer, roles/clouddeploy.releaser), and let the pool impersonate it via roles/iam.workloadIdentityUser.
- In the workflow, use google-github-actions/auth@v2 with workload_identity_provider and service_account inputs.
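The last step above might look like the workflow below. It is a sketch under stated assumptions: the project number 123456789, the pool and provider names, the service account email, and the Cloud Run service name web are all placeholders.

```yaml
# .github/workflows/deploy.yaml: sketch of keyless deploy to Cloud Run.
# Project number, pool, provider, service account, and service are hypothetical.
name: deploy
on:
  push:
    branches: [main]
permissions:
  contents: read
  id-token: write   # lets the job request GitHub's OIDC token
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/github-pool/providers/github-provider
          service_account: deployer@my-project.iam.gserviceaccount.com
      - uses: google-github-actions/setup-gcloud@v2
      - run: >-
          gcloud run deploy web
          --image=us-central1-docker.pkg.dev/my-project/web/api:${{ github.sha }}
          --region=us-central1
```

Without the id-token: write permission the auth step cannot obtain an OIDC token, which is a common cause of a first failed run.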
Why OIDC beats SA keys
- No long-lived credentials — short-lived STS tokens (1 hour default).
- Repo-scoped — attribute conditions ensure only the right repo (and optionally branch / environment) can mint the token.
- Auditable: Cloud Audit Logs show the GitHub assertion subject (repo:my-org/my-repo:ref:refs/heads/main).
Common pitfalls
- Forgetting attribute.condition: leaves the provider open to any GitHub repo (huge blast radius).
- Mismatching audience: must match the provider's full resource name.
- Not granting roles/iam.workloadIdentityUser on the service account.
The PCA exam treats Workload Identity Federation + OIDC as the canonical answer for "external CI deploying to GCP without storing keys." If you see "GitHub Actions deploys to Cloud Run" plus "no service account keys," the chain is WIF Pool → OIDC Provider → impersonated SA → gcloud run deploy.
Workflows Orchestrating Multi-Cluster Deploys
When a single DeliveryPipeline is not enough — say, fan-out across 50 regional GKE clusters, or deploy GKE + Cloud Run + Spanner schema in sequence — Workflows is the GCP-native orchestrator.
Workflows fundamentals
Workflows runs YAML/JSON definitions; each step is an HTTP call or a connector to a GCP service. It supports parallel branches, retries, conditional logic, and waits for long-running operations.
Multi-cluster fan-out pattern
    - fanOut:
        parallel:
          shared: [release_name]
          for:
            value: cluster
            in: ${clusters}
            steps:
              - createRollout:
                  call: http.post
                  args:
                    url: ${"https://clouddeploy.googleapis.com/v1/" + cluster.pipeline + "/releases/" + release_name + "/rollouts"}
                    query:
                      rolloutId: ${release_name + "-to-" + cluster.target}  # required by the API; this naming scheme is illustrative
                    auth:
                      type: OAuth2
                    body:
                      targetId: ${cluster.target}
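This starts one Cloud Deploy Rollout per regional cluster in parallel. The parallel step completes once every branch's request has returned; because rollout creation is a long-running operation, confirming that each Rollout actually succeeded needs a follow-up polling step.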
Cross-service orchestration
A typical "release weekend" workflow:
- Run gcloud spanner databases ddl update for schema migration.
- Create a Cloud Deploy Release for the API service (GKE).
- After Release reaches prod, deploy the static frontend to Cloud Run.
- Send a Slack notification via Cloud Functions.
- Tag the Git commit via the GitHub API.
When to use Workflows vs Cloud Deploy alone
Cloud Deploy handles promotion of one artifact across N environments. Workflows handles coordination of M artifacts and side effects that Cloud Deploy does not natively model. They compose: Workflows calls Cloud Deploy.
Build Approvers and Manual Approval Gates
Even fully automated pipelines need human gates for production deploys, GDPR-sensitive changes, or change-freeze windows. GCP provides approval primitives at both the Cloud Build and Cloud Deploy layers.
Cloud Build approvals
Set approvalConfig.approvalRequired: true on a trigger. When the trigger fires, the build is created but waits for approval (status PENDING) instead of running. Any principal with roles/cloudbuild.builds.approver, who need not be the trigger's creator, clicks Approve in the console or runs gcloud builds approve. Build status changes are published to the cloud-builds Pub/Sub topic, so you can route pending approvals to Slack or PagerDuty.
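In trigger YAML the gate is a single block. A minimal sketch, assuming a tag-push trigger on a GitHub repository; the trigger name, repository, and tag pattern are placeholders.

```yaml
# trigger.yaml: sketch of a tag-push trigger that waits for approval.
# deploy-prod, my-org, and my-repo are hypothetical names.
name: deploy-prod
github:
  owner: my-org
  name: my-repo
  push:
    tag: ^v\d+\.\d+\.\d+$   # semver-style release tags
filename: cloudbuild.yaml
approvalConfig:
  approvalRequired: true    # build is held until an approver acts
```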
Cloud Deploy approvals
Set requireApproval: true on a Target. When a Release is promoted to that target, the Rollout sits in PENDING_APPROVAL until someone with roles/clouddeploy.approver approves via gcloud deploy rollouts approve. Separation of duties comes from IAM: grant roles/clouddeploy.releaser and roles/clouddeploy.approver to different principals so that whoever creates a Release cannot also approve it.
Best practices
- Require approval on prod target only — leave dev/staging fully automated.
- Use Google Groups for approver IAM bindings so on-call rotations work.
- Send approval notifications via Pub/Sub → Cloud Functions → Slack with a deep-link to the approval UI.
- Enable Binary Authorization alongside approvals — approvers attest the image, and the cluster enforces the attestation at admission time (defence in depth).
Auditing
Every approval is a Cloud Audit Log entry (google.devtools.cloudbuild.v1.CloudBuild.ApproveBuild / google.cloud.deploy.v1.CloudDeploy.ApproveRollout) with the approver's email, timestamp, and free-text comment. Pipe these into BigQuery for change-management reporting.
Cloud Build Webhook Triggers
Webhook triggers let any external system kick off a build — not just Git providers. They are the integration glue for Jira, ServiceNow, custom dashboards, or even Cloud Scheduler.
How it works
- Create a trigger of type webhook.
- Cloud Build generates a URL: https://cloudbuild.googleapis.com/v1/projects/PROJECT/triggers/TRIGGER:webhook?key=API_KEY&secret=SECRET.
- Store the secret in Secret Manager and reference it in the trigger config; Cloud Build validates the incoming secret query parameter.
- The external system POSTs JSON; the body is exposed in the trigger as $(body.commit_sha) etc. via substitutions.
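Put together, a webhook trigger definition could look roughly like the sketch below. It assumes an inline build config and a secret named webhook-secret in a project called my-project; those names, and the commit_sha field in the caller's JSON body, are illustrative.

```yaml
# trigger.yaml: sketch of a webhook trigger with a payload binding.
# portal-deploy, my-project, and webhook-secret are hypothetical.
name: portal-deploy
webhookConfig:
  secret: projects/my-project/secrets/webhook-secret/versions/1
substitutions:
  _COMMIT_SHA: $(body.commit_sha)   # bound from the POSTed JSON body
build:
  steps:
    - name: ubuntu
      args: ['echo', 'Requested commit: ${_COMMIT_SHA}']
```

A caller would then POST a body such as {"commit_sha": "abc123"} to the generated URL.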
Use cases
- Cloud Scheduler nightly rebuilds: re-run the build to pick up new base image CVEs even when source has not changed.
- Jira ticket transition triggers deploy: Jira automation rule POSTs to the webhook when a ticket moves to Ready for Prod.
- Pub/Sub fan-out: Eventarc → Cloud Function → webhook to chain multiple builds.
- ChatOps: Slack slash command → webhook → build runs.
Security
- The secret is rotated by updating the Secret Manager version; old versions remain valid until disabled.
- Combine with API key restrictions (HTTP referrer or IP) and VPC-SC if the caller is internal.
- Log the inbound User-Agent and source IP via Cloud Audit Logs and alert on anomalies.
Webhook triggers are not the only way to start a build without a Git event: manual triggers and Pub/Sub triggers do that too. If an exam scenario says "trigger a build from a custom internal portal", or from any system that can only POST JSON, the answer is webhook trigger + Secret Manager. For "trigger a build from Cloud Scheduler", a scheduled run of a manual trigger or a Pub/Sub trigger fits as well.
Deployment Strategies
| Strategy | Description | Pros | Cons |
|---|---|---|---|
| Recreate | Terminate V1, then start V2. | Simple, no version conflict. | Downtime. |
| Rolling Update | Replace instances one by one. | No downtime. | Two versions coexist for a while. |
| Blue-Green | Two identical environments; switch traffic. | Instant rollback, safe. | Double the cost during transition. |
| Canary | Send 5% of traffic to V2; gradually increase. | Minimal risk, early feedback. | Complex monitoring required. |
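On GKE, the Rolling Update row maps onto the Deployment's own strategy block. The manifest below is a sketch, assuming an app called web and a placeholder image path; the surge and unavailability values are one conservative choice among many.

```yaml
# deployment.yaml: sketch of a zero-downtime rolling update on Kubernetes.
# The app name and image path are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # at most one extra Pod during the update
      maxUnavailable: 0   # never drop below the desired replica count
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: us-central1-docker.pkg.dev/my-project/web/api:v2
```

Setting type: Recreate instead gives the first row of the table, with its downtime.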
GitOps: Infrastructure as Code (IaC)
Modern CI/CD often follows the GitOps pattern.
- The Git Repository is the "Source of Truth" for both application code and infrastructure.
- Config Sync or Anthos Config Management watches the Git repo and automatically applies changes to your GKE clusters.
Architect's Choice: For highly regulated environments requiring strict audit trails, use GitOps. Every change to production is documented in a Git commit, and no one has "manual" access to the production environment.
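On the cluster side, Config Sync is pointed at the repository with a RootSync object. A minimal sketch, assuming a publicly readable repository (hence auth: none) at a placeholder URL and a clusters/prod directory holding the manifests:

```yaml
# root-sync.yaml: sketch of Config Sync watching a Git repository.
# The repo URL and directory are hypothetical.
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  git:
    repo: https://github.com/my-org/my-config
    branch: main
    dir: clusters/prod
    auth: none   # a private repo would need a token or SSH secret instead
```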
FAQ — CI/CD Pipelines
Q1. Is Cloud Build only for Docker?
No. While it uses Docker containers to run the build steps, you can use it to build anything—Go binaries, Java JARs, Python packages, or even deploy Terraform configurations.
Q2. How do I handle secrets (like DB passwords) in a pipeline?
Never hardcode them in your cloudbuild.yaml. Store them in Secret Manager and reference them in the build step. Cloud Build reads them at build time using its service account, which needs the Secret Manager Secret Accessor role (roles/secretmanager.secretAccessor).
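A minimal sketch of that wiring, assuming a secret named db-password and a hypothetical script run-migrations.sh in the repository:

```yaml
# cloudbuild.yaml: sketch of reading a Secret Manager secret in a step.
# db-password and run-migrations.sh are hypothetical.
steps:
  - name: gcr.io/cloud-builders/gcloud
    entrypoint: bash
    args:
      - -c
      # $$ defers expansion to bash, so the value never appears in the config.
      - ./run-migrations.sh --password "$$DB_PASSWORD"
    secretEnv: ['DB_PASSWORD']
availableSecrets:
  secretManager:
    - versionName: projects/$PROJECT_ID/secrets/db-password/versions/latest
      env: DB_PASSWORD
```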
Q3. What is a "Rollback" and how does GCP handle it?
A rollback is returning to the previous stable version after a failed deployment. Google Cloud Deploy makes this easy with a single button or command that re-points the environment to the previous "Release" artifact.
Q4. Can I use Jenkins on GCP?
Yes, you can run Jenkins on Compute Engine or GKE. However, for a "cloud-native" approach, Cloud Build is preferred because it is serverless (you don't manage the Jenkins server) and scales automatically.
Q5. How does "Binary Authorization" integrate with CI/CD?
The CI pipeline (Cloud Build) "signs" the image after it passes tests. The GKE cluster is configured to only allow images signed by that specific "Attestor" to be deployed.
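The cluster-side half of that arrangement is a Binary Authorization policy. The following is a sketch, assuming an attestor named built-by-cloud-build in a project called my-project; it could be loaded with gcloud container binauthz policy import.

```yaml
# policy.yaml: sketch of a policy that requires one attestation.
# my-project and the attestor name are assumptions.
globalPolicyEvaluationMode: ENABLE   # exempt Google-maintained system images
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
    - projects/my-project/attestors/built-by-cloud-build
```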
Final Architect Tip
On the PCA exam, if you need Zero Downtime, choose Blue-Green or Rolling Update. If you need to Test with real users with minimal risk, choose Canary. For Multi-environment promotion (Dev -> Staging -> Prod), the answer is Google Cloud Deploy. Always ensure your pipeline includes Automated Testing and Security Scanning (Artifact Analysis).