examlab .net The most efficient path to the most valuable certifications.

Compute Selection and Design

Professional Cloud Architect deep dive into GCP compute options: VMs vs Containers vs Serverless, GKE architecture, Cloud Run, and specialized hardware (GPU/TPU).

The Compute Spectrum: From IaaS to Serverless

Choosing the right compute service is one of the most frequent tasks for a Professional Cloud Architect. Google Cloud offers a spectrum of compute options, each balancing control, scalability, and operational overhead differently.

  • Infrastructure as a Service (IaaS): Compute Engine (VMs). Maximum control, maximum management.
  • Container as a Service (CaaS): Google Kubernetes Engine (GKE). Balanced control and scalability.
  • Serverless Containers: Cloud Run. High abstraction, auto-scales to zero.
  • Platform as a Service (PaaS): App Engine. Fully managed for web and API workloads.
  • Functions as a Service (FaaS): Cloud Functions. Event-driven, small code snippets.

For the GCP PCA exam, the "Optimal" choice is usually the one that minimizes operational overhead (PaaS/Serverless) unless there are specific technical requirements (e.g., custom kernels, legacy OS, specialized hardware) that force you lower in the stack.

Operational overhead: the amount of time and effort a team needs to manage, patch, and maintain the underlying infrastructure. "Serverless" has the lowest overhead. Reference: https://cloud.google.com/architecture/framework/operational-excellence


Plain-Language Explanation: Compute Selection Strategy

Choosing a compute service is like deciding how to get your dinner.

Analogy 1 — The Master Chef's Kitchen (Compute Engine)

Compute Engine is like building your own kitchen from scratch. You buy the stove, you choose the knives, and you are responsible for cleaning the floor and fixing the oven if it breaks. You have total control (you can cook anything), but you spend a lot of time on maintenance. Use this if you have a very specific, rare recipe (Legacy software) that won't work anywhere else.

Analogy 2 — The Food Court (GKE)

GKE is like running a stand in a busy food court. The mall (Google) provides the electricity, the water, and the seating area (The Control Plane). You focus on your specific menu (Containers). You can scale up by opening more stands if the line gets long. It's powerful and flexible, but you still need to manage your staff and your specific equipment (Node pools).

Analogy 3 — The Vending Machine (Cloud Run / Serverless)

Cloud Run is like a high-tech vending machine. You don't care how the machine works or who cleans it. You just put your product (the container image) inside. If one person wants a snack, it serves one. If a thousand people show up at once, the machine duplicates itself within seconds. If nobody is there, it costs nothing. It's the "Optimal" choice for most modern apps because it requires almost no infrastructure maintenance.

On the PCA exam, if a question mentions "Zero Operational Overhead" or "Scale to Zero," your first thought should be Cloud Run. Reference: https://cloud.google.com/run/docs/overview/what-is-cloud-run


Deep Dive: Compute Engine (VMs)

When to choose Compute Engine:

  • Legacy Applications: Software that requires a specific OS version or kernel.
  • Custom Hardware: Need for specific local SSDs or complex networking (e.g., multiple NICs).
  • Sole-tenant Nodes: Physical isolation for compliance or licensing.
  • Bare Metal Solution: For workloads like Oracle that cannot be virtualized. This is a separate offering rather than a Compute Engine feature.

Cost Optimization Features

  • Spot VMs: Up to 91% discount for fault-tolerant batch jobs.
  • Custom Machine Types: Create a VM with the exact amount of CPU and RAM you need—no more "wasted" resources.
  • Committed Use Discounts (CUDs): For stable, long-term workloads.

Deep Dive: Google Kubernetes Engine (GKE)

GKE is the "Optimal" choice for complex, microservices-based architectures.

  • Standard vs. Autopilot:
    • Standard: You manage the node pools and scaling.
    • Autopilot: Google manages the nodes, security, and scaling. This is the choice aligned with the Well-Architected Framework (WAF) for 2025/2026.
  • Regional Clusters: For high availability across zones.
  • Workload Identity: The secure way to give pods access to GCP services.

Deep Dive: Cloud Run (Serverless Containers)

Cloud Run has become the "go-to" compute service for most architects.

  • Developer Productivity: "Write code, get URL."
  • Concurrency: A single container instance can handle many requests simultaneously.
  • Event-driven: Triggered by HTTP, Pub/Sub, or Cloud Storage events via Eventarc.
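
To make the concurrency point concrete, here is a rough sizing sketch based on Little's law (requests in flight ≈ arrival rate × latency). The function name and the traffic numbers are hypothetical, and the estimate ignores any other signals the autoscaler may use, so treat it as back-of-envelope arithmetic rather than a description of how Cloud Run actually scales.

```python
import math


def estimate_instances(requests_per_second: float,
                       avg_latency_seconds: float,
                       concurrency_per_instance: int) -> int:
    """Rough instance count: in-flight requests divided by per-instance concurrency."""
    if concurrency_per_instance < 1:
        raise ValueError("concurrency_per_instance must be >= 1")
    in_flight = requests_per_second * avg_latency_seconds
    return math.ceil(in_flight / concurrency_per_instance)


# Hypothetical traffic: 400 requests/s at 250 ms average latency.
print(estimate_instances(400, 0.25, 80))  # many requests share one instance
print(estimate_instances(400, 0.25, 1))   # one request per instance
print(estimate_instances(0, 0.25, 80))    # no traffic, so nothing to run
```

The same traffic needs far fewer instances when each one serves many requests at once, which is the practical difference between a concurrent container and a one-request-per-instance model.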

Specialized Compute: GPU and TPU

For AI/ML and High-Performance Computing (HPC).

  • GPUs (NVIDIA): General-purpose acceleration for graphics, video, and ML training. Available on Compute Engine and GKE.
  • TPUs (Tensor Processing Units): Google's custom-built ASICs optimized specifically for large-scale machine learning (TensorFlow, JAX, PyTorch). Use these for massive LLM training.

For the PCA exam, GPUs are for general acceleration, while TPUs are for "Maximum performance and scale in ML training." Reference: https://cloud.google.com/tpu/docs/intro-to-tpu


GCE Machine Type Families (E2, N2, C3, M3)

Compute Engine offers distinct machine families optimized for different cost/performance profiles. The PCA exam expects you to map workload patterns to the right family rather than defaulting to n2-standard-4 for everything.

General-purpose families

  • E2 (cost-optimized): Shared-core to 32 vCPU, placed on Intel or AMD hosts transparently. Best for low-traffic web servers, dev/test, small databases. Up to 31% cheaper than N1, but no sustained-use discounts, no local SSD, and no GPUs.
  • N2 / N2D: Balanced price/performance on Intel Cascade Lake or Ice Lake (N2) or AMD EPYC Milan (N2D). Default choice for production web tiers, in-memory caches, mid-size databases.
  • N4 (Titanium): Newest general-purpose tier built on Google's custom Titanium offload. Lower per-vCPU price than N2 with Dynamic Resource Management.

Compute-optimized

  • C3 / C3D: 4th-gen Xeon (Sapphire Rapids) / 4th-gen AMD EPYC Genoa. Targets high-performance web servers, gaming servers, ad-serving, and HPC frontends. Pairs with Titanium SSD for sub-millisecond local NVMe.
  • C2 / C2D: Older compute-optimized; still valid for licensed software pinned to specific CPU generations.

Memory-optimized

  • M3 / M2 / M1: Up to 12 TB RAM. The usual choice for large SAP HANA certified deployments, large in-memory analytics, and Redis clusters that exceed 624 GB.
  • M3 runs on 3rd-gen Intel Xeon (Ice Lake) and is the current SAP-certified default for new deployments.

Accelerator-optimized

  • A3 / A2: NVIDIA H100 / A100 GPUs attached at high bandwidth via NVLink and GPUDirect-TCPX. Used for LLM training and large-scale inference.
  • G2: NVIDIA L4 for inference, video transcoding, and graphics workstations.

PCA mapping cheat sheet: SAP HANA → M3. LLM training → A3 (H100). Inference / transcoding → G2 (L4). HPC / gaming server → C3. Default web/API → N2 or N4. Bursty dev/test → E2. Reference: https://cloud.google.com/compute/docs/machine-resource
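
The cheat sheet above can be kept as a small lookup table for drilling. This is only a study aid: the workload keys are hypothetical labels invented for this sketch, and the values are copied from the cheat sheet rather than from any Google API.

```python
# Workload label -> machine family, copied from the cheat sheet above.
# The key names are hypothetical labels chosen for this sketch.
FAMILY_BY_WORKLOAD = {
    "sap_hana": "M3",
    "llm_training": "A3",
    "inference_or_transcoding": "G2",
    "hpc_or_gaming_server": "C3",
    "default_web_api": "N2 or N4",
    "bursty_dev_test": "E2",
}


def pick_family(workload: str) -> str:
    """Return the cheat-sheet family for a workload label, or fail loudly."""
    try:
        return FAMILY_BY_WORKLOAD[workload]
    except KeyError:
        raise ValueError(f"no cheat-sheet entry for {workload!r}") from None


print(pick_family("sap_hana"))
print(pick_family("bursty_dev_test"))
```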


Cloud Run vs Cloud Functions vs GKE: Decision Tree

The three "modern container/code" services overlap enough to confuse candidates. Use this decision tree exactly:

  1. Is the unit of work a single function triggered by an event (Pub/Sub message, Cloud Storage object, Firestore write, HTTPS call) with no need to bring your own runtime?

    • Yes → Cloud Functions (2nd gen). Note that 2nd-gen Functions are actually built on Cloud Run + Eventarc, so the boundary is blurred.
    • No → continue.
  2. Do you need Kubernetes-specific features — service mesh (Anthos Service Mesh / Istio), DaemonSets, StatefulSets with persistent volumes, complex network policies, multi-cluster ingress, or custom CRDs/operators?

    • Yes → GKE Autopilot (managed nodes) or GKE Standard (when you need control over node images, GPUs with specific drivers, or sole-tenant placement).
    • No → continue.
  3. Is the workload a containerized HTTP, gRPC, or WebSocket service that should auto-scale (including to zero) and is fine with the 60-minute request timeout?

    • Yes → Cloud Run services.
  4. Is it a finite, run-to-completion containerized task (data migration, ML batch inference, nightly ETL, video transcode batch)?

    • Yes → Cloud Run jobs (see next section) or Batch API.

Hard constraints that disqualify Cloud Run

  • Request duration > 60 minutes → use GKE or Compute Engine with Managed Instance Groups.
  • Long-lived persistent connections (e.g., chat backplane) that exceed idle limits → GKE.
  • Need for hostNetwork, privileged containers, or custom kernel modules → GKE Standard (Autopilot blocks these).
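
The decision tree and the disqualifying constraints above can be sketched as one function. The `Workload` fields and the returned strings are hypothetical names for this illustration; the ordering follows the numbered steps, with the node-level-control check placed first because it rules out both Cloud Run and Autopilot.

```python
from dataclasses import dataclass

CLOUD_RUN_REQUEST_LIMIT_MINUTES = 60  # request timeout quoted in this note


@dataclass
class Workload:
    single_event_function: bool = False      # step 1
    needs_kubernetes_features: bool = False  # step 2 (mesh, DaemonSets, CRDs, ...)
    needs_node_level_control: bool = False   # hostNetwork, privileged, kernel modules
    request_service: bool = False            # step 3 (HTTP, gRPC, WebSocket)
    max_request_minutes: float = 0.0
    run_to_completion: bool = False          # step 4


def choose_compute(w: Workload) -> str:
    """Walk the decision tree top to bottom and return the first match."""
    if w.single_event_function:
        return "Cloud Functions (2nd gen)"
    if w.needs_node_level_control:
        return "GKE Standard"
    if w.needs_kubernetes_features:
        return "GKE Autopilot"
    if w.request_service:
        if w.max_request_minutes > CLOUD_RUN_REQUEST_LIMIT_MINUTES:
            return "GKE or Compute Engine with Managed Instance Groups"
        return "Cloud Run services"
    if w.run_to_completion:
        return "Cloud Run jobs or Batch API"
    return "no match in this tree"


print(choose_compute(Workload(request_service=True, max_request_minutes=5)))
print(choose_compute(Workload(request_service=True, max_request_minutes=90)))
print(choose_compute(Workload(run_to_completion=True)))
```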

Cloud Run Jobs vs Cloud Run Services

A common PCA distractor is conflating "Cloud Run" with HTTP services only. Since 2023, Cloud Run jobs are a first-class resource for batch-style containers.

Cloud Run services

  • Triggered by HTTP/gRPC requests or Eventarc.
  • Always-on URL, scales 0 → N based on concurrency.
  • Request-scoped CPU billing (or CPU always-allocated for background work).

Cloud Run jobs

  • Triggered by gcloud run jobs execute, Cloud Scheduler, Workflows, or Eventarc.
  • No HTTP listener required — your container runs to completion and exits.
  • Supports task parallelism (up to 10,000 tasks per execution, each with its own CLOUD_RUN_TASK_INDEX).
  • Max task timeout: 24 hours (vs 60 minutes for services).
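
As an illustration of task parallelism, the sketch below shows one way a job container might split a work list across tasks using CLOUD_RUN_TASK_INDEX and its companion variable CLOUD_RUN_TASK_COUNT. The shard names and the stride-based split are assumptions made for this example; a real job would read its work list from storage or a queue.

```python
import os


def items_for_this_task(items, task_index, task_count):
    """Stride sharding: task i takes items i, i + count, i + 2 * count, ..."""
    return items[task_index::task_count]


def run_task(environ=os.environ):
    # Cloud Run jobs set these for every task; the defaults allow local runs.
    task_index = int(environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    task_count = int(environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    shards = [f"shard-{n:04d}" for n in range(1000)]  # hypothetical work list
    return items_for_this_task(shards, task_index, task_count)


# Simulate task 3 of a 1,000-task execution without deploying anything.
print(run_task({"CLOUD_RUN_TASK_INDEX": "3", "CLOUD_RUN_TASK_COUNT": "1000"}))
```

Because each task derives its slice only from its own index, no coordinator is needed and a retried task simply redoes the same slice.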

When to choose jobs over services

| Workload | Pick |
|---|---|
| Nightly ETL container | Cloud Run jobs + Cloud Scheduler |
| ML batch inference fan-out (1,000 shards) | Cloud Run jobs with --tasks 1000 --parallelism 50 |
| Customer-facing REST API | Cloud Run services |
| Database migration script | Cloud Run jobs (single task) |
| Long video transcode (> 1 hour) | Cloud Run jobs (≤ 24 h) or Batch API |

For exam scenarios involving scheduled containers, fan-out batch tasks, or ad-hoc operational scripts, Cloud Run jobs are now the "Optimal" answer over building a custom GKE CronJob or spinning up a Compute Engine VM. Reference: https://cloud.google.com/run/docs/create-jobs


Spot VMs Plus Batch API for Cost-Optimized Compute

Combining Spot VMs with the Batch API is one of the highest-leverage cost patterns on GCP for embarrassingly parallel workloads (rendering farms, Monte Carlo simulations, genomics pipelines).

Spot VM mechanics

  • 60–91% discount versus on-demand.
  • No 24-hour limit (unlike legacy Preemptible).
  • 30-second graceful shutdown signal (ACPI G2 Soft Off) before reclamation.
  • No SLA; design with checkpointing.
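
"Design with checkpointing" could look roughly like the sketch below. It assumes the guest OS turns the shutdown notice into a SIGTERM for your process (typical for a service being stopped during shutdown, but verify for your image), and the class name, checkpoint path, and work loop are all hypothetical. Each unit of work must finish well inside the 30-second window for this pattern to be safe.

```python
import json
import os
import signal
import tempfile


class CheckpointedLoop:
    """Process items in order, saving progress so a restart can resume."""

    def __init__(self, path):
        self.path = path
        self.stop_requested = False
        self.next_item = self._load()

    def _load(self):
        try:
            with open(self.path) as f:
                return json.load(f)["next_item"]
        except FileNotFoundError:
            return 0  # no checkpoint yet, start from the beginning

    def save(self):
        # Write a temp file, then rename, so a kill mid-write cannot corrupt it.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"next_item": self.next_item}, f)
        os.replace(tmp, self.path)

    def request_stop(self, signum, frame):
        # Signal handler: only set a flag; the loop exits after the current item.
        self.stop_requested = True

    def run(self, total_items, process):
        while self.next_item < total_items and not self.stop_requested:
            process(self.next_item)
            self.next_item += 1
            self.save()
        return self.next_item


if __name__ == "__main__":
    checkpoint = os.path.join(tempfile.gettempdir(), "render-checkpoint.json")  # hypothetical path
    loop = CheckpointedLoop(checkpoint)
    signal.signal(signal.SIGTERM, loop.request_stop)
    loop.run(total_items=5, process=lambda i: print("processed item", i))
```

On a real Spot VM the checkpoint would need to live on storage that survives the instance, such as a Cloud Storage bucket or a persistent disk, rather than the local temp directory used here.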

Batch API

Batch is GCP's managed batch scheduler (GA 2023). It accepts a JSON or YAML job spec, provisions Compute Engine VMs (including Spot) as runners, executes your container or script, and tears everything down.

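# job.yaml: render farm shard (5,000 frames, 200 at a time, on Spot VMs)
taskGroups:
  - taskSpec:
      runnables:
        - container:
            imageUri: gcr.io/my-proj/renderer:v3
            # Batch sets BATCH_TASK_INDEX (0 to taskCount - 1) for each task.
            commands: ["--frame", "${BATCH_TASK_INDEX}"]
      computeResource:
        cpuMilli: 4000      # 4 vCPUs per task
        memoryMib: 16384    # 16 GiB per task
    taskCount: 5000         # total tasks, one per frame
    parallelism: 200        # tasks allowed to run at the same time
allocationPolicy:
  instances:
    - policy:
        machineType: c3-standard-4
        provisioningModel: SPOT   # can be preempted, so tasks should be retryable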

Why this beats DIY MIG + Spot

  • Batch handles the task queue and retries after preemption for you (set maxRetryCount on the task spec); tasks write their results to shared storage such as Cloud Storage.
  • Native integration with Cloud Logging and Cloud Storage for inputs/outputs.
  • Mix-and-match with GPU Spot (e.g., A100 80GB Spot for cheap ML training) without writing scheduler glue.
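
For a feel of the numbers, the sketch below works through the job spec shown earlier (taskCount 5000, parallelism 200) and the 60–91% Spot discount range. The on-demand cost figure is a made-up input, and the wave count assumes every task takes about the same time and none is preempted.

```python
import math


def waves(task_count: int, parallelism: int) -> int:
    """Number of back-to-back rounds needed if tasks have similar duration."""
    return math.ceil(task_count / parallelism)


def spot_cost_range(on_demand_cost: float,
                    min_discount_pct: int = 60,
                    max_discount_pct: int = 91):
    """Return (best case, worst case) Spot cost for the stated discount range."""
    best = on_demand_cost * (100 - max_discount_pct) / 100
    worst = on_demand_cost * (100 - min_discount_pct) / 100
    return best, worst


print(waves(5000, 200))          # rounds of 200 tasks each
print(spot_cost_range(1000.0))   # hypothetical 1,000 currency units on demand
```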

Cloud Run Direct VPC Egress

A frequently-tested 2024/2025 addition. Older Cloud Run services that needed to reach private resources (Cloud SQL via private IP, Memorystore, on-prem via Interconnect) had to route through a Serverless VPC Access connector — an extra managed component with its own throughput limits and per-hour cost.

Direct VPC egress removes the connector entirely:

  • Cloud Run instances take internal IP addresses directly from your VPC subnet; no connector VMs sit in the path.
  • Higher throughput (up to 1 Gbps per instance, vs ~200 Mbps per connector).
  • Lower latency — no extra NAT hop.
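  • Subnet IP planning matters: each instance consumes one IP, so size the subnet for the maximum instance count across everything that shares it, plus headroom for scale-up and rollouts. Request concurrency does not change IP usage.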
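
A quick way to sanity-check subnet sizing is sketched below, using Python's standard `ipaddress` module. It assumes one IP per instance as stated above and subtracts the four addresses Google Cloud reserves in each primary IPv4 subnet range; the 2x headroom factor and the CIDR values are hypothetical planning inputs, not documented requirements.

```python
import ipaddress
import math

GCP_RESERVED_PER_SUBNET = 4  # network, default gateway, second-to-last, broadcast


def usable_ips(cidr: str) -> int:
    """Addresses left for workloads after the reserved ones are removed."""
    return ipaddress.ip_network(cidr).num_addresses - GCP_RESERVED_PER_SUBNET


def subnet_fits(cidr: str, max_instances: int, headroom: float = 2.0) -> bool:
    """True if the subnet can hold max_instances with the chosen headroom."""
    needed = math.ceil(max_instances * headroom)
    return usable_ips(cidr) >= needed


print(usable_ips("10.8.0.0/26"))
print(subnet_fits("10.8.0.0/26", max_instances=25))
print(subnet_fits("10.8.0.0/26", max_instances=40))
```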

When you still need the connector

  • Cross-region egress to a VPC in a region Direct VPC doesn't yet support.
  • Cloud Run jobs in regions where Direct VPC for jobs is still rolling out — verify the matrix at launch time.

Architect implication

For the PCA exam, the "Optimal" pattern for a Cloud Run service that talks to a private Cloud SQL is now Direct VPC egress + Private Service Connect, not the connector. The connector is "Viable" but adds an operational unit you no longer need.


GPU vs TPU Selection: When to Pick Which

Beyond the basic GPU/TPU split, the exam tests nuance:

Pick a GPU (A3 H100, A2 A100, G2 L4) when

  • Framework is PyTorch with custom CUDA kernels that aren't ported to XLA.
  • Model is inference-heavy with strict latency SLOs (L4 on G2 is the cost/perf sweet spot).
  • Mixed graphics + ML workload (3D rendering, video AI pipelines).
  • You need NVIDIA-only libraries (TensorRT, Triton Inference Server, RAPIDS).

Pick a TPU (v5e, v5p, Trillium/v6) when

  • Training large transformer models in JAX, TensorFlow, or PyTorch/XLA.
  • You need pod-scale interconnect (TPU v5p pods reach 8,960 chips, with about 4,800 Gbps of ICI bandwidth per chip).
  • Cost-per-FLOP matters for sustained training campaigns.
  • TPU v5e specifically targets cost-efficient inference of LLMs under ~70B params.

Hybrid: Vertex AI managed training

If the question says "minimize operational overhead while training a large model," the answer is often Vertex AI custom training jobs that abstract both GPU and TPU procurement — you only pick the accelerator type.

A common exam distractor is "the team uses PyTorch and wants the cheapest training option" with TPU offered as the answer. PyTorch on TPU requires the XLA backend and code changes; if the question implies "no code changes," stick with GPU (A2 or A3). TPU is only optimal when the team is willing to use JAX or PyTorch/XLA. Reference: https://cloud.google.com/tpu/docs/run-calculation-pytorch
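
The training-side rules above, including the "no code changes" distractor, can be condensed into a small function. This is a deliberate simplification for exam drilling: the parameter names are hypothetical, and it ignores inference, cost, and capacity considerations.

```python
def pick_training_accelerator(framework: str,
                              *,
                              custom_cuda_kernels: bool = False,
                              needs_nvidia_libraries: bool = False,
                              code_changes_allowed: bool = True) -> str:
    """Return "GPU" or "TPU" for a training workload, per the rules above."""
    name = framework.lower()
    if custom_cuda_kernels or needs_nvidia_libraries:
        return "GPU"  # CUDA kernels and NVIDIA-only libraries do not run on TPU
    if name == "pytorch" and not code_changes_allowed:
        return "GPU"  # PyTorch on TPU needs the XLA backend and code changes
    if name in {"jax", "tensorflow", "pytorch"}:
        return "TPU"
    return "GPU"      # unknown framework: stay on the general-purpose option


print(pick_training_accelerator("PyTorch", code_changes_allowed=False))
print(pick_training_accelerator("JAX"))
```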


Confidential Computing on GCP

Confidential VMs encrypt data in use via hardware memory encryption (AMD SEV / SEV-SNP, Intel TDX). This matters for regulated workloads (finance, health, defense contractors) where the threat model includes "Google operators with hypervisor access."

Service surface

  • Confidential VM (Compute Engine): N2D, C2D, C3D AMD SEV; C3 Intel TDX. Enable via --confidential-compute --maintenance-policy=TERMINATE.
  • Confidential GKE Nodes: Whole node pool runs on Confidential VMs; pod memory protected without code changes.
  • Confidential Space: Hardened, attested VM environment for multi-party data clean rooms — two parties pool data, neither (nor Google) can see the other's raw rows; only the attested workload output is released.
  • Confidential Cloud Run / Functions: Roadmap items at time of writing; for now, sensitive in-use compute belongs on Confidential GKE Nodes.

Trade-offs

  • ~2–6 % performance overhead from memory encryption.
  • Live migration disabled — VMs reboot during host maintenance.
  • Not all machine families are eligible (no E2, no M3 yet for SEV-SNP).
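
The eligibility rules listed in this section can be captured as a small lookup for study purposes. The table below only mirrors the families named in this note, so it may lag the official support matrix; the function name is hypothetical.

```python
# Machine family -> confidential computing technology, as listed in this note.
CONFIDENTIAL_TECH_BY_FAMILY = {
    "N2D": "AMD SEV",
    "C2D": "AMD SEV",
    "C3D": "AMD SEV",
    "C3": "Intel TDX",
}


def confidential_tech(machine_type: str):
    """Return the technology for a machine type, or None if not listed."""
    family = machine_type.split("-")[0].upper()  # "n2d-standard-4" -> "N2D"
    return CONFIDENTIAL_TECH_BY_FAMILY.get(family)


print(confidential_tech("n2d-standard-4"))
print(confidential_tech("c3-standard-4"))
print(confidential_tech("e2-medium"))
```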

PCA framing

If a scenario mentions "data must be encrypted in memory", "protection from cloud provider insider threat", or "multi-party computation across organizations", the optimal answer is Confidential VMs / Confidential GKE Nodes / Confidential Space, not just CMEK (which is data at rest) or VPC-SC (which is API perimeter).


Sole-Tenant Nodes for Licensing and Compliance

A Sole-Tenant Node is a physical Compute Engine host dedicated to a single customer project — no other tenants share the hardware.

Why architects choose it

  • Bring-Your-Own-License (BYOL): Oracle Database, Windows Server Datacenter, SQL Server Enterprise per-core licenses tied to physical sockets.
  • Compliance: PCI, HIPAA BAA edge cases that require provable physical isolation beyond the standard shared-tenancy attestation.
  • Affinity controls: Place specific VMs on specific nodes using node affinity labels (compute.googleapis.com/node-name).
  • Predictable performance: No noisy-neighbor variance on memory bandwidth or cache.

Configuration model

  1. Create a node template that fixes the node type (e.g., n2-node-80-640).
  2. Create a node group in a zone, optionally with autoscaling.
  3. Launch VMs with --node-group=<group> or affinity labels.

Cost reality

Sole-tenant pricing equals the full node cost plus a sole-tenancy premium, regardless of how many VMs you pack onto it. The architecture only pays off when:

  • License savings exceed the premium, OR
  • Compliance mandates it, OR
  • You pack the node densely (high VM-to-node ratio).
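
The three pay-off conditions above can be turned into a simple break-even check. All prices, the 10% premium, and the function names are hypothetical placeholders; plug in figures from the current pricing page before drawing conclusions.

```python
def sole_tenant_monthly_cost(node_cost: float, premium_pct: float) -> float:
    """Full node cost plus the sole-tenancy premium, independent of VM count."""
    return node_cost * (100 + premium_pct) / 100


def pays_off(node_cost: float,
             premium_pct: float,
             shared_vm_cost: float,
             vm_count: int,
             license_savings: float = 0.0,
             compliance_mandated: bool = False) -> bool:
    """True if sole tenancy is justified by compliance, licensing, or density."""
    if compliance_mandated:
        return True
    sole = sole_tenant_monthly_cost(node_cost, premium_pct)
    shared = shared_vm_cost * vm_count  # cost of the same VMs on shared hosts
    return sole - license_savings <= shared


# Hypothetical monthly figures: node 5,000, premium 10%, each shared VM 300.
print(pays_off(5000, 10, 300, vm_count=10))                        # sparse node
print(pays_off(5000, 10, 300, vm_count=10, license_savings=3000))  # BYOL savings
print(pays_off(5000, 10, 300, vm_count=20))                        # dense packing
```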

vs Bare Metal Solution

Don't confuse Sole-Tenant Nodes with Bare Metal Solution (BMS). BMS is a separate offering for workloads that cannot be virtualized at all (typically Oracle RAC, Exadata-equivalent). Sole-Tenant Nodes still run on the GCE hypervisor — they just don't share the box.


Summary of Optimal vs. Viable Decisions in Compute

| Requirement | Viable Solution (Good) | Optimal Solution (Architect-level) |
|---|---|---|
| New Web API | VM with Auto-scaling | Cloud Run (Serverless) |
| Legacy SQL Server | Compute Engine VM | Cloud SQL (Managed) or VM with PD |
| Microservices | Multiple VMs | GKE Autopilot |
| Batch Processing | Standard VMs | Spot VMs + Batch API |
| Compliance Isolation | Separate Project | Sole-tenant Nodes |

FAQ — Compute Selection Strategy

Q1. Why choose GKE over Cloud Run?

Choose GKE if you need complex networking (like service meshes), specialized hardware (GPUs), or if your workload has long-running processes that exceed Cloud Run's time limits.

Q2. What is the difference between Preemptible and Spot VMs?

Spot VMs are the successor to Preemptible VMs. They have the same low price but no fixed 24-hour limit, meaning they can stay running as long as capacity is available.

Q3. Can I run Windows on Cloud Run?

No. Cloud Run is Linux-based. For Windows-specific applications, use Compute Engine or GKE with Windows Server node pools; App Engine Flexible also runs Linux containers.

Q4. What is the "Cold Start" problem?

In serverless (Cloud Functions/Run), the first request after a period of inactivity may be slow because GCP has to spin up a new container instance. You can mitigate this using Min Instances.

Q5. When should I use App Engine?

App Engine is excellent for "opinionated" web apps where you want Google to handle everything. However, Cloud Run has largely superseded it because of its flexibility with container images.


Final Architect Tip

"Managed over Manual." In the PCA exam, if a managed service (Cloud Run, GKE Autopilot) can do the job, it is almost always the "Optimal" choice over an IaaS solution (Compute Engine). Only go down the stack if you have a specific, documented constraint.
