
Cloud-Native Application Design Patterns


Master Cloud Native design patterns for the Google Cloud PCD exam: 12-factor apps, stateless services, graceful shutdown, structured logging, health checks, retries, circuit breakers, distributed tracing, and progressive delivery.


Introduction to Cloud Native Design Patterns

Cloud native is not the act of moving an existing application into Google Cloud. It is a discipline of designing applications so they thrive in distributed, elastic, multi-tenant infrastructure where individual instances are cheap, ephemeral, and replaceable. For the Google Cloud Professional Cloud Developer (PCD) exam, that translates into a concrete checklist: apply the twelve-factor methodology, externalise every piece of mutable state, handle SIGTERM correctly on Cloud Run and GKE, emit structured logs that Cloud Logging can parse, expose health endpoints that Kubernetes and Cloud Run probes consume, retry transient failures with exponential backoff and jitter, isolate failing dependencies with circuit breakers in Anthos Service Mesh, and roll out changes through blue-green or canary deployments rather than in-place restarts.

Plain English Explanation

Think of stateless services like hotel rooms

A boutique hotel has 50 identical rooms. When you check in, the receptionist hands you whichever room is clean. Your luggage, passport, and toothbrush stay with you, not in the room. If a pipe bursts in room 207, the hotel moves you to 312 and you barely notice. A stateless Cloud Run service works the same way. Each container instance is a clean room. The user's session, cart, and uploaded files live in Memorystore, Firestore, or Cloud Storage, never on the container's local disk. When Cloud Run kills instance 207 to autoscale down, the request reroutes to instance 312 and the user does not lose work.

Think of graceful shutdown like a barista closing the cafe

When a cafe closes at 9 p.m., the barista does not switch off the espresso machine mid-pour at 8:59. She stops accepting new orders, finishes the drinks already on the counter, washes the steam wand, and locks up. Cloud Run and GKE behave the same way during a scale-down or rolling update. They send SIGTERM to your container, which is the polite "we are closing soon" warning. Your code has roughly 10 seconds on Cloud Run (default) or terminationGracePeriodSeconds on GKE (default 30 seconds) to drain in-flight requests, flush logs, and close database connections before SIGKILL arrives.

Think of a circuit breaker like a household fuse box

When a hair dryer shorts out, the fuse trips and cuts power to that one outlet. The rest of the house keeps the lights on. Without the fuse, the whole panel would burn. A circuit breaker in Anthos Service Mesh is the software equivalent. If the recommendations service starts returning 503s on 50% of requests, the mesh stops sending traffic to it for 30 seconds and the caller serves a cached fallback instead. The checkout service stays alive, the customer still pays, and your on-call engineer fixes recommendations without a site-wide outage. Without the breaker, every service waiting on recommendations would queue threads until the entire fleet ran out of memory.

The Twelve-Factor App Methodology on Google Cloud

The twelve-factor methodology, originally distilled at Heroku, is the canonical contract for cloud-native services and Google has its own architecture-centre adaptation for GCP. The PCD exam tests not the rote list but the GCP service that implements each factor.

Factors I, II, III: Codebase, Dependencies, Config

A single Git repository per microservice maps cleanly to a single Cloud Build trigger and a single Artifact Registry repo. Dependencies are pinned in requirements.txt, package-lock.json, or go.mod and baked into the container image; no pip install at runtime. Configuration is externalised: environment variables for non-secrets (set via --set-env-vars on Cloud Run or a ConfigMap on GKE), Secret Manager for credentials with version pinning (projects/PROJECT/secrets/db-password/versions/3), and never baked into the image. The reason matters: an image promoted from dev to prod must change zero bytes when crossing environments.

Factors IV, V, VI: Backing Services, Build/Release/Run, Processes

Backing services such as Cloud SQL, Pub/Sub, and Memorystore are attached resources reached only by URL or connection string. Swap the connection string and you swap providers. Build (Cloud Build), release (a tagged image plus config), and run (Cloud Run or GKE) are strictly separated, which is why Cloud Build produces an immutable digest like gcr.io/PROJECT/api@sha256:abc123 and the release stage references that digest, not a mutable tag like latest. Processes are stateless and share-nothing — see the next section.

Factors VII through XII: Port binding, concurrency, disposability, dev/prod parity, logs, admin processes

Your container listens for HTTP on the port given in the PORT env var that Cloud Run injects (default 8080). Concurrency is scaled horizontally by spinning up more instances, not by adding threads inside one. Disposability means starting fast (aim for a few seconds, because slow starts lengthen cold starts and can trip the startup probe) and shutting down on SIGTERM. Dev/prod parity means the same image runs locally (for example with docker run) as in production. Logs are streams to stdout/stderr; Cloud Logging captures them automatically. Admin processes (database migrations) run as one-off Cloud Run Jobs or Kubernetes Jobs, not as ad-hoc SSH sessions.

On Cloud Run the PORT environment variable is injected by the platform — your code must read it, not hardcode 8080. Cloud Run Jobs use CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT to fan out work across parallel tasks. Failing to read these env vars is the single most common reason cloud-native ports of legacy apps misbehave on Cloud Run.
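The container contract above can be sketched in a few lines of Go. This is a minimal illustration, not a definitive implementation: the helper names listenAddr and taskShard are hypothetical, and the fallback to 8080 is an assumption meant only for local runs where PORT is unset.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// listenAddr builds the listen address from the PORT value the platform
// injects, falling back to 8080 for local runs where PORT is unset.
func listenAddr(port string) string {
	if port == "" {
		port = "8080"
	}
	return ":" + port
}

// taskShard parses the Cloud Run Jobs fan-out variables. Missing or
// malformed values fall back to a single task (index 0 of 1).
func taskShard(index, count string) (int, int) {
	i, errI := strconv.Atoi(index)
	n, errN := strconv.Atoi(count)
	if errI != nil || errN != nil || n < 1 || i < 0 || i >= n {
		return 0, 1
	}
	return i, n
}

func main() {
	addr := listenAddr(os.Getenv("PORT"))
	i, n := taskShard(os.Getenv("CLOUD_RUN_TASK_INDEX"), os.Getenv("CLOUD_RUN_TASK_COUNT"))
	// A service would pass addr to http.ListenAndServe; a job would process
	// only the items where itemNumber % n == i.
	fmt.Printf("listen on %s, task %d of %d\n", addr, i, n)
}
```

Keeping the parsing in small pure functions makes the behaviour easy to test without setting real environment variables.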

Reference: https://cloud.google.com/run/docs/container-contract

Stateless Service Design and Externalised State

Statelessness is the foundation that makes everything else (autoscaling, blue-green, self-healing) possible. The rule: an instance must produce identical output given identical input regardless of how many requests it has previously served. Concretely, that means no in-memory caches that the user depends on, no local-disk uploads that survive a restart, and no sticky sessions that pin a user to one instance.

Externalising session state

User sessions move to Memorystore for Redis or Firestore. Memorystore is the lower-latency choice (sub-millisecond reads, instance sizes from 1 GB to 300 GB) and Firestore is the choice when you also want per-user document storage. The HTTP cookie holds only an opaque session ID; the server fetches the session document from the external store on every request. This pattern lets Cloud Run scale from 0 to 1000 instances during a flash sale without losing a single cart.

Externalising file uploads

User-uploaded images, PDFs, and exports go to Cloud Storage. Either the client uploads directly to the bucket through a signed URL, or the container buffers the file briefly, writes it to Cloud Storage, and discards the local copy. Never rely on /tmp surviving across requests on Cloud Run: /tmp is an in-memory tmpfs counted against your container memory limit and is destroyed when the instance recycles.

Externalising background jobs

Long-running work (PDF generation, video transcoding) must move out of the request thread and onto Pub/Sub or Cloud Tasks. The HTTP handler publishes a message and returns 202 immediately. A separate Cloud Run service or Cloud Run Job subscribes, processes, and writes the result somewhere durable. Cloud Run request timeouts default to 5 minutes and can be raised to 60 minutes, but anything over a few seconds belongs in async land.

For sticky workloads such as websockets or server-sent events, Cloud Run has a session affinity setting (--session-affinity) that routes a client to the same instance on a best-effort basis, using a platform-managed cookie with a 30-day TTL. Use it sparingly: it weakens the stateless guarantee and complicates blue-green cutovers. Prefer pushing state to Memorystore and keeping the service genuinely stateless.

Reference: https://cloud.google.com/run/docs/configuring/session-affinity

Externalised Configuration with Secret Manager and ConfigMaps

The cloud-native rule is config is environment, not code. Two GCP services dominate.

Secret Manager for credentials

Secret Manager stores TLS keys, database passwords, third-party API tokens, and OAuth client secrets with built-in versioning (versions/1, versions/2, ..., versions/latest). You reference a specific version in Cloud Run with --set-secrets=DB_PASS=db-password:3 so rolling back a credential is a configuration change, not an image rebuild. IAM grants are per-secret with the roles/secretmanager.secretAccessor role; the Cloud Run service account is the identity that pulls the value at container start.

Environment variables and Kubernetes ConfigMaps for non-secrets

Plain config (feature flags, log levels, region names) goes in env vars on Cloud Run or in a ConfigMap mounted as env or as a file on GKE. The advantage of a file mount: hot reload. Mount application.yaml from a ConfigMap, watch the file with inotify, and the service picks up new values without a restart. The trade-offs: you must build that reload logic, the kubelet syncs mounted ConfigMaps with some delay, and subPath mounts never receive updates. Static env vars require a pod restart to change.

Runtime config patterns

For dynamic toggles (kill switches, A/B experiment flags) Firestore is a common choice. A document under config/feature_flags is read through a 60-second client-side cache rather than fetched on every request, so updates propagate within the cache TTL. Firestore does not keep per-document version history, so for an audit trail enable Data Access audit logs or write each change to a separate history collection.
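A cached flag lookup of this kind might look like the following sketch. The FlagCache type and its fetch callback are hypothetical; fetch stands in for the Firestore read, and the clock is injectable purely so the expiry logic can be exercised without waiting.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// FlagCache caches one flag document and refetches it once the TTL elapses.
// fetch stands in for the Firestore read of config/feature_flags.
type FlagCache struct {
	mu         sync.Mutex
	ttlSeconds int64
	now        func() int64 // Unix seconds; injectable for tests
	fetch      func() (map[string]bool, error)
	flags      map[string]bool
	fetchedAt  int64
}

func NewFlagCache(ttlSeconds int64, fetch func() (map[string]bool, error)) *FlagCache {
	return &FlagCache{
		ttlSeconds: ttlSeconds,
		now:        func() int64 { return time.Now().Unix() },
		fetch:      fetch,
	}
}

// Enabled reports a flag, refreshing the cached document when it is stale.
// If a refresh fails, the last known values keep being served and the next
// call tries again.
func (c *FlagCache) Enabled(name string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.flags == nil || c.now()-c.fetchedAt >= c.ttlSeconds {
		if fresh, err := c.fetch(); err == nil {
			c.flags = fresh
			c.fetchedAt = c.now()
		}
	}
	return c.flags[name] // unknown flags read as false
}

func main() {
	c := NewFlagCache(60, func() (map[string]bool, error) {
		return map[string]bool{"new_checkout": true}, nil // pretend Firestore read
	})
	fmt.Println(c.Enabled("new_checkout"), c.Enabled("missing"))
}
```

Serving stale values on a failed refresh is a design choice: a brief Firestore outage should not flip every flag to its default.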

Configuration that varies between deploys (staging vs. production, customer A vs. customer B) is stored in the environment, never in the codebase. On Google Cloud this means env vars + Secret Manager + ConfigMaps, never config.production.yaml checked into git.

Reference: https://cloud.google.com/architecture/twelve-factor-app-development-on-gcp

Graceful Shutdown and SIGTERM Handling

When Cloud Run scales down, when GKE rolls a Deployment, or when a node is drained for maintenance, the platform sends SIGTERM to PID 1 of your container. Your code must catch it, stop accepting new work, drain in-flight requests, flush buffered telemetry, close connection pools, and exit cleanly. If you do not, the platform sends SIGKILL after the grace period and any in-flight request is dropped with a 5xx.

Cloud Run timing

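Cloud Run sends SIGTERM and waits 10 seconds before SIGKILL. Raising the request timeout (--timeout) does not extend that grace period; keep shutdown under 10 s or the platform terminates you. Practical pattern:

// Needs imports: context, os, os/signal, syscall, time.
// server is an *http.Server; db and logger are the application's own handles.
sig := make(chan os.Signal, 1)
signal.Notify(sig, syscall.SIGTERM)
go func() {
    <-sig
    // Stay inside the 10 s window: allow the drain at most 8 s.
    ctx, cancel := context.WithTimeout(context.Background(), 8*time.Second)
    defer cancel()
    server.Shutdown(ctx) // stops new accepts, drains in-flight
    db.Close()
    logger.Flush()
}()
// main must wait for this goroutine before returning, because
// ListenAndServe returns as soon as Shutdown is called.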
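A fuller, self-contained version of the drain sequence might look like the sketch below. The serve and demo helpers are hypothetical names, the 8-second drain budget is an assumption chosen to fit inside the 10-second window, and the self-sent SIGTERM exists only so the example terminates on its own; a real service waits for the platform's signal.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// serve runs srv on ln until ctx is cancelled, then drains in-flight
// requests for at most grace before returning.
func serve(ctx context.Context, srv *http.Server, ln net.Listener, grace time.Duration) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()

	select {
	case err := <-errCh:
		return err // the server failed before any shutdown signal
	case <-ctx.Done():
	}

	drainCtx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	// Shutdown stops accepting connections and waits for active requests.
	if err := srv.Shutdown(drainCtx); err != nil {
		return err
	}
	// Serve returns http.ErrServerClosed after a clean Shutdown.
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// demo starts a server on a random local port, simulates the platform's
// SIGTERM after 100 ms, and reports whether the drain completed cleanly.
func demo() error {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})}

	// Demo only: a real service waits for the platform to send SIGTERM.
	t := time.AfterFunc(100*time.Millisecond, func() {
		if p, err := os.FindProcess(os.Getpid()); err == nil {
			p.Signal(syscall.SIGTERM)
		}
	})
	defer t.Stop()
	return serve(ctx, srv, ln, 8*time.Second)
}

func main() {
	if err := demo(); err != nil {
		log.Fatalf("shutdown failed: %v", err)
	}
	log.Println("drained cleanly; close pools and flush telemetry here")
}
```

Because serve returns only after Shutdown and Serve have both finished, the cleanup that follows it in main cannot race with in-flight requests.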

GKE timing

A pod gets terminationGracePeriodSeconds (default 30, configurable per pod) between SIGTERM and SIGKILL. The pod is removed from Service endpoints in parallel with the SIGTERM, but kube-proxy iptables updates take a few seconds to propagate, so requests can still arrive after SIGTERM. The mitigation: add a preStop hook that sleeps 5–10 s before your app starts shutting down, giving iptables time to converge. The preStop sleep counts against the grace period, so budget for both.

Common bugs

Three bugs recur. A logging library starts a goroutine that flushes every 5 s and delays shutdown until the next tick. A database driver holds idle connections with a 30 s timeout and blocks on close. A worker pulls a Pub/Sub message and acks it before finishing processing, so on SIGTERM the work is lost because the message is gone. The fix for the last one is to extend the ack deadline while processing (modifyAckDeadline) and ack only after success.

Health Checks: Liveness, Readiness, and Startup Probes

Probes are how the platform decides whether your container is healthy enough to receive traffic, sick enough to be restarted, or still warming up. Get them wrong and you either kill working pods or send traffic to a half-initialised one.

Liveness probe

"Is the process alive enough to keep, or should you kill it?" Returns a 200 if the event loop is responsive. Should not check downstream dependencies — if Cloud SQL is slow, killing the API server pod makes the outage worse. A simple /healthz that returns {"status":"ok"} is correct.

Readiness probe

"Can this pod handle a request right now?" Should check that critical dependencies are reachable — the database connection pool is up, the configuration is loaded, the warmup cache is primed. If readiness fails, the pod is removed from Service endpoints but not restarted. The endpoint is conventionally /ready or /readyz.
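The split between the two endpoints can be sketched as follows. The Health type, its depOK callback, and the in-process probe helper are hypothetical; depOK stands in for whatever dependency check the service considers critical, such as a database ping.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Health tracks whether startup work (config load, pool warm-up) is done.
type Health struct {
	ready atomic.Bool
	depOK func() bool // hypothetical dependency check, e.g. a database ping
}

// MarkReady is called once initialisation has finished.
func (h *Health) MarkReady() { h.ready.Store(true) }

// Mux wires /healthz (liveness) and /readyz (readiness).
func (h *Health) Mux() *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness: the process can answer at all. No dependency checks here.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"status":"ok"}`)
	})
	// Readiness: initialised and critical dependencies reachable.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if !h.ready.Load() || !h.depOK() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprint(w, "ready")
	})
	return mux
}

// probe calls a handler in-process and returns the HTTP status code.
func probe(handler http.Handler, path string) int {
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	h := &Health{depOK: func() bool { return true }}
	mux := h.Mux()
	fmt.Println(probe(mux, "/healthz"), probe(mux, "/readyz")) // during startup
	h.MarkReady()
	fmt.Println(probe(mux, "/healthz"), probe(mux, "/readyz")) // after startup
}
```

During startup the liveness endpoint already returns 200 while readiness returns 503, which is exactly the window in which the pod should stay alive but receive no traffic.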

Startup probe

"Has the app finished starting?" Runs only during startup: liveness and readiness probes are held off until it succeeds, after which it never runs again. Critical for slow-starting JVM apps or anything that loads a 2 GB ML model. Without a startup probe, the liveness probe may fail during the 90-second JVM warmup and kill the pod in a restart loop.

Cloud Run health checks

By default Cloud Run applies a TCP startup probe: once your container accepts a connection on PORT, it is considered started. For more control you can configure HTTP or gRPC startup probes and liveness probes on a Cloud Run service. GKE supports the full Kubernetes probe semantics described above.

Liveness failure = restart the pod. Readiness failure = remove from load balancer but keep running. Startup failure = restart, but only after failureThreshold * periodSeconds of failed attempts. The exam frequently asks which probe to use when a slow ML model startup is killing pods — the answer is startup probe, not increase liveness timeout.

Reference: https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/

Structured Logging and Correlation IDs

Plain-text logs cost you on Cloud Logging: they can only be searched as raw text, and cannot be filtered by field, turned into precise alerts, or joined to traces. Structured logging (JSON to stdout) is the cloud-native default.

The JSON contract

Cloud Logging on GCP recognises a specific JSON envelope written to stdout:

{
  "severity": "ERROR",
  "message": "checkout failed",
  "logging.googleapis.com/trace": "projects/my-proj/traces/abc123",
  "logging.googleapis.com/spanId": "def456",
  "user_id": "u-987",
  "order_id": "o-654"
}

The severity field maps to Cloud Logging severity levels (DEBUG, INFO, WARNING, ERROR, and CRITICAL among them). The logging.googleapis.com/trace field links the log entry to the trace in Cloud Trace, so clicking the log opens the corresponding trace timeline.
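Producing that envelope needs nothing beyond the standard library. The sketch below uses a hypothetical entry helper; the project, trace, and span values are the placeholder values from the example above, not real identifiers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// entry renders one log line in the JSON shape shown above. extra holds
// application fields such as user_id.
func entry(severity, message, projectID, traceID, spanID string, extra map[string]any) ([]byte, error) {
	line := map[string]any{
		"severity": severity,
		"message":  message,
	}
	if traceID != "" {
		line["logging.googleapis.com/trace"] = "projects/" + projectID + "/traces/" + traceID
		line["logging.googleapis.com/spanId"] = spanID
	}
	for k, v := range extra {
		// Never let application fields overwrite the envelope fields.
		if _, reserved := line[k]; !reserved {
			line[k] = v
		}
	}
	return json.Marshal(line)
}

func main() {
	b, err := entry("ERROR", "checkout failed", "my-proj", "abc123", "def456",
		map[string]any{"user_id": "u-987", "order_id": "o-654"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// One JSON object per line on stdout.
	os.Stdout.Write(append(b, '\n'))
}
```

json.Marshal sorts map keys, so field order in the output differs from the hand-written example; the order carries no meaning in a JSON object.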

Correlation IDs

Every inbound request should carry an X-Correlation-Id or X-Request-Id header. If absent, generate a UUID. Propagate it to all downstream calls (Pub/Sub message attributes, gRPC metadata, BigQuery query labels). Every log line emits it. When the SRE team gets a customer ticket with "request abc123 failed at 14:32", they grep one ID across every service and reconstruct the journey.
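An HTTP middleware is one way to implement the reuse-or-generate rule. In this sketch the names withCorrelationID, correlationID, and roundTrip are hypothetical, and a 16-byte random hex string is used in place of a UUID to stay within the standard library.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type ctxKey struct{}

const header = "X-Correlation-Id"

// newID returns 16 random bytes as hex; a UUID library would work as well.
func newID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err) // no entropy source: nothing sensible to do
	}
	return hex.EncodeToString(b)
}

// withCorrelationID reuses the inbound ID or mints one, stores it in the
// request context for loggers and outbound clients, and echoes it back.
func withCorrelationID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get(header)
		if id == "" {
			id = newID()
		}
		w.Header().Set(header, id)
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), ctxKey{}, id)))
	})
}

// correlationID extracts the ID for log lines and downstream calls.
func correlationID(ctx context.Context) string {
	id, _ := ctx.Value(ctxKey{}).(string)
	return id
}

// roundTrip sends one in-process request and returns the ID the handler
// saw and the ID echoed in the response header.
func roundTrip(inbound string) (string, string) {
	var seen string
	h := withCorrelationID(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen = correlationID(r.Context())
	}))
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if inbound != "" {
		req.Header.Set(header, inbound)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return seen, rec.Header().Get(header)
}

func main() {
	fmt.Println(roundTrip("abc123"))
	seen, echoed := roundTrip("")
	fmt.Println(len(seen), seen == echoed)
}
```

Handlers further down the chain read the ID from the context and attach it to every log line and outbound call.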

Avoid log spam

Cloud Logging charges $0.50 per GB ingested (after the 50 GB monthly free tier). A console.log(req.body) on a 1000 RPS endpoint with a 2 KB body costs about $86/day. Use sampled debug logs (if Math.random() < 0.001) for high-volume noise and reserve INFO/ERROR for things that matter operationally.

Idempotency and Retry with Exponential Backoff

Networks fail. Pods get evicted. APIs return 503 under load. Cloud-native code expects this and retries — but only when the operation is safe to retry, which means idempotent.

Idempotency keys

A POST /charge that bills the customer must accept an Idempotency-Key header (any UUID the client generated). The server stores the (key, response) pair in Firestore with a 24-hour TTL and returns the cached response on a duplicate. Stripe's API uses this header, and Cloud Tasks offers a related guarantee through task-name de-duplication; learn the pattern for the exam and for production.
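The store-and-replay logic can be sketched with an in-memory map. The Store and Result types are hypothetical, and the map is a stand-in for the shared, TTL-backed store the text describes; it only demonstrates the control flow.

```go
package main

import (
	"fmt"
	"sync"
)

// Result is the stored outcome of a charge.
type Result struct {
	Status int
	Body   string
}

// Store remembers the first result per idempotency key. It is an
// in-memory stand-in; production would use a shared store with a TTL.
type Store struct {
	mu   sync.Mutex
	seen map[string]Result
}

func NewStore() *Store { return &Store{seen: make(map[string]Result)} }

// Do runs op at most once per key and replays the stored result on later
// calls. Holding the lock while op runs keeps the sketch simple by
// serialising all callers; a real system would lock per key or rely on a
// transactional insert.
func (s *Store) Do(key string, op func() Result) (Result, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if prev, ok := s.seen[key]; ok {
		return prev, true // replayed, op not executed
	}
	res := op()
	s.seen[key] = res
	return res, false
}

func main() {
	charges := 0
	charge := func() Result {
		charges++
		return Result{Status: 201, Body: fmt.Sprintf("charge #%d", charges)}
	}
	s := NewStore()
	fmt.Println(s.Do("key-1", charge)) // executes the charge
	fmt.Println(s.Do("key-1", charge)) // replays the stored result
	fmt.Println("charges made:", charges)
}
```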

Exponential backoff and jitter

The Google API client libraries retry transient errors (for example 429, 502, 503, UNAVAILABLE) on an exponential schedule. A typical shape is 1 s, 2 s, 4 s, 8 s, 16 s, capped at 60 s, for up to 6 attempts; exact defaults vary by library and API. Adding jitter (for example random ±25% on each delay) is critical to prevent the thundering herd: without jitter, every client retries at exactly the same moment after a brief outage and re-overwhelms the recovering service.
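The delay calculation itself is small. The sketch below is one possible formulation, not the client libraries' own code: backoffMillis is a hypothetical helper, and the ±25% spread and 60 s cap simply mirror the figures quoted above.

```go
package main

import (
	"fmt"
	"math/rand"
)

// backoffMillis returns the delay before retry number attempt (0-based):
// baseMs doubled per attempt, capped at maxMs, then spread by ±25% jitter.
// rnd must return a value in [0, 1).
func backoffMillis(attempt int, baseMs, maxMs int64, rnd func() float64) int64 {
	d := baseMs
	for i := 0; i < attempt && d < maxMs; i++ {
		d *= 2
	}
	if d > maxMs {
		d = maxMs
	}
	factor := 0.75 + 0.5*rnd() // uniform in [0.75, 1.25)
	return int64(float64(d) * factor)
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		ms := backoffMillis(attempt, 1000, 60000, rand.Float64)
		fmt.Printf("retry %d after about %d ms\n", attempt+1, ms)
	}
}
```

Passing the random source as a parameter keeps the function deterministic under test while production code hands it rand.Float64.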

Pub/Sub redelivery

Pub/Sub redelivers a message if you do not ack within the ack deadline (default 10 s, max 600 s, or extend with modifyAckDeadline). Set dead-letter topics to send a message to a separate topic after, say, 5 failed attempts so it stops cycling forever and an on-call engineer can inspect it.

A common trap is treating every error as retryable. A 400 Bad Request is not transient, and retrying it just wastes quota and money. The Google API libraries retry only on an explicit list of transient codes (typically HTTP 408, 429, 500, 502, 503, 504; gRPC UNAVAILABLE, RESOURCE_EXHAUSTED, DEADLINE_EXCEEDED). If you write a custom retry wrapper, copy this list; do not catch the bare Exception class and retry forever.
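A custom wrapper that respects the list could be sketched like this. The retryableHTTP and retry helpers are hypothetical, and the backoff sleeps between attempts are deliberately omitted to keep the classification logic in focus.

```go
package main

import "fmt"

// retryableHTTP reports whether a status is on the transient list above.
func retryableHTTP(status int) bool {
	switch status {
	case 408, 429, 500, 502, 503, 504:
		return true
	}
	return false
}

// retry calls op until it returns a non-retryable status or the attempts
// run out. It returns the last status and the number of calls made.
// Sleeping between attempts is omitted to keep the sketch short.
func retry(maxAttempts int, op func() int) (int, int) {
	status, calls := 0, 0
	for calls < maxAttempts {
		status = op()
		calls++
		if !retryableHTTP(status) {
			break
		}
	}
	return status, calls
}

func main() {
	responses := []int{503, 503, 200}
	i := 0
	status, calls := retry(6, func() int {
		s := responses[i]
		i++
		return s
	})
	fmt.Println(status, calls)
}
```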

Reference: https://cloud.google.com/storage/docs/retry-strategy

Circuit Breakers with Anthos Service Mesh

When a downstream dependency starts failing slowly (200 ms response time becomes 30 s), retries make it worse. The pool of in-flight requests grows, threads block, memory fills with queued contexts, and the calling service crashes. The cure is a circuit breaker that stops sending traffic when the failure rate crosses a threshold.

DestinationRule outlierDetection

Anthos Service Mesh (since renamed Cloud Service Mesh, with Istio APIs underneath) configures circuit breaking via DestinationRule. The outlierDetection block ejects an endpoint from the load-balancing pool when it returns too many 5xx responses:

trafficPolicy:
  outlierDetection:
    consecutive5xxErrors: 5
    interval: 30s
    baseEjectionTime: 30s
    maxEjectionPercent: 50

A pod that returns five consecutive 5xx responses is ejected for 30 s. After ejection it is re-added; each repeat ejection lasts longer (the base time multiplied by the number of times the pod has been ejected). At most 50% of pods can be ejected simultaneously so the service is not completely starved.

Connection pool limits

The other half of circuit breaking is connectionPool limits: max connections, max pending HTTP requests, and max concurrent retries. When a limit is hit, additional calls fail fast with a 503 instead of queueing. The caller can serve a cached fallback or degraded response immediately.

Fallback strategies

The pattern: try the live service, on 503 serve from Memorystore cache, on cache miss serve a static fallback (a generic recommendation list, an empty result set, a "we are experiencing high load" page). The user experience degrades gracefully instead of throwing white pages.
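The three-tier fallback reads naturally as a short function. In this sketch recommend, live, and cached are hypothetical names: live stands in for the service client and cached for a Memorystore lookup, and the static default is a placeholder list.

```go
package main

import (
	"errors"
	"fmt"
)

var errUnavailable = errors.New("upstream unavailable")

// recommend tries the live call, then the cache, then a static default,
// and reports which tier answered.
func recommend(live func() ([]string, error), cached func() ([]string, bool)) ([]string, string) {
	if items, err := live(); err == nil {
		return items, "live"
	}
	if items, ok := cached(); ok {
		return items, "cache"
	}
	return []string{"bestsellers"}, "static"
}

func main() {
	down := func() ([]string, error) { return nil, errUnavailable }
	hit := func() ([]string, bool) { return []string{"cached-pick"}, true }
	miss := func() ([]string, bool) { return nil, false }

	fmt.Println(recommend(down, hit))
	fmt.Println(recommend(down, miss))
}
```

Returning the tier name alongside the result makes it easy to emit a metric for how often users are seeing degraded responses.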

Distributed Tracing and Context Propagation

A single user click might traverse the API gateway, the auth service, the orders service, the inventory service, and three Pub/Sub topics. Without a trace ID, debugging "checkout is slow" is guesswork. Distributed tracing assigns a trace ID at the edge and propagates it across every hop.

W3C Trace Context headers

The cloud-native standard is the W3C Trace Context spec: a traceparent header carries version-traceId-spanId-flags. Cloud Trace and OpenTelemetry both speak this format; the older X-Cloud-Trace-Context is still supported but new code should use traceparent.
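To make the header layout concrete, here is a minimal parser and formatter for version 00 of the format. The TraceParent type and function names are hypothetical, and the sketch keeps only the sampled bit of the flags byte; in practice the OpenTelemetry propagators handle this for you.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the fields of a version-00 traceparent header.
type TraceParent struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Sampled bool   // lowest bit of the flags byte
}

// isLowerHex checks for exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n || s != strings.ToLower(s) {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

// Parse validates "00-<trace-id>-<span-id>-<flags>" and rejects all-zero IDs.
func Parse(h string) (TraceParent, error) {
	p := strings.Split(h, "-")
	if len(p) != 4 || p[0] != "00" ||
		!isLowerHex(p[1], 32) || !isLowerHex(p[2], 16) || !isLowerHex(p[3], 2) {
		return TraceParent{}, errors.New("malformed traceparent")
	}
	if strings.Trim(p[1], "0") == "" || strings.Trim(p[2], "0") == "" {
		return TraceParent{}, errors.New("all-zero trace or span id")
	}
	flags, _ := hex.DecodeString(p[3]) // already validated above
	return TraceParent{TraceID: p[1], SpanID: p[2], Sampled: flags[0]&1 == 1}, nil
}

// Header renders the value to forward downstream, for example as an HTTP
// header or a Pub/Sub message attribute named "traceparent".
func (t TraceParent) Header() string {
	flags := "00"
	if t.Sampled {
		flags = "01"
	}
	return "00-" + t.TraceID + "-" + t.SpanID + "-" + flags
}

func main() {
	tp, err := Parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(tp.TraceID, tp.SpanID, tp.Sampled, err)
}
```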

OpenTelemetry SDK

Add the OpenTelemetry SDK (Java, Go, Python, Node are all officially supported) and an exporter pointed at cloudtrace.googleapis.com. Instrumentation libraries cover most HTTP and gRPC clients, so traces appear with little or no application code change; how automatic this is depends on the language. For Pub/Sub, manually inject the traceparent into message attributes so the subscriber span chains correctly.

Sampling and cost

100% trace sampling on a 10 K RPS service generates at least 864 M spans/day (10,000 × 86,400 s, counting one span per request). At Cloud Trace pricing of $0.20 per million spans after the free tier, that is about $172/day per service. Default to 10% probabilistic sampling, force sampling for the requests you are debugging by setting the sampled flag (the last field of traceparent, or o=1 in X-Cloud-Trace-Context), and use the saved budget on detailed instrumentation rather than volume.

Trace context must be propagated through every hop including async ones. A Pub/Sub publisher that does not inject traceparent into message attributes creates a broken chain: the trace ends at the publish and a new disconnected trace starts at the subscriber. The result on the Cloud Trace UI is a useless half-tree. Always inject the context into Pub/Sub attributes, Cloud Tasks headers, and any custom messaging fabric.

Reference: https://cloud.google.com/trace/docs/setup

Message-Driven Asynchronous Architecture

Synchronous chains are brittle: if any service in the chain is down, the whole request fails. Message-driven async decouples producers from consumers so each can scale, fail, and recover independently.

Pub/Sub for fan-out

A user.signed_up event published once is delivered to N subscribers: welcome-email, analytics-warehouse, fraud-screening, crm-sync. Each subscription is independent — if analytics-warehouse is down, the email still sends. Add a new subscriber tomorrow and you do not touch the producer.

Cloud Tasks for delayed and rate-limited work

Cloud Tasks differs from Pub/Sub by being a queue with explicit per-task scheduling. You enqueue "send reminder email to user X in 24 hours" and Cloud Tasks dispatches it at or after that time. Delivery is at-least-once, so the handler should be idempotent. Rate limiting (dispatch at 50 QPS, max 100 concurrent) is a queue setting, not application code.

Eventarc for trigger glue

Eventarc turns Cloud Storage object writes, Cloud Audit Logs entries, and Firestore document updates into CloudEvents and delivers them to a destination such as Cloud Run, using Pub/Sub as the transport. It is the "when X happens, run Y" layer that removes polling code from your services.

Blue-Green vs Canary Deployments

In-place restarts (delete old, start new) cause downtime and have no safe rollback. Cloud-native deploys are progressive.

Blue-green

Two complete environments. Blue is production. Green is the new version, fully deployed but receiving zero traffic. After smoke tests, flip the load balancer to green in a single config change. Rollback = flip back. On Cloud Run, every deploy creates a new revision; deploy green with --no-traffic so it starts at 0%, then gcloud run services update-traffic --to-revisions=NEW=100 is the cutover. Cost: 2x infrastructure during the swap window.

Canary

The new version receives a growing share of traffic (for example 5%, 25%, 50%, then 100%) over a controlled rollout. Cloud Run supports this natively with weighted revisions: gcloud run services update-traffic --to-revisions=v2=10,v1=90 sends 10% to v2. Anthos Service Mesh handles canary on GKE through VirtualService weight rules with the same percentages. Pair canary with SLO-based auto-promotion: if error rate stays below 0.1% for 10 minutes at 10%, advance to 50%, otherwise auto-rollback.
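The promotion rule can be expressed as a small decision function. This is a sketch under stated assumptions: nextShare and the steps ladder are hypothetical, the caller is assumed to supply error and request counts for the observation window, and applying the returned share (for example through update-traffic) is left out.

```go
package main

import "fmt"

// steps is a hypothetical rollout ladder, in percent of traffic.
var steps = []int{10, 50, 100}

// nextShare decides the canary's next traffic share from its current share
// and the error rate observed over the evaluation window. It returns 0 to
// signal rollback. threshold is a fraction, so 0.001 means 0.1%.
func nextShare(current, failed, requests int, threshold float64) int {
	if requests == 0 {
		return current // no data yet: hold rather than promote blindly
	}
	if float64(failed)/float64(requests) >= threshold {
		return 0 // SLO breached: roll back
	}
	for _, s := range steps {
		if s > current {
			return s
		}
	}
	return current // already at full traffic
}

func main() {
	fmt.Println(nextShare(10, 5, 10000, 0.001))  // 0.05% errors
	fmt.Println(nextShare(10, 20, 10000, 0.001)) // 0.2% errors
}
```

Holding at the current share when there is no traffic avoids promoting a revision that has not actually been exercised.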

Rolling update on GKE

The default GKE Deployment strategy. maxSurge: 25% and maxUnavailable: 25% mean Kubernetes creates new pods up to 25% over capacity and kills old ones as new ones become ready. Cheaper than blue-green, but rollback is slower (kubectl rollout undo has to roll the pods back the same way) and you cannot run smoke tests against the full new fleet before the switchover.

Feature flags

Independent of deployment strategy, feature flags decouple release from deploy. Deploy the code dark, flip the flag in Firestore to enable for 1% of users, monitor, expand. A bug needs only a flag flip to mitigate — no rollback required.

Frequently Asked Questions (FAQs)

Q1: Which 12-factor principle does Cloud Run enforce most strictly?

Cloud Run enforces stateless processes (factor VI) and port binding (factor VII) at the platform level. Your container must listen on the port given in PORT, and any state written to local disk is destroyed when the instance is recycled. Disposability (factor IX) is also enforced via the 10-second SIGTERM deadline. Other factors (config, logs, build/release/run) are convention, not enforcement.

Q2: When should I use Memorystore vs Firestore for session state?

Use Memorystore for Redis when you need sub-millisecond reads, simple key/value access, and TTL-based expiry — typical for HTTP session caches and rate-limit counters. Use Firestore when sessions are richer documents (cart contents, draft forms), when you also need to query across sessions, or when you need multi-region replication out of the box. Memorystore is regional only.

Q3: How do I handle SIGTERM correctly on Cloud Run when my requests take 30 seconds?

Cloud Run sends SIGTERM 10 seconds before forced kill, and that window is not extended. In a normal scale-down Cloud Run first stops routing new requests to the instance and gives in-flight requests time to complete, so SIGTERM usually arrives once the instance is idle. In exceptional cases it can arrive mid-request, and then only the 10 s remain. Your handler should therefore (1) stop accepting new requests immediately on SIGTERM, (2) let in-flight requests finish within the window, and (3) flush logs and close pools before exiting. For genuinely long workloads, move them to Cloud Run Jobs or Pub/Sub-driven workers where the runtime is decoupled from the HTTP cycle.

Q4: What is the difference between a liveness probe and a readiness probe?

A liveness probe failure causes Kubernetes to restart the pod. Use it only for "the process is wedged and must die" — a deadlocked event loop, an OOM that did not crash. A readiness probe failure causes Kubernetes to remove the pod from Service endpoints without restarting. Use it for "this pod cannot serve traffic right now" — warming caches, missing DB connection, dependency degraded. Confusing them causes either restart storms or traffic going to half-broken pods.

Q5: How do I implement a circuit breaker without Anthos Service Mesh?

If you cannot adopt a service mesh, implement the pattern in code with a library: resilience4j for Java, gobreaker for Go, opossum for Node, circuitbreaker for Python. Configure failure threshold (e.g., 50% errors over 10 calls), open duration (e.g., 30 s), and a half-open probe (one trial call to test if downstream recovered). The trade-off vs Anthos Service Mesh: code-level circuit breakers are per-language and harder to keep consistent across a polyglot fleet; the mesh enforces uniform behaviour from the data plane.
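To show what those libraries do internally, here is a deliberately small breaker. It is a sketch, not a substitute for them: the Breaker type is hypothetical, it trips on consecutive failures rather than the percentage-over-window rule mentioned above, and it is not safe for concurrent use without adding a mutex.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	ErrOpen       = errors.New("circuit open")
	ErrDownstream = errors.New("downstream failed") // used by the demo
)

// Breaker opens after maxFailures consecutive failures, rejects calls for
// openSeconds, then lets one trial call through (half-open).
type Breaker struct {
	maxFailures int
	openSeconds int64
	now         func() int64 // Unix seconds; injectable for tests
	failures    int
	openedAt    int64
	open        bool
}

func NewBreaker(maxFailures int, openSeconds int64) *Breaker {
	return &Breaker{
		maxFailures: maxFailures,
		openSeconds: openSeconds,
		now:         func() int64 { return time.Now().Unix() },
	}
}

// Call runs op unless the circuit is open and still cooling down.
func (b *Breaker) Call(op func() error) error {
	if b.open && b.now()-b.openedAt < b.openSeconds {
		return ErrOpen // fail fast without touching the dependency
	}
	// Closed, or cool-down elapsed: this call doubles as the trial.
	if err := op(); err != nil {
		b.failures++
		if b.open || b.failures >= b.maxFailures {
			b.open, b.openedAt = true, b.now() // trip, or re-open after a failed trial
		}
		return err
	}
	b.open, b.failures = false, 0 // success closes the circuit
	return nil
}

func main() {
	b := NewBreaker(2, 30)
	fail := func() error { return ErrDownstream }
	fmt.Println(b.Call(fail)) // first failure
	fmt.Println(b.Call(fail)) // second failure trips the breaker
	fmt.Println(b.Call(fail)) // rejected without calling the dependency
}
```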

Q6: Why is exponential backoff with jitter better than fixed retry intervals?

Fixed intervals cause a thundering herd: when a downstream recovers, every client that was retrying at the same cadence hits it simultaneously and knocks it over again. Exponential backoff (1 s, 2 s, 4 s, 8 s) spreads retries across an expanding window. Jitter (for example random ±25%) further desynchronises clients so even with identical backoff schedules they do not align. The Google API client libraries implement this out of the box; rolling your own without jitter is a common PCD exam trap.

Q7: Should I use blue-green or canary for a new microservice deployment?

Canary for steady-state changes (new feature, config tweak, dependency upgrade) where you want gradual exposure and automatic rollback based on SLOs. Blue-green for changes that cannot be partially rolled out (schema migrations, breaking API changes, full version cutover) where you need an atomic switch and a clean rollback path. On Cloud Run both are one-command operations; the choice is about risk profile, not implementation difficulty.
