Traffic Management — GCP PCNE Study Notes

Introduction

In Google Cloud, Traffic Management refers to the ability to intelligently route, shape, mirror, retry, and protect HTTP/HTTPS and TCP traffic across regions, services, and revisions. It goes well beyond round-robin load distribution: with the Application Load Balancer (formerly Global External HTTP(S) LB), Internal Application Load Balancer, Cloud Run revisions, and Anthos Service Mesh (ASM), platform engineers can implement canary releases, request mirroring for shadow traffic, automatic retries on 5xx, sticky sessions for stateful workloads, geo-targeting via Cloud DNS routing policies, and fault-isolation through Envoy circuit breakers. The PCNE exam expects you to know which capability lives on which load balancer tier, which API field controls it, and what happens when health checks, IAP, and Cloud Armor sit in front of these rules.

Plain English Explanation

1. A URL Map Is Like a High-Speed Rail Transfer Platform

A URL map with path matchers is like a high-speed rail interchange. The forwarding rule is the ticket gate that checks IP and port; the target HTTPS proxy is the ticket inspector that terminates TLS; the URL map then reads the destination on your ticket (the host header and path) and routes you to the correct platform (backend service). /api/* goes to the API platform and /static/* goes to the CDN platform. Without the URL map you'd have to build a separate station (forwarding rule + IP) for every service.

2. Weighted Backends Are Like a Slow-Pour Coffee Tasting

Weighted backend services behave like a barista who lets you taste a new espresso blend at 5% strength first. The URL map sends 95% of the cups (requests) to the established roast (backend-v1) and 5% to the experimental roast (backend-v2). If customers love the new roast, you nudge the dial to 25%, then 50%, then 100%. The barista (Envoy data plane) never has to recompile the menu — you just change the weight integer in the route action and traffic shifts in seconds.

3. Traffic Mirroring Is Like a Dress Rehearsal

Request mirroring (requestMirrorPolicy) is a dress rehearsal where the understudy performs the same scene on a second stage but the audience never sees it. Production traffic still flows to backend-prod, but a copy of every request is forked to backend-shadow (the new version). The shadow response is discarded, so users are unaffected, yet you collect real-world latency, error rate, and SQL plan metrics. It's the safest way to test a rewrite under production load before flipping any user-visible weight.

URL Map: A Google Cloud Load Balancing resource that maps incoming request URLs (host + path) to backend services or backend buckets using host rules and path matchers. Required for any Application Load Balancer; Network Load Balancers (passthrough L4) do not use URL maps.

URL Maps and Path Matchers Deep Dive

Anatomy of a URL Map

A URL map contains a defaultService (catch-all backend), zero or more host rules (each mapping a list of hostnames to a named path matcher), and path matchers (each containing pathRules and/or routeRules). When a request arrives, the Application Load Balancer evaluates host rules first, then applies the selected path matcher: pathRules are resolved by longest matching path, routeRules in order of explicit priority.

gcloud compute url-maps create web-map \
  --default-service=backend-default
gcloud compute url-maps add-path-matcher web-map \
  --path-matcher-name=api-matcher \
  --default-service=backend-default \
  --path-rules='/api/*=backend-api,/static/*=backend-static' \
  --new-hosts='www.example.com'

pathRules vs routeRules

pathRules are the simple form (path glob → backend). routeRules are the advanced form supporting header matching, query parameter matching, weighted destinations, URL rewrites, header transformations, retry policies, fault injection, and traffic mirroring. Only routeRules support the full advanced traffic management feature set on the global external Application Load Balancer.

Wildcard and Prefix Matching

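routeRules match paths with prefixMatch, fullPathMatch, or regexMatch, in the same style as Envoy. The two rule types resolve conflicts differently: pathRules pick the longest matching path regardless of the order in which they are listed, while routeRules are evaluated in ascending priority number and the first rule that matches wins, so give specific rules lower priority numbers than broad ones. regexMatch uses RE2 syntax; JavaScript-style lookahead is not supported.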

Always set a safe defaultService on every path matcher. Requests that match the host rule but fail every pathRule fall through to this default. Pointing it to a static 404 backend prevents accidental exposure of the wrong service when a path pattern typo slips through code review.
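The lookup order described in this section can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the load balancer's actual implementation: the dictionary layout is hypothetical (it only mirrors the web-map example above), and only exact paths and trailing /* patterns are handled.

```python
# Hypothetical in-memory model of the web-map example; not the real API schema.
URL_MAP = {
    "default_service": "backend-default",
    "host_rules": {"www.example.com": "api-matcher"},
    "path_matchers": {
        "api-matcher": {
            "default_service": "backend-default",
            "path_rules": {"/api/*": "backend-api", "/static/*": "backend-static"},
        }
    },
}


def resolve(host: str, path: str, url_map: dict = URL_MAP) -> str:
    """Return the backend a request would reach under this simplified model."""
    matcher_name = url_map["host_rules"].get(host)
    if matcher_name is None:
        return url_map["default_service"]  # no host rule matched
    matcher = url_map["path_matchers"][matcher_name]
    best_prefix, best_backend = "", matcher["default_service"]
    for pattern, backend in matcher["path_rules"].items():
        if pattern.endswith("/*"):
            prefix = pattern[:-1]  # "/api/*" -> "/api/"
            matched = path.startswith(prefix)
        else:
            prefix = pattern
            matched = path == pattern
        # Longest matching path wins; listing order is irrelevant.
        if matched and len(prefix) > len(best_prefix):
            best_prefix, best_backend = prefix, backend
    return best_backend


print(resolve("www.example.com", "/api/orders"))    # backend-api
print(resolve("www.example.com", "/apy/orders"))    # backend-default (typo falls through)
print(resolve("other.example.com", "/api/orders"))  # backend-default (no host rule)
```

The second call shows why the path matcher's default matters: a mistyped path still lands somewhere, so that somewhere should be safe.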

Header-Based and Method-Based Routing

`headerMatches` in routeRules

The Application Load Balancer can dispatch traffic on any request header. A common pattern is routing mobile clients to a slim backend:

routeRules:
- priority: 10
  matchRules:
  - prefixMatch: /
    headerMatches:
    - headerName: User-Agent
      regexMatch: ".*Mobile.*"
  routeAction:
    weightedBackendServices:
    - backendService: projects/p/global/backendServices/mobile-be
      weight: 100

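Supported header match types include exactMatch, prefixMatch, suffixMatch, regexMatch, presentMatch, and rangeMatch; invertMatch is a modifier that negates the result of whichever one you choose. Use the :authority and :method pseudo-headers as headerName to match on the request host and HTTP method. Cookies have no dedicated matcher: match them through the Cookie header, typically with regexMatch.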
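As a rough illustration of how a prioritised list of routeRules with a header match behaves, here is a minimal Python sketch. The rule layout, the web-be backend, and the fallback name are assumptions made for the example. Python's re module stands in for RE2, and header lookup is case-sensitive here, unlike real HTTP headers.

```python
import re

# Hypothetical rule list; "web-be" is invented for the example.
ROUTE_RULES = [
    {"priority": 20, "prefix": "/", "header": None, "backend": "web-be"},
    {"priority": 10, "prefix": "/", "header": ("User-Agent", r".*Mobile.*"), "backend": "mobile-be"},
]


def pick_backend(path, headers, rules=ROUTE_RULES, default="backend-default"):
    """Evaluate rules in ascending priority number; the first full match wins."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if not path.startswith(rule["prefix"]):
            continue
        if rule["header"] is not None:
            name, pattern = rule["header"]
            value = headers.get(name)
            # The whole header value must match, hence the leading/trailing ".*".
            if value is None or re.fullmatch(pattern, value) is None:
                continue
        return rule["backend"]
    return default


print(pick_backend("/cart", {"User-Agent": "Mozilla/5.0 (iPhone) Mobile Safari"}))  # mobile-be
print(pick_backend("/cart", {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}))     # web-be
```

The rules are deliberately listed out of order to show that the priority number, not the position in the file, decides which rule is tried first.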

Method, Query Param, and Pseudo-Header Matching

A match rule has no dedicated method field: to constrain a rule to GET, POST, and so on, add a headerMatches entry on the :method pseudo-header. This is useful when you want only POST /checkout to route to a write-heavy backend pool. queryParameterMatches lets you route on ?canary=true for opt-in beta testing, often paired with feature flags so QA can exercise the new backend by appending a single query string.

Header Transformations

In a headerAction block (available on the URL map, on a path matcher, on an individual routeRule, and on a single weightedBackendServices entry) you can set requestHeadersToAdd, requestHeadersToRemove, responseHeadersToAdd, and responseHeadersToRemove. This is how you add custom request headers for the backend or strip an internal X-Debug-Token before responses leave Google's edge.

Header-based routing is enforced inside Google's Envoy fleet at the load balancer tier — not on the backend. That means even if the backend ignores the header, you can still segment traffic. Pair this with Cloud Armor edge security policies so that a forged header from a hostile client cannot bypass the routing logic (e.g., always strip client-supplied X-Internal-Routing headers in requestHeadersToRemove).

Weighted Backend Services and Canary Releases

How `weightedBackendServices` Works

A single routeRule may point to multiple backend services, each with an integer weight (0–1000). The load balancer normalises weights into proportions: a 95/5 split is configured as weight: 95 and weight: 5. Weights are applied per-request, not per-connection, so HTTP/2 multiplexing still respects the split.

gcloud compute url-maps import web-map --source=web-map.yaml

defaultRouteAction:
  weightedBackendServices:
  - backendService: .../backendServices/checkout-v1
    weight: 95
  - backendService: .../backendServices/checkout-v2
    weight: 5
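To make the normalisation concrete, the following Python sketch converts integer weights into proportions and simulates a per-request pick. It is an illustration only: the function names are invented, and random.choices stands in for whatever selection algorithm the load balancer really uses.

```python
import random
from collections import Counter


def normalise(weights: dict[str, int]) -> dict[str, float]:
    """Turn integer weights (0-1000 each) into traffic proportions."""
    for name, weight in weights.items():
        if not 0 <= weight <= 1000:
            raise ValueError(f"weight for {name} must be between 0 and 1000")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return {name: weight / total for name, weight in weights.items()}


def simulate(weights: dict[str, int], requests: int, seed: int = 0) -> Counter:
    """Pick a backend independently for every request."""
    rng = random.Random(seed)
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=requests)
    return Counter(picks)


print(normalise({"checkout-v1": 95, "checkout-v2": 5}))
print(normalise({"checkout-v1": 995, "checkout-v2": 5}))  # a 0.5% canary
print(simulate({"checkout-v1": 95, "checkout-v2": 5}, requests=10_000))
```

Because each request is drawn independently, the observed split only approaches 95/5 over many requests; small samples can deviate noticeably.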

Progressive Delivery Pipeline

A typical canary pipeline: bake the new revision into checkout-v2, deploy with weight: 0, run smoke tests via headerMatches (only requests with X-Canary: true reach v2), then raise to 1%, 5%, 25%, 50%, 100% across stages, watching SLO error budget burn in Cloud Monitoring. Rollback is a single gcloud compute url-maps import away, with no re-deploy and no DNS TTL wait.
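The staged ramp can be expressed as a small decision function. This Python sketch is hypothetical: the stage list comes from the paragraph above, but the error-rate threshold and the function names are assumptions, and a real pipeline would read its signal from Cloud Monitoring rather than take it as an argument.

```python
STAGES = [0, 1, 5, 25, 50, 100]  # canary percentages from the pipeline above


def stage_weights(canary_percent: int) -> dict[str, int]:
    """Weights to write into weightedBackendServices for a given stage."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {"checkout-v1": 100 - canary_percent, "checkout-v2": canary_percent}


def next_stage(current: int, error_rate: float, max_error_rate: float) -> int:
    """Advance one stage while healthy; drop back to 0% on a breach."""
    if error_rate > max_error_rate:
        return 0  # rollback: all traffic returns to checkout-v1
    later = [stage for stage in STAGES if stage > current]
    return later[0] if later else current


print(stage_weights(5))                                         # {'checkout-v1': 95, 'checkout-v2': 5}
print(next_stage(5, error_rate=0.001, max_error_rate=0.01))     # 25
print(next_stage(25, error_rate=0.05, max_error_rate=0.01))     # 0
```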

Where Weighted Routing Is and Isn't Supported

Weighted backend services work on the global external Application Load Balancer, regional external Application Load Balancer, and internal Application Load Balancer (regional, cross-region). They are not available on passthrough Network Load Balancers (L4), on the classic Application Load Balancer (the legacy global HTTPS LB, which lacks advanced traffic management), or on the protocol forwarding rules of the internal passthrough Network Load Balancer (formerly Internal TCP/UDP LB).

Candidates often pick "Network Load Balancer" for a canary scenario because it is mentioned alongside high performance. Wrong — passthrough Network Load Balancers route on 5-tuple only and have no URL map and no weight field. Canary at the L4 tier means weighted DNS or weighted forwarding rule IPs, not weighted backends.

Traffic Splitting on Cloud Run

Revisions and Tagged URLs

Cloud Run keeps every deployment as an immutable revision. The gcloud run services update-traffic command shifts the percentage between revisions without touching the container or rebuilding the image.

gcloud run services update-traffic checkout \
  --to-revisions=checkout-00042-abc=90,checkout-00043-def=10 \
  --region=asia-east1

`--tag` for Pre-Production Probing

Each revision can receive a tag, which mints a tag-specific URL like https://canary---checkout-xyz.a.run.app. Tagged URLs receive zero production traffic by default, so QA can hit the new revision directly while real users still see the stable revision. Combine with --no-traffic to deploy without exposing the new revision at all.

Gradual Rollouts and Rollbacks

When a service routes traffic to LATEST, each new revision receives 100% of traffic as soon as it is ready, so there is no built-in staged ramp. Explicit per-revision splits are deterministic and are what you should pick for exam scenarios that mention "5% canary then 25%". Rollback is gcloud run services update-traffic checkout --to-revisions=checkout-00042-abc=100, which shifts traffic back without rebuilding or redeploying anything.

For Cloud Run behind an external Application Load Balancer with a Serverless NEG, you have two independent traffic-splitting layers: the URL map's weightedBackendServices and the Cloud Run revision split. Pick one layer to manage canary, otherwise compound percentages (95% × 90% = 85.5%) become hard to reason about during an incident.
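The compound arithmetic is easy to check with a two-line helper. A minimal Python sketch, with a hypothetical function name, using the 95% URL map weight and 90/10 revision split from the paragraph above:

```python
def effective_share(lb_percent: float, revision_percent: float) -> float:
    """Share of all requests that pass both splits, in percent."""
    return lb_percent * revision_percent / 100


# URL map sends 95% to the Cloud Run backend; Cloud Run sends 90% to the stable revision.
stable = effective_share(95, 90)
canary_revision = effective_share(95, 10)
print(stable, canary_revision)  # 85.5 9.5
```

The remaining 5% never reaches this Cloud Run service at all, which is exactly the kind of detail that is easy to lose track of mid-incident.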

Traffic Mirroring (`requestMirrorPolicy`)

What Mirroring Does

A routeRule may include a requestMirrorPolicy that names a secondary backend service. Every matching request is sent to both the primary backend (response returned to the user) and the mirror backend (response discarded). Latency on the mirror does not affect the user, and failures on the mirror are not retried. The mirror receives the same request body and headers with X-Forwarded-For preserved, except that the Host (authority) header is suffixed with -shadow.
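The fire-and-forget contract can be modelled in a few lines. This Python sketch is an assumption-laden illustration: the handler functions are invented, and the mirror call runs sequentially here, whereas a real proxy would send it without waiting.

```python
from typing import Callable

Handler = Callable[[dict], dict]


def handle_with_mirror(request: dict, primary: Handler, mirror: Handler) -> dict:
    """Return the primary response; send a copy to the mirror and ignore its outcome."""
    response = primary(request)
    try:
        mirror(dict(request))  # copy of the request; the result is discarded
    except Exception:
        pass  # mirror failures are swallowed and never retried
    return response


def prod(request: dict) -> dict:
    return {"status": 200, "served_by": "backend-prod"}


def shadow(request: dict) -> dict:
    raise RuntimeError("bug in the new version")


print(handle_with_mirror({"path": "/api/checkout"}, prod, shadow))
# {'status': 200, 'served_by': 'backend-prod'}
```

Even though the shadow handler crashes, the caller only ever sees the production response.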

Use Cases

Shadow-testing a rewrite: send 100% of production traffic to both payments-v1 (live) and payments-v2 (mirror), compare metrics, then flip with weighted routing.
Schema migration validation: mirror writes to a new Cloud SQL instance to compare query plans without affecting users.
Security scanning: mirror to a backend that runs deep packet inspection or a WAF in detect-only mode.

Limitations

Mirrored traffic counts toward the mirror backend's capacity and quota — you must size it to handle the full duplicated request volume. The mirror endpoint should be idempotent because POST/PUT/DELETE will execute twice. Mirroring is available on the global external Application Load Balancer and the regional Application Load Balancers via routeAction.requestMirrorPolicy.

Mirroring sends real production data to the shadow backend, including PII in request bodies. If the mirror lives in a different VPC, project, or region, ensure equivalent DLP scanning, VPC Service Controls perimeter membership, and IAM least privilege apply, otherwise mirroring becomes a data-exfiltration vector.

Retry Policies and Timeouts

`retryPolicy` on routeAction

Configure automatic retries, performed by the load balancer's Envoy data plane rather than by the client:

routeAction:
  retryPolicy:
    retryConditions:
    - 5xx
    - gateway-error
    - connect-failure
    - retriable-4xx
    numRetries: 3
    perTryTimeout:
      seconds: 5

Valid retryConditions include 5xx, gateway-error, connect-failure, retriable-4xx (only 409), refused-stream, cancelled, deadline-exceeded, internal, resource-exhausted, and unavailable. The default numRetries is 1 if you set a retryPolicy without specifying it.
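The interaction between retryConditions and numRetries can be sketched as a loop. This Python model is a simplification under stated assumptions: only three status-based conditions are modelled, the status ranges chosen for them are this example's interpretation, and per-try timeouts are ignored.

```python
from typing import Callable

# Simplified, status-code-only view of three retry conditions (an assumption).
CONDITIONS = {
    "5xx": lambda status: 500 <= status <= 599,
    "gateway-error": lambda status: status in (502, 503, 504),
    "retriable-4xx": lambda status: status == 409,
}


def send_with_retries(call: Callable[[], int], retry_conditions: list[str],
                      num_retries: int = 1) -> tuple[int, int]:
    """Return (final_status, attempts). Attempts never exceed 1 + num_retries."""
    attempts = 0
    while True:
        status = call()
        attempts += 1
        retriable = any(CONDITIONS[c](status) for c in retry_conditions if c in CONDITIONS)
        if not retriable or attempts > num_retries:
            return status, attempts


responses = iter([503, 503, 200])
print(send_with_retries(lambda: next(responses), ["5xx", "gateway-error"], num_retries=3))
# (200, 3)
```

With numRetries: 3 a persistently failing backend is called four times in total: the original attempt plus three retries.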

Backend Service Timeout

The --timeout flag on gcloud compute backend-services (default 30 seconds for the global external Application Load Balancer) controls how long the load balancer waits for a backend response. WebSocket and long-poll backends should raise this, up to a maximum of 86400 seconds (24 hours), using --timeout=86400.

Connection Draining

--connection-draining-timeout controls how long an in-flight request keeps draining when a backend is removed (default 0 seconds, max 3600 seconds). Set this to your p99 request duration plus a buffer when rolling MIG instances so users see no 502 on deploys.

Idle Timeouts and HTTP Keepalive

The global external Application Load Balancer keeps idle client connections open for 610 seconds by default (the client HTTP keepalive timeout, set on the target proxy). Toward the backends, the keepalive timeout is fixed at 600 seconds and is not controlled by --timeout or --connection-draining-timeout, so configure the backend web server's keepalive timeout to be longer than 600 seconds. A backend that closes idle connections sooner than the load balancer expects is a top source of intermittent 502 responses.

Default Application Load Balancer timeouts: backend service response timeout 30s, client HTTP keepalive 610s, connection draining 0s, and numRetries 1 when a retryPolicy is present but the field is unset. Memorize these four numbers: PCNE loves to ask which knob to turn when a WebSocket is cut off at a fixed interval (answer: raise the backend service --timeout, up to 86400s).

Circuit Breakers via Anthos Service Mesh

Why Circuit Breakers

The Application Load Balancer's backend service supports basic capacity controls (maxConnections, maxPendingRequests, maxRequestsPerConnection, maxRetries) via circuitBreakers in the backend service resource. But for outlier detection, sticky ejection of misbehaving pods, and mesh-wide policies, you graduate to Anthos Service Mesh (ASM) — Google's managed Istio.

Istio `DestinationRule` for Connection Pool and Outlier Detection

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: checkout-circuit
spec:
  host: checkout.prod.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp: { maxConnections: 100 }
      http: { http1MaxPendingRequests: 100, maxRequestsPerConnection: 10 }
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50

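A pod that returns five consecutive 5xx responses is ejected from the load-balancing pool for a base period of 30 seconds (Envoy lengthens the period for repeat offenders). interval: 30s is how often the ejection analysis sweep runs, not a counting window. maxEjectionPercent: 50 caps ejection at 50% of the hosts in the pool, so a flapping service can't take itself entirely offline.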
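A toy model helps to see how the consecutive-error counter and the ejection cap interact. The Python class below is a simplified sketch, not Envoy's algorithm: it ejects immediately on the fifth consecutive 5xx, uses a fixed ejection time, and omits the analysis sweep and the growing ejection period. The class and method names are invented.

```python
class OutlierDetector:
    """One client's local view of a pool, in the spirit of outlierDetection."""

    def __init__(self, hosts, consecutive_5xx=5, base_ejection_s=30, max_ejection_percent=50):
        self.hosts = list(hosts)
        self.threshold = consecutive_5xx
        self.base_ejection_s = base_ejection_s
        self.max_ejection_percent = max_ejection_percent
        self.streak = {host: 0 for host in self.hosts}
        self.ejected_until = {}  # host -> time at which it rejoins the pool

    def active(self, now):
        return [h for h in self.hosts if self.ejected_until.get(h, 0) <= now]

    def record(self, host, status, now):
        """Record one response; return True if the host was ejected by it."""
        if status < 500:
            self.streak[host] = 0  # any success resets the streak
            return False
        self.streak[host] += 1
        if self.streak[host] < self.threshold:
            return False
        ejected_now = len(self.hosts) - len(self.active(now))
        # Refuse the ejection if it would push the pool past the cap.
        if (ejected_now + 1) * 100 > self.max_ejection_percent * len(self.hosts):
            return False
        self.ejected_until[host] = now + self.base_ejection_s
        self.streak[host] = 0
        return True


detector = OutlierDetector(["pod-a", "pod-b", "pod-c", "pod-d"])
for pod in ["pod-a", "pod-b", "pod-c"]:
    for _ in range(5):
        detector.record(pod, 503, now=0)
print(detector.active(now=10))  # ['pod-c', 'pod-d']: the cap kept pod-c in the pool
print(detector.active(now=30))  # all four pods are back
```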

Backend Service `circuitBreakers` (LB-level)

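At the load balancer tier the same primitives live in the backend service's circuitBreakers block. One way to set them is to export the resource, edit the YAML, and import it back:

gcloud compute backend-services export checkout-be \
  --global --destination=checkout-be.yaml
# Add to checkout-be.yaml:
#   circuitBreakers:
#     maxConnections: 1000
#     maxPendingRequests: 100
#     maxRequestsPerConnection: 10
gcloud compute backend-services import checkout-be \
  --global --source=checkout-be.yaml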

Use LB-level circuit breakers for north-south (internet → service) and ASM for east-west (service → service) traffic.

Outlier detection in ASM is a per-Envoy-sidecar local view, not globally synchronised. A pod may be ejected from one client's pool but still receive traffic from others, which is intentional: it prevents synchronised flapping. Don't expect every client to eject the same pod at the same millisecond; pair outlier detection with Kubernetes readiness and liveness probes for hard removal.

Session Affinity Options

Affinity Modes by Load Balancer Type

The backend service sessionAffinity field accepts:

NONE — pure load balancing (default).
CLIENT_IP — hash on client IP (L4 LBs and L7 LBs).
CLIENT_IP_PROTO — IP + protocol (L4 LBs only).
CLIENT_IP_PORT_PROTO — 5-tuple (L4 internal passthrough only).
GENERATED_COOKIE — LB issues a GCLB cookie (HTTP(S) LBs).
HEADER_FIELD — hash on a named request header (HTTP(S) LBs with advanced traffic management).
HTTP_COOKIE — hash on a customer-supplied cookie name (HTTP(S) LBs with advanced traffic management).
STRONG_COOKIE_AFFINITY — HTTP cookie affinity that survives backend changes (specific LBs).

`affinityCookieTtlSec`

For GENERATED_COOKIE, affinityCookieTtlSec defaults to 0 (session cookie, cleared on browser close). Set it to pin users for a fixed window, which is useful when a stateful cart lives in backend memory. For HTTP_COOKIE, the cookie lifetime is set in consistentHash.httpCookie.ttl instead.

Affinity vs Balancing Mode Tension

Affinity is best-effort: if a backend becomes unhealthy or capacity-saturated based on its balancingMode (UTILIZATION, RATE, or CONNECTION), the LB will pick a new backend and the session breaks. For true durability, externalise state to Memorystore Redis or Cloud Spanner and keep affinity as a latency optimisation only.
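Why affinity breaks when the pool changes is easiest to see with a hash-based pick. The Python sketch below is an assumption, not the load balancer's documented algorithm: it uses a plain SHA-256 modulo over the healthy backends, and the backend names and client IP are invented.

```python
import hashlib


def affinity_backend(client_ip: str, healthy_backends: list[str]) -> str:
    """CLIENT_IP-style affinity: the same IP maps to the same backend while the pool is stable."""
    if not healthy_backends:
        raise ValueError("no healthy backends")
    pool = sorted(healthy_backends)
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]


pool = ["be-1", "be-2", "be-3"]
first = affinity_backend("203.0.113.7", pool)
print(first == affinity_backend("203.0.113.7", pool))  # True: sticky while nothing changes

# The pinned backend turns unhealthy: the client is forced onto another one,
# and any cart held in that backend's memory is gone.
survivors = [b for b in pool if b != first]
print(affinity_backend("203.0.113.7", survivors))
```

This is the reason the text recommends externalised state: the mapping is only as durable as the pool behind it.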

Geo-Targeting via Cloud DNS Routing Policies

Geo Routing Policy

Cloud DNS routing policies let a single DNS name return different IP addresses depending on the querying resolver's geographic region. Define a policy with per-region rrdataSets:

gcloud dns record-sets create www.example.com. --zone=prod-zone \
  --type=A --ttl=60 \
  --routing-policy-type=GEO \
  --routing-policy-data="asia-east1=10.1.0.10;europe-west1=10.2.0.10;us-central1=10.3.0.10"

A resolver in Taiwan receives 10.1.0.10; one in Frankfurt receives 10.2.0.10. This complements anycast Application Load Balancer IPs by giving you explicit regional control for compliance or latency-pinning scenarios.
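The --routing-policy-data string is just a region-to-address table, which the following Python sketch parses and queries. The helper names and the fallback argument are hypothetical, and only exact region matches are modelled; how Cloud DNS treats a query from a region with no entry is outside this sketch.

```python
def parse_routing_data(data: str) -> dict[str, str]:
    """Parse 'region=address;region=address' into a lookup table."""
    return dict(item.split("=", 1) for item in data.split(";") if item)


def answer_for(region: str, policy: dict[str, str], fallback: str) -> str:
    """Return the address configured for the region, or the fallback if none is listed."""
    return policy.get(region, fallback)


policy = parse_routing_data(
    "asia-east1=10.1.0.10;europe-west1=10.2.0.10;us-central1=10.3.0.10"
)
print(answer_for("asia-east1", policy, fallback="10.3.0.10"))    # 10.1.0.10
print(answer_for("europe-west1", policy, fallback="10.3.0.10"))  # 10.2.0.10
```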

WRR (Weighted Round Robin) Policy

For DNS-level canary across regional ALBs:

--routing-policy-type=WRR \
--routing-policy-data="90=34.117.x.x;10=34.149.y.y"

Failover Policy

A primary regional VIP with backup VIPs that activate only when Cloud DNS health checks fail. Note: DNS-based failover honours TTL (use 30–60 seconds), so global anycast Application Load Balancer remains preferable for sub-second failover.

Combine Cloud DNS Geo policy with regional Application Load Balancers to satisfy data residency rules (EU traffic terminates in europe-west1, APAC in asia-east1) without sacrificing the URL map's advanced traffic management. The global ALB's single anycast IP can't enforce regional data residency on its own.

Putting It Together: A Production Topology

A realistic PCNE-scale topology layers these features:

Cloud DNS Geo policy routes EU users to a europe-west1 regional external Application Load Balancer and APAC users to asia-east1.
The URL map on each regional ALB has a path matcher: /api/* → API backend (Cloud Run via Serverless NEG), /static/* → backend bucket (Cloud Storage + Cloud CDN), default → frontend MIG.
The API path matcher uses routeRules with headerMatches on X-Canary: true to send 100% of opt-in traffic to api-v2, and weightedBackendServices 95/5 for the rest.
requestMirrorPolicy mirrors 100% of /api/checkout to api-shadow for production-grade testing.
retryPolicy retries 5xx and gateway-error three times with a 5-second per-try timeout.
Backend service circuitBreakers cap at 1000 connections; ASM DestinationRule handles east-west outlier ejection between microservices.
sessionAffinity: GENERATED_COOKIE keeps a stateful shopping cart pinned with affinityCookieTtlSec: 3600.

Exam Tips & Traps

Trap: "URL maps are used with Network Load Balancers." False. URL maps are used with Application Load Balancers (L7). Passthrough Network Load Balancers (L4) have no URL map.
Tip: For canary, look for weighted backend services or Cloud Run revision splits with update-traffic.
Tip: For shadow-testing without user impact, look for requestMirrorPolicy.
Tip: For sub-second regional failover, prefer global anycast Application Load Balancer, not Cloud DNS failover policies.
Tip: For automatic retry on transient 5xx, configure retryPolicy.retryConditions with 5xx and gateway-error.
Tip: For mesh-internal circuit breaking, the answer is ASM DestinationRule.outlierDetection, not the LB backend service circuitBreakers.

FAQs

Q: What is the difference between a Host Rule and a Path Rule inside a URL map? A: A host rule routes based on the Host header (e.g., blog.example.com vs shop.example.com) and selects a path matcher. A path rule lives inside a path matcher and routes based on the URL path. The full evaluation chain is hostRule → pathMatcher → pathRule/routeRule → backendService, with the URL map's defaultService catching anything that matches no host rule.

Q: Can I split traffic between Cloud Run and a GKE backend in the same URL map? A: Yes. Use a weightedBackendServices routeRule pointing to a Serverless NEG (Cloud Run) and a zonal NEG or Instance Group (GKE). The Application Load Balancer treats them as homogeneous backend services for weighting purposes.

Q: Does requestMirrorPolicy double my backend cost? A: For the mirror service, yes — mirrored requests consume CPU, memory, and downstream calls (Cloud SQL, BigQuery, etc.). Size the mirror backend at 100% of primary capacity and assume idempotent execution.

Q: How is HTTP_COOKIE affinity different from GENERATED_COOKIE? A: GENERATED_COOKIE is minted and managed by Google's load balancer (GCLB cookie). HTTP_COOKIE uses a customer-supplied cookie name and path so your application can manage the cookie lifecycle and propagate it across non-LB paths (e.g., between a CDN and an origin).

Q: When should I use Anthos Service Mesh circuit breakers instead of backend service circuitBreakers? A: Use ASM when traffic is east-west (service-to-service inside the mesh), when you need outlier detection (eject unhealthy endpoints based on consecutive 5xx), and when you want mesh-wide policy managed via DestinationRule CRDs. Use backend service circuitBreakers for north-south ingress capacity caps at the LB tier.

Q: Do retry policies retry POST requests? A: Yes. Envoy retries any method whose response matches one of the retryConditions. POST retries are dangerous for non-idempotent operations (double-charging a credit card). Either make the endpoint idempotent (idempotency keys) or scope retryPolicy to safe methods via separate routeRules that match the :method pseudo-header in headerMatches.

Q: How granular are weights — can I do 0.5%? A: Weights are integers 0–1000, and each weight is divided by the sum of all weights in the rule. A 0.5% split requires weight: 5 and weight: 995. For sub-0.1% you'd need DNS-level WRR or per-region scaling.
