Cloud Run for Containerized Apps — GCP PCD Study Notes

Introduction to Cloud Run

Cloud Run is Google Cloud's fully managed serverless container platform. You hand it an OCI image and a port number, and it gives you back a regional HTTPS endpoint that scales from zero to thousands of instances based on incoming traffic. The PCD exam treats Cloud Run as the default compute choice for stateless web services, APIs, and event handlers — so understanding the container contract, the scaling levers, and the integration surface (VPC, Cloud SQL, Secret Manager, Eventarc) is non-negotiable.

This guide walks the full surface area: Cloud Run services vs Cloud Run jobs, the HTTP contract, concurrency tuning, CPU allocation modes, generation 2 execution, startup CPU boost, sidecars, Direct VPC egress vs the legacy Serverless VPC Access connector, Cloud SQL connectivity, Secret Manager mounts, traffic splitting, custom domains via the global external Application Load Balancer, IAM-driven invocation, OIDC service-to-service auth, Eventarc triggers, and Cloud Tasks for asynchronous fan-out.

Plain English Explanation

Three analogies make Cloud Run's behavior intuitive before you hit the API surface.

The On-Demand Food Truck Analogy

Traditional servers are like leasing a brick-and-mortar restaurant with twenty staff on payroll regardless of dinner traffic. Cloud Run is a fleet of food trucks that show up only when someone is hungry. The first customer waits a moment for the truck to drive in (cold start). Once parked, that truck can serve up to eighty diners in parallel from the same kitchen (concurrency). When the crowd dies down at 2 a.m., every truck drives home and you stop paying — exactly the scale-to-zero behavior of a Cloud Run service with min-instances=0. If you need a truck idling at the corner so the first morning customer is served instantly, you raise min-instances to one and accept the idle cost.

The Hotel Concierge vs Hotel Cleaning Crew Analogy

A Cloud Run service is a hotel concierge: it sits at the desk waiting for guest requests (HTTP traffic), answers each one quickly, and returns to waiting. A Cloud Run job is the cleaning crew that gets dispatched at 11 a.m. to flip every checked-out room — no inbound requests, just a finite list of tasks executed in parallel by ephemeral workers, then everyone clocks out. Both run the same container engine, but one is request-driven and one is task-driven. Picking the wrong one (running a nightly batch as a service that polls a queue) is the most common Cloud Run anti-pattern on the exam.

The Express Lane Analogy

CPU allocation modes are like grocery store checkout lanes. CPU allocated only during requests is the express lane: you only get a clerk when you reach the register, so you pay nothing while you wait but the clerk vanishes between customers. That's perfect for sync HTTP services. CPU always allocated keeps a clerk parked at your lane even between customers — needed when your container has background tasks (a sidecar streaming logs, a goroutine flushing buffers, or an SDK that runs cleanup off the request thread). It costs more per second, but lets background work actually execute. Pick the wrong mode and your background scheduler silently stops the moment the HTTP request returns.

The Cloud Run Container Contract

Cloud Run does not run just any binary you give it. The container must obey a specific contract or it will fail to start, time out, or get killed mid-request.

Listening on `$PORT`

The container must start an HTTP server listening on 0.0.0.0:$PORT within the container start timeout (default 240 seconds, configurable up to ten minutes). $PORT is an environment variable injected at runtime; do not hardcode 8080 even though that is the default — Cloud Run can vary it. Express, Flask, Gin, Spring Boot, and the standard FaaS frameworks all read PORT automatically; custom servers need explicit handling.
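A minimal sketch of this part of the contract, using only the Python standard library. The handler class Hello and the helpers resolve_port, make_server, and main are illustrative names, not part of any Cloud Run API, and the 8080 fallback is assumed only for local runs where PORT is unset.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def resolve_port(environ=os.environ, default=8080):
    """Return the port injected through PORT, or a local-development fallback."""
    return int(environ.get("PORT", default))


class Hello(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a plain-text body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(environ=os.environ):
    """Bind to all interfaces (0.0.0.0), not 127.0.0.1, on the injected port."""
    return ThreadingHTTPServer(("0.0.0.0", resolve_port(environ)), Hello)


def main():
    """Container entrypoint: serve until the process is stopped."""
    make_server().serve_forever()
```

Calling main() from the image's entrypoint starts the server; resolve_port is kept separate so the port logic can be checked without opening a socket.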

Statelessness and Ephemeral Filesystem

Each instance gets an in-memory tmpfs writable filesystem rooted at /. Anything written there counts against the instance's memory limit and disappears when the instance is recycled. Persistent state must live in Cloud Storage, Cloud SQL, Firestore, Spanner, or another external service. Cloud Run does mount Cloud Storage buckets as volumes (FUSE) and supports Network File System mounts on second-generation execution, but neither replaces a proper datastore.

Request Lifecycle and Timeouts

The request timeout defaults to five minutes and can be raised to a maximum of sixty minutes per request. Beyond the configured limit, the load balancer closes the connection and Cloud Run terminates the instance handler. Long-running workloads belong in Cloud Run jobs or in a streaming consumer pattern with chunked responses.

The SIGTERM Shutdown Signal

When Cloud Run scales an instance down, it sends SIGTERM and waits up to ten seconds for graceful shutdown before sending SIGKILL. Production services should trap SIGTERM to drain in-flight requests, flush logs, and close database pools.
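One way to trap the signal, sketched with the Python standard library. The names shutdown_requested, handle_sigterm, install_handler, and drain are hypothetical, and the ten-second default budget simply mirrors the grace period described above; a real service would pass its own cleanup callables.

```python
import signal
import threading
import time

# Set by the signal handler; request loops can poll it to stop taking new work.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record that the platform asked this instance to stop."""
    shutdown_requested.set()


def install_handler():
    """Register the handler; Python requires this to run in the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def drain(cleanup_steps, budget_seconds=10.0, clock=time.monotonic):
    """Run (name, callable) cleanup steps in order until the budget is spent.

    Returns the names of the steps that completed.
    """
    deadline = clock() + budget_seconds
    completed = []
    for name, step in cleanup_steps:
        if clock() >= deadline:
            break  # out of time; SIGKILL is about to arrive
        step()
        completed.append(name)
    return completed
```

Ordering the steps from most to least important (in-flight requests, then log flush, then pool close) means that whatever is skipped when the budget runs out is the least costly to lose.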

Revision: an immutable snapshot of a service's container image plus all configuration (env vars, secrets, CPU, memory, concurrency). Each gcloud run deploy produces a new revision. Traffic is independently routable across revisions, which is the primitive that makes blue/green and canary releases trivial. See Cloud Run revisions.

Services vs Jobs

Cloud Run exposes two execution models. Choosing correctly is high-yield exam material.

Cloud Run Services

Services respond to HTTP, gRPC, or WebSocket traffic. They autoscale based on request volume and concurrency, can scale to zero, and live behind a stable *.run.app URL. Use services for REST APIs, gRPC backends, webhooks, server-rendered web apps, and Eventarc consumers.

Cloud Run Jobs

Jobs are batch workloads with no inbound traffic. You define --tasks (the total number of tasks in an execution), --parallelism (how many of those tasks run at the same time), and --task-timeout, then trigger execution via gcloud run jobs execute, Cloud Scheduler, Workflows, or Eventarc. Each task receives CLOUD_RUN_TASK_INDEX (0-based) and CLOUD_RUN_TASK_COUNT, enabling embarrassingly parallel work like processing N shards of a dataset. Jobs do not scale to zero in the same sense as services: their instances exist only during an execution.
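The sharding idea can be sketched as follows. Only the two environment variable names come from Cloud Run; task_position and shard_for_task are illustrative helpers, and the striding scheme is one simple way to split a list, not the only one.

```python
import os


def task_position(environ=os.environ):
    """Read this task's index and the total task count from the environment."""
    index = int(environ.get("CLOUD_RUN_TASK_INDEX", 0))
    count = int(environ.get("CLOUD_RUN_TASK_COUNT", 1))
    return index, count


def shard_for_task(items, task_index, task_count):
    """Give task i every task_count-th item, starting at offset i."""
    return items[task_index::task_count]
```

With ten items and three tasks, task 1 would receive items 1, 4, and 7; every item lands in exactly one shard, so tasks need no coordination.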

Picking Between Them

A web API that handles 200 requests per second is a service. A nightly invoice rendering pipeline that converts 50,000 PDFs is a job (specifically, a job with --tasks=100 and --parallelism=50, so fifty tasks run at a time and each task converts 500 PDFs). A long-running queue consumer is borderline: if it should always run, use a service with min-instances=1 and CPU always allocated; if it should run once per hour against a backlog, use a job triggered by Cloud Scheduler.

Concurrency and Scaling Levers

Cloud Run's pricing and latency profile is dictated by four knobs.

Per-Instance Concurrency

By default, one Cloud Run service instance accepts up to eighty concurrent requests, configurable from 1 to 1,000. Set concurrency to 1 for CPU-bound workloads that cannot share an instance (e.g., a video encoder); raise it past 80 for I/O-bound workloads where each request mostly waits on downstream services. Higher concurrency means fewer instances and lower cost, but raises the blast radius if one bad request crashes the process.

Min Instances

--min-instances keeps that many warm copies running at all times, eliminating cold starts for the first request after idle. Warm instances are billed even when they serve no traffic (at the idle-instance rate while no request is active), so the setting is not free. Production-tier services often run with min-instances=1 or higher; dev services stay at zero.

Max Instances

--max-instances caps horizontal scale, defaulting to 100 and configurable up to 1,000 per region for the second-generation environment (with project-level quota increases possible). The cap protects downstream systems like Cloud SQL from connection floods. Always pair it with a sensible concurrency setting.

Request CPU Allocation vs Always Allocated

The CPU allocation mode toggles whether the CPU is throttled to near zero between requests. Default is CPU allocated only during request processing, which is cheaper and correct for pure request/response workloads. CPU always allocated is required when you need background work to continue after the request returns, when you use a sidecar that consumes CPU off the request path, or when you want to keep WebSocket connections fully responsive.

Cloud Run service instances default to 80 concurrent requests per instance and 100 max instances per service. Both defaults are conservative for I/O-bound APIs; tune --concurrency and --max-instances based on load tests, not vibes. See Cloud Run concurrency.

Execution Environment Generations

Cloud Run offers two execution environments with materially different behavior.

Generation 1

The original execution environment. Lower cold start latency for some workloads, but lacks several capabilities: no Network File System mounts, no full Linux syscall surface (notably, no ptrace and limited unshare), no UDP egress, and weaker compatibility with low-level network code. Default for older deployments; not recommended for new services that need any of the above features.

Generation 2

The second-generation environment provides full Linux compatibility rather than the system-call emulation of generation 1's gVisor sandbox. It supports NFS mounts, Cloud Storage FUSE mounts, larger filesystem capacities, and UDP traffic. Cold starts are slightly slower than gen1 for some images, but the compatibility gain is decisive for most workloads. Select it via --execution-environment=gen2 on deploy.

Startup CPU Boost

A separately toggled feature (--cpu-boost) that doubles the available CPU during instance startup, shaving cold-start time for CPU-bound bootstraps like JVM warmup or large dependency tree initialization. Costs apply only during the boost window. Combining --cpu-boost with min-instances >= 1 gets the best of both worlds for tier-one services.

Sidecars and Multi-Container Services

Cloud Run supports multi-container services where a single revision runs one ingress container plus up to nine sidecar containers in the same instance, sharing localhost networking and an emptyDir-style volume.

When to Use Sidecars

Common sidecar patterns: an OpenTelemetry Collector that scrapes metrics and forwards to Cloud Monitoring, a Cloud SQL Auth Proxy that handles IAM-authenticated database connections, an Nginx or Envoy in front of a slim app container, or a log-shipping agent. The sidecar runs the same lifetime as the ingress container and gets its own resource limits.

Sidecar Configuration

Sidecars are declared in the service YAML under spec.template.spec.containers[]. Only one container per revision can carry --port (the ingress container). Resource requests sum across all containers and count against the instance's overall CPU and memory budgets. Plan accordingly — a 1 vCPU instance shared by an app, an Envoy, and an OTel Collector will starve the app.

Networking: VPC Egress Options

By default, Cloud Run instances reach the internet through Google-managed NAT. To reach private resources inside a VPC (a private Cloud SQL instance, an internal load balancer, a self-hosted service on GCE), you need one of two egress paths.

Direct VPC Egress

The recommended modern option. The Cloud Run service is attached directly to a subnet in your VPC; outbound traffic enters the VPC without going through an intermediate connector. Lower latency, lower cost, scales automatically with traffic. Available on second-generation execution. Configure via --network and --subnet flags on deploy.

Serverless VPC Access Connector (Legacy)

The original method. You provision a managed connector (a pool of e2-micro instances) in a /28 subnet, and Cloud Run routes egress through that connector. Connectors are billed hourly even when idle, and the pool can become a bottleneck under high throughput. Useful when you need to share a connector across multiple serverless services or when your VPC predates Direct VPC egress. New designs should prefer Direct VPC egress.

Egress Settings

For both options, the --vpc-egress flag controls scope:

all-traffic: every outbound packet routes through the VPC (needed when Cloud Run must appear as a VPC IP for IP allowlists or firewall rules).
private-ranges-only: only RFC 1918 traffic routes through the VPC; public internet uses Google's managed NAT (the default).
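As a mental model only, the two settings can be sketched as a routing decision. The function egress_path is hypothetical, the real decision is made by Cloud Run itself, and the range list below assumes exactly the three RFC 1918 blocks named above (the platform's actual private-range list may be broader).

```python
import ipaddress

# The three RFC 1918 blocks; assumed to be the whole "private" set here.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def egress_path(destination_ip, vpc_egress="private-ranges-only"):
    """Return "vpc" or "internet" for a destination under a given setting."""
    if vpc_egress == "all-traffic":
        return "vpc"
    if vpc_egress != "private-ranges-only":
        raise ValueError(f"unknown vpc-egress setting: {vpc_egress}")
    address = ipaddress.ip_address(destination_ip)
    in_private_range = any(address in network for network in RFC1918_NETWORKS)
    return "vpc" if in_private_range else "internet"
```

Under this model a call to a public API leaves through the VPC only when all-traffic is set, which is why that setting matters for allowlisting.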

For new Cloud Run services that need VPC access, choose Direct VPC egress over the Serverless VPC Access connector. It scales without manual instance sizing, eliminates the per-hour connector charge, and reduces network hops. See Direct VPC egress.

Connecting to Cloud SQL

Cloud Run integrates with Cloud SQL through three paths, each with different trade-offs.

Native Cloud SQL Connection

The simplest path: pass --add-cloudsql-instances=PROJECT:REGION:INSTANCE on deploy. Cloud Run runs an embedded Cloud SQL Auth Proxy that authenticates with the service account's roles/cloudsql.client role. Connections terminate at a Unix domain socket at /cloudsql/INSTANCE_CONNECTION_NAME. No public IP exposure on the database, no manual proxy management.

Private IP via Direct VPC Egress

For databases with private IPs (the production default), attach Cloud Run to the same VPC via Direct VPC egress and connect to the database's private IP directly. This is the lowest-latency option and is required for Cloud SQL instances configured with private_network only.

Self-Hosted Cloud SQL Auth Proxy as Sidecar

Run the Cloud SQL Auth Proxy as a sidecar container. Gives full control over proxy version and configuration; useful when you need IAM Authentication for database users or per-connection logging.

Connection Pooling Caveat

Cloud Run instances can spin up rapidly, so each instance's connection pool multiplied by max-instances must fit under the database's max_connections. For a Cloud SQL instance with max_connections=100, a Cloud Run service with --max-instances=50 and a pool of five per instance can open up to 250 connections (50 × 5), two and a half times what the database accepts. PgBouncer as a sidecar or in a dedicated proxy tier is a common mitigation.
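The budget check is simple arithmetic, sketched here with hypothetical helper names. The reserved parameter is an assumption added for illustration: it stands for connections held back for migrations, admin sessions, or other clients.

```python
def peak_connections(max_instances, pool_size):
    """Worst case: every instance exists and every pool is full."""
    return max_instances * pool_size


def max_safe_instances(db_max_connections, pool_size, reserved=0):
    """Largest max-instances value whose full pools still fit in the database."""
    usable = db_max_connections - reserved
    return max(usable // pool_size, 0)
```

For the numbers in the text, peak_connections(50, 5) gives 250 against a limit of 100, and max_safe_instances(100, 5) gives 20, so either the instance cap or the pool size has to come down.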

Secret Manager Integration

Cloud Run reads secrets from Secret Manager natively. Two mount modes exist.

Environment Variable Mode

--set-secrets=DATABASE_PASSWORD=db-password:latest injects the secret value as the env var DATABASE_PASSWORD at instance startup. The value is resolved once when each instance starts and is never refreshed inside a running instance, so with latest, instances of the same revision can hold different versions after a rotation. Pin a specific version number and deploy a new revision to rotate predictably.

File Volume Mode

--set-secrets=/secrets/db=db-password:latest mounts the secret as a file at /secrets/db. The application reads the file at runtime. With version latest, the value is fetched from Secret Manager when the file is read, so a mounted secret can pick up a rotated value without a new revision; pin a numeric version when every instance must see the same value.
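On the application side, supporting both mount modes might look like the sketch below. The helper load_secret is hypothetical; the point is that it reads the file on every call instead of caching the value at import time, so the application sees whatever the mount currently holds.

```python
import os


def load_secret(file_path, env_name, environ=os.environ):
    """Prefer the mounted file (re-read on each call), else use the env var."""
    try:
        with open(file_path, encoding="utf-8") as handle:
            # Drop a trailing newline that some tools add when storing secrets.
            return handle.read().rstrip("\n")
    except FileNotFoundError:
        return environ[env_name]
```

A database client could call load_secret("/secrets/db", "DATABASE_PASSWORD") each time it opens a new connection rather than once at startup.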

IAM Requirements

The Cloud Run service account needs roles/secretmanager.secretAccessor on each secret. Failure to grant produces a startup error that is sometimes mistaken for an image pull failure — check the secret IAM bindings first when a fresh deploy fails to boot.

Cloud Run resolves a secret exposed as an environment variable once, when each instance starts, so updating the secret does not change instances that are already running; pin a version number and deploy a new revision to roll a rotation out predictably. A secret mounted as a file with version latest is fetched from Secret Manager when it is read, so file mode can pick up a rotation without a redeploy. See Secret Manager with Cloud Run.

Traffic Splitting and Revisions

Every deployment creates a new immutable revision. Traffic is a separate primitive that can be split arbitrarily across revisions.

Canary Releases

gcloud run services update-traffic --to-revisions=rev-005=10,rev-004=90 routes ten percent of traffic to the new revision and ninety to the previous. Watch error rates and latency in Cloud Monitoring, then promote with --to-latest once satisfied.
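A staged rollout is a sequence of such splits. The sketch below only builds the --to-revisions argument strings; the helper names and the 10/50/100 ramp are assumptions for illustration, and nothing here calls gcloud.

```python
def to_revisions_arg(split):
    """Format {"revision": percent} as a --to-revisions value; must total 100."""
    if any(percent < 0 for percent in split.values()):
        raise ValueError("percentages must be non-negative")
    if sum(split.values()) != 100:
        raise ValueError("percentages must sum to 100")
    return ",".join(f"{rev}={percent}" for rev, percent in split.items())


def canary_plan(new_revision, old_revision, percentages=(10, 50, 100)):
    """Return one argument string per rollout stage."""
    plan = []
    for percent in percentages:
        split = {new_revision: percent}
        if percent < 100:
            split[old_revision] = 100 - percent
        plan.append(to_revisions_arg(split))
    return plan
```

Rolling back at any stage is the same operation with the percentages reversed, which is what makes revisions plus traffic splits a convenient release primitive.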

Tagged Revisions

Each revision can be assigned a tag (--tag=preview) that produces a URL like preview---my-service-abc.run.app. This URL routes 100% to the tagged revision regardless of traffic split, ideal for QA testing of a release candidate without affecting production traffic.

Blue/Green Cutover

For instantaneous swap, deploy without traffic (--no-traffic), warm up via the tagged URL, then flip 100% in one command. Rollback is the same command pointing at the previous revision.

Custom Domains and the Global Load Balancer

The default *.run.app URL is fine for backends, but production services typically front a custom domain.

Cloud Run Domain Mappings

gcloud run domain-mappings create --service=my-service --domain=api.example.com issues a managed TLS cert and routes to the service via Cloud Run's regional ingress. Limited to one region per mapping and lacks the advanced traffic management of a full load balancer.

Global External Application Load Balancer

The production-grade option. A serverless network endpoint group (NEG) of type SERVERLESS points the external Application Load Balancer at the Cloud Run service. Benefits: global anycast IP, Cloud CDN, Cloud Armor for WAF and DDoS, IAP for identity-aware proxying, custom routing rules across multiple backend services, and managed certificates via Certificate Manager. This is the standard architecture for production HTTPS services on Cloud Run.

Multi-Region Deployments

Deploy the same service to multiple regions, attach all as serverless NEGs in a global load balancer, and traffic is routed to the nearest healthy region automatically. Combine with Cloud Storage and Spanner for a fully global stack.

IAM, Authentication, and Service-to-Service Calls

Cloud Run's invocation model is IAM-driven from the ground up.

Public vs Authenticated Services

--allow-unauthenticated (or granting roles/run.invoker to allUsers) makes the service publicly reachable. Without it, every request must carry a valid Google-issued identity token. The exam often tests whether you should remove public access by default — the answer is usually yes for internal services.

The `roles/run.invoker` Role

The single role that controls who can call a Cloud Run service. Grant it to a service account, a user, or allAuthenticatedUsers. Combined with ingress restrictions (--ingress=internal or internal-and-cloud-load-balancing), it forms a defense-in-depth posture.

OIDC ID Tokens for Service-to-Service

When service A calls service B (both Cloud Run), service A must obtain an OIDC ID token signed by Google for service B's URL audience. The metadata server provides this via http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=https://service-b.run.app, requested with the header Metadata-Flavor: Google. Attach the token as Authorization: Bearer <token> on the outbound call. The Google client libraries (google-auth in Python, google.golang.org/api/idtoken in Go) can fetch and refresh the token for a given audience, so you rarely need to call the metadata server yourself.
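The raw metadata-server flow can be sketched with the standard library alone. The helper names are hypothetical, and fetch_id_token only works from inside a Google Cloud runtime where metadata.google.internal resolves; in production code the client libraries are usually the better choice.

```python
import urllib.parse
import urllib.request

METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def build_identity_request(audience):
    """Build the metadata-server request for an ID token with this audience."""
    query = urllib.parse.urlencode({"audience": audience})
    return urllib.request.Request(
        f"{METADATA_IDENTITY_URL}?{query}",
        headers={"Metadata-Flavor": "Google"},  # required by the metadata server
    )


def fetch_id_token(audience, timeout=5):
    """Return the signed ID token as a string (needs a Google Cloud runtime)."""
    request = build_identity_request(audience)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def authorized_headers(token):
    """Headers to attach to the outbound call to the receiving service."""
    return {"Authorization": f"Bearer {token}"}
```

The audience should be the receiving service's URL; a token minted for one audience is not accepted by a different service.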

Ingress Restrictions

--ingress accepts all, internal, or internal-and-cloud-load-balancing. internal admits only traffic from the same VPC network or VPC Service Controls perimeter; internal-and-cloud-load-balancing also admits traffic from an external Application Load Balancer. The latter is the production setting for services that should not be hit directly on *.run.app URLs.

For service-to-service authentication on Cloud Run, the caller obtains a Google-signed OIDC ID token from the metadata server using the callee's URL as the audience, and the callee verifies the token via the roles/run.invoker IAM policy. No long-lived API keys, no shared secrets. See Authenticating service-to-service.

Event-Driven Triggers with Eventarc

Eventarc is the integration plane that wires Cloud Run services to events from across Google Cloud.

Eventarc Trigger Sources

A trigger binds an event source (Cloud Storage finalize, Pub/Sub topic, Cloud Audit Logs from any service, Firestore document changes, BigQuery job completion, etc.) to a Cloud Run service URL. The event is delivered as a CloudEvents-formatted HTTP POST with the type and source in headers.
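On the receiving side, a handler can pull the event attributes out of the request headers. This sketch assumes the binary content mode of the CloudEvents HTTP binding, where each attribute arrives as a ce- prefixed header; cloud_event_attributes is a hypothetical helper and the sample values in the usage note are illustrative.

```python
# Attributes that the CloudEvents 1.0 specification marks as required.
REQUIRED_ATTRIBUTES = ("id", "source", "type", "specversion")


def cloud_event_attributes(headers):
    """Extract ce-* headers into a dict keyed by attribute name.

    Header names are matched case-insensitively, as HTTP requires.
    """
    attributes = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered.startswith("ce-"):
            attributes[lowered[len("ce-"):]] = value
    missing = [name for name in REQUIRED_ATTRIBUTES if name not in attributes]
    if missing:
        raise ValueError(f"not a CloudEvent, missing: {', '.join(missing)}")
    return attributes
```

A storage handler would then branch on attributes["type"], for example comparing it with google.cloud.storage.object.v1.finalized, and read the object details from the JSON body.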

Common Patterns

A Cloud Run service that processes image uploads subscribes via an Eventarc trigger on google.cloud.storage.object.v1.finalized for a specific bucket. A compliance scanner subscribes to Cloud Audit Logs for setIamPolicy events across the org. A document parser subscribes to a Pub/Sub topic that fronts an Apigee webhook.

Trigger Setup

gcloud eventarc triggers create my-trigger --location=us-central1 --destination-run-service=processor --destination-run-region=us-central1 --event-filters="type=google.cloud.storage.object.v1.finalized" --event-filters="bucket=uploads" --service-account=TRIGGER_SA_EMAIL, where TRIGGER_SA_EMAIL is the email address of the trigger's service account. That service account needs roles/run.invoker on the target and roles/eventarc.eventReceiver on the project.

Cloud Tasks for Asynchronous Fan-Out

For workloads where a synchronous request triggers more work than the 60-minute timeout allows, Cloud Tasks offloads the long tail.

How It Fits

The HTTP request handler enqueues N Cloud Tasks targeting a worker Cloud Run service, then returns 200 immediately. Cloud Tasks dispatches each task as an HTTP POST to the worker, respecting per-queue rate limits (max_dispatches_per_second), concurrency (max_concurrent_dispatches), and retry policy (max_attempts, exponential backoff).

Cloud Tasks vs Pub/Sub

Cloud Tasks gives explicit per-task control: you can schedule a specific task to run in fifteen minutes, deduplicate by task name, and cancel an individual task. Pub/Sub is higher throughput and at-least-once delivery without per-message control. For fan-out work where each task represents a unit of progress (a customer to email, a file to convert), Cloud Tasks is the cleaner fit. For event streams, prefer Pub/Sub.

Authenticated Task Dispatch

Cloud Tasks can attach an OIDC token to each HTTP dispatch, mirroring the service-to-service auth pattern. Each task's HTTP request config specifies oidcToken.serviceAccountEmail and oidcToken.audience. The worker Cloud Run service then validates the caller the same way it would any other authenticated caller.
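The fan-out plus authenticated dispatch can be sketched as plain task bodies. The field names follow the Cloud Tasks REST Task resource as understood here (httpRequest, base64-encoded body, oidcToken) and should be checked against the API reference; the helper names, worker URL, and service account email are hypothetical, and actually enqueueing the tasks is left out.

```python
import base64
import json


def build_http_task(worker_url, payload, service_account_email, audience=None):
    """Return a task body that POSTs payload to the worker with an OIDC token."""
    encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {
        "httpRequest": {
            "httpMethod": "POST",
            "url": worker_url,
            "headers": {"Content-Type": "application/json"},
            "body": encoded,  # the REST API expects base64-encoded bytes
            "oidcToken": {
                "serviceAccountEmail": service_account_email,
                # Assumption: default the audience to the worker URL itself.
                "audience": audience or worker_url,
            },
        }
    }


def fan_out(worker_url, payloads, service_account_email):
    """Build one task per unit of work; the caller enqueues them and returns."""
    return [
        build_http_task(worker_url, payload, service_account_email)
        for payload in payloads
    ]
```

The request handler would enqueue the result of fan_out and respond immediately, leaving rate limiting and retries to the queue configuration.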

Common Pitfalls

Patterns that bite Cloud Run users repeatedly.

Forgetting `0.0.0.0` Bind

Binding to 127.0.0.1:$PORT makes the container reachable inside its own namespace but not from the Cloud Run front end. Always bind to 0.0.0.0.

Writing to a Non-Writable Path

Although the filesystem is writable, many base images set certain directories read-only. Use /tmp for scratch and verify your framework's default cache directory is writable. Container start failures with "permission denied" frequently trace to this.

Confusing `min-instances` With Reserved Capacity

min-instances=10 keeps ten instances warm but does not reserve burst capacity. If traffic spikes to a thousand concurrent requests against a service with concurrency=80, Cloud Run still has to cold-start additional instances. Use min-instances plus --cpu-boost plus a properly tuned --max-instances for predictable surge behavior.
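The spike in this example works out as follows. This is a deliberately simplified model with hypothetical helper names: it sizes instances from concurrent requests alone, whereas the real autoscaler also reacts to signals such as CPU utilization.

```python
import math


def instances_needed(concurrent_requests, concurrency):
    """Instances required to hold this many requests at the given concurrency."""
    return math.ceil(concurrent_requests / concurrency)


def cold_starts_for_spike(concurrent_requests, concurrency, min_instances, max_instances):
    """Instances that must cold-start beyond the warm pool, capped by max."""
    target = min(instances_needed(concurrent_requests, concurrency), max_instances)
    return max(target - min_instances, 0)
```

A thousand concurrent requests at concurrency 80 need 13 instances, so with ten warm instances three more must cold-start, and the ten warm ones can absorb only 800 requests in the meantime.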

Treating Background Threads as Free

In the default CPU-only-during-request mode, background goroutines, asyncio tasks, and timer callbacks are paused between requests. Application code that relies on a periodic flush will silently break. Either enable CPU always allocated or move the work to Cloud Scheduler triggering a Cloud Run job.

Hitting Cloud SQL Connection Limits

A 100-max-instances Cloud Run service with a pool of ten can require a thousand connections at peak. Cloud SQL db-custom-2-7680 defaults to 100 max connections. Either size up the database, lower --max-instances, or front the database with PgBouncer.

A Cloud Run service with --max-instances=200, --concurrency=80, and a per-instance connection pool of 10 can request up to 2,000 simultaneous Cloud SQL connections at peak — almost certainly higher than the database's max_connections. Pair --max-instances, concurrency, pool size, and Cloud SQL tier deliberately. See Cloud Run connecting to Cloud SQL.

Real-World Use Case

An e-commerce company runs a checkout API as a Cloud Run service with --min-instances=5, --max-instances=200, --concurrency=40, --cpu=2, --memory=2Gi, second-generation execution, and --cpu-boost. Direct VPC egress connects to a private Cloud SQL Postgres instance and a private Memorystore Redis. Secret Manager mounts the Stripe API key as a file. The service runs behind a global external Application Load Balancer with Cloud Armor and Cloud CDN; a custom domain checkout.example.com is mapped via Certificate Manager.

Order confirmation emails are handled by a sister Cloud Run service on which the checkout service's service account holds roles/run.invoker. The checkout service obtains an OIDC ID token from the metadata server with the email service's URL as audience and posts the order payload. The email service in turn enqueues Cloud Tasks for delayed follow-ups (review request after seven days, replenishment reminder after thirty).

Nightly, a Cloud Run job runs with --tasks=50 and --task-timeout=30m to regenerate the product search index from BigQuery into Algolia. Cloud Scheduler fires the job at 02:00 UTC. Eventarc subscribes a fraud scanner Cloud Run service to Pub/Sub messages emitted by the checkout service whenever order amount exceeds five thousand dollars.

End state: the entire commerce stack runs on Cloud Run, scales from idle to Black Friday peaks autonomously, has no servers to patch, and costs the team about 18 percent of the equivalent always-on Kubernetes footprint they ran two years ago.

Exam Tips and Service Selection

The PCD exam tests the same Cloud Run decisions repeatedly. Memorize these mappings.

"Stateless HTTP service that must scale to zero" points to Cloud Run service. Cloud Functions 2nd gen also qualifies but is itself built on Cloud Run; pick Cloud Run when the question implies custom containers or multiple endpoints.
"Batch workload, finite list of tasks, no inbound HTTP" points to Cloud Run jobs. Distractor: Cloud Run service with a polling loop.
"Need background work to continue after the HTTP response" points to CPU always allocated. Default mode pauses CPU between requests.
"Service-to-service authentication without long-lived secrets" points to OIDC ID tokens from the metadata server, validated through roles/run.invoker. Distractor: API keys in Secret Manager.
"Cloud Run must reach a private Cloud SQL instance" points to Direct VPC egress plus --add-cloudsql-instances. Distractor: Serverless VPC Access connector — works, but legacy.
"Need a global anycast IP, Cloud CDN, and Cloud Armor in front of Cloud Run" points to the global external Application Load Balancer with a serverless NEG. Distractor: domain mapping (regional only, no CDN, no WAF).
"Canary release ten percent traffic to a new revision" points to gcloud run services update-traffic --to-revisions=NEW=10,OLD=90. Distractor: deploying with --no-traffic and Cloud DNS weighted routing.
"Event when an object lands in a Cloud Storage bucket triggers Cloud Run" points to an Eventarc trigger on google.cloud.storage.object.v1.finalized. Distractor: a Cloud Function — also works but the question usually pins Cloud Run.
"Cold start is killing the p99" points to --min-instances >= 1 plus --cpu-boost plus second-generation execution. Distractor: switching to App Engine standard.
"Long-running fan-out work that exceeds the request timeout" points to Cloud Tasks dispatching to a worker Cloud Run service. Distractor: increasing the request timeout to 60 minutes (still bounded).

Frequently Asked Questions

Q1: What is the maximum request timeout for a Cloud Run service?

A1: Sixty minutes per request; the default is five minutes, so longer requests need --timeout raised explicitly. Note that the global external Application Load Balancer's own backend timeout defaults to 30 seconds, so when fronting Cloud Run through a load balancer you typically need to raise the backend timeout to match. WebSocket connections also obey this limit.

Q2: When should I choose Cloud Run jobs instead of Cloud Run services?

A2: Choose jobs when the workload has no inbound HTTP traffic and is a finite list of tasks: nightly reports, batch image processing, database migrations, or one-off data backfills. Jobs let you set --tasks for the amount of work, --parallelism for how many tasks run at once, and --task-timeout up to 24 hours. Services are wrong here because they expect traffic and scale on request load, not task count.

Q3: How does Cloud Run handle secrets, and how do I rotate them?

A3: Cloud Run integrates natively with Secret Manager via --set-secrets, exposing each secret either as an env var or as a mounted file. An env var is resolved once when an instance starts, so running instances never see a rotated value; pin a version and deploy a new revision to rotate. For rotation without a redeploy, mount the secret as a file with version latest, which is fetched from Secret Manager when read, or trigger a re-deploy from a Cloud Build pipeline when an Eventarc trigger reports that a new secret version was added.

Q4: What is the difference between Direct VPC egress and Serverless VPC Access connector?

A4: Direct VPC egress attaches the Cloud Run service directly to a subnet, scales automatically with traffic, has no hourly idle cost, and is the modern recommendation. The Serverless VPC Access connector is a pool of managed VMs that proxies traffic — it predates Direct VPC egress, has hourly cost even when idle, can become a throughput bottleneck, and remains for legacy compatibility. New designs should pick Direct VPC egress.

Q5: How do I authenticate service-to-service calls between two Cloud Run services?

A5: The caller obtains a Google-signed OIDC ID token from the metadata server with the callee's URL as audience, attaches it as a Bearer token, and the callee enforces invocation via roles/run.invoker IAM. Google client libraries handle token acquisition automatically. No shared secrets, no long-lived API keys, and tokens auto-rotate.

Q6: Can a Cloud Run service hold WebSocket connections?

A6: Yes. WebSocket and HTTP/2 streaming are supported, but each open connection counts toward the instance's concurrency limit and the connection's lifetime is bounded by the request timeout (max 60 minutes). For long-lived connection workloads at scale, GKE or Compute Engine with a load balancer often fits better — Cloud Run optimizes for short, bursty request patterns.

Q7: How do I run a Cloud Run service with no public access?

A7: Set --no-allow-unauthenticated on deploy (and remove allUsers from roles/run.invoker), then set --ingress=internal-and-cloud-load-balancing so the *.run.app URL is unreachable from the public internet. Access then requires either an authenticated request through roles/run.invoker or routing through an internal/external load balancer with IAP attached.
