App Engine Standard and Flexible

PCD study notes on Google App Engine: Standard vs Flexible runtimes, instance classes F1/F2/F4/B1/B2/B4/B8, scaling configs, traffic splitting, custom domains with managed SSL, dispatch.yaml, cron.yaml, queue.yaml, and Cloud Run comparison.

Introduction to App Engine

App Engine is Google Cloud's original Platform-as-a-Service (PaaS), launched in 2008, and is still one of the easiest ways to run an HTTP application without managing servers, operating systems, or load balancers. You give App Engine source code (Standard) or a container image (Flexible), and it handles routing, scaling, health checks, TLS termination, logging, and crash recovery. For PCD candidates, App Engine appears in scenario questions about request-driven web apps, mobile backends, A/B testing with traffic splitting, scheduled work, task queues, and "should I use App Engine or Cloud Run" decisions.

This study note covers the two environments (Standard and Flexible), the supported runtimes, the three scaling modes (automatic, basic, manual), instance classes from F1 to B8, traffic splitting strategies, version migration, custom domains with managed SSL, the three companion configuration files (dispatch.yaml, cron.yaml, queue.yaml), and the decision tree between App Engine and Cloud Run.

Plain English Explanation

Analogy 1: The serviced apartment vs the condo

App Engine Standard is a fully-serviced apartment. The room comes furnished, the cleaning crew arrives daily, and the building turns off the lights at night when nobody is home (scale to zero). You cannot knock down a wall or change the plumbing — you must use the kitchen appliances they provide (the supported language runtimes). App Engine Flexible is a condo: you can renovate the kitchen, bring in your own oven (a custom Docker image), and stock the fridge with whatever ingredients you like (any OS package or library). The trade-off is that the condo always has at least one occupant on standby — you cannot scale to zero, and the doorman (the underlying Compute Engine VM) costs you all night.

Analogy 2: The automatic gearbox vs cruise control vs manual

The three scaling modes are like three driving styles. Automatic scaling is an automatic gearbox: press the gas (send traffic) and the car decides when to shift up (add instances) and shift down (remove them). Basic scaling is cruise control with auto-off: when no requests arrive for a while, the car parks itself; when a new request shows up, it starts the engine. Manual scaling is a manual transmission stuck in third gear: you decide how many instances run, and they run forever — useful for background workers or in-memory caches you do not want evicted.

Analogy 3: The radio tuning knob for traffic

Traffic splitting on App Engine is a radio tuning knob with multiple stations playing at once. Version v1 plays on 90% of the dial, version v2 plays on 10%, and listeners hear whichever station their dial lands on. You can set the knob to any mix of weights (--splits v1=0.9,v2=0.1), or you can tell the radio to remember each listener's station with a cookie (--split-by=cookie) so the same user always hears the same version. When you are ready to promote v2, you turn the knob fully to it, and v1 stays deployed but receives no listeners until you switch it off.

App Engine Standard vs Flexible

The two environments are not interchangeable; choosing the wrong one is a classic exam trap.

Standard environment

App Engine Standard runs your code in a Google-managed sandbox using language-specific runtimes. The supported second-generation runtimes are Python 3.7+, Java 11/17/21, Node.js 10+, PHP 7.2+, Ruby 2.5+, and Go 1.12+. The first-generation runtimes, such as Python 2.7 and Java 8, are deprecated. Standard instances start in seconds, can scale to zero, and are billed per instance-hour. The catch: no SSH or shell access, no OS packages beyond what the runtime ships, no long-running background threads outside of certain Java runtimes, and, on the first-generation runtimes, outbound network access through a sockets API that historically had quotas.

Flexible environment

App Engine Flexible runs your code inside Docker containers on Compute Engine VMs that Google manages for you. You pick the runtime (runtime: python, runtime: custom with your own Dockerfile, etc.). Flexible supports any language, custom OS packages, SSH access for debugging, and background processes, but the minimum instance count is 1 (no scale to zero), startup time is in minutes (full VM boot), and you pay for vCPU, memory, and persistent disk continuously while the VM exists.

The single most exam-relevant difference: only App Engine Standard scales to zero. If a question says "minimise cost when there is no traffic" or "burst from idle to thousands of requests in seconds", the answer is Standard, not Flexible. Flexible's minimum of one VM means you pay 24/7 even when idle.

Supported Runtimes and Custom Runtimes

Standard managed runtimes

App Engine Standard's runtimes follow a versioned, opinionated model. You declare runtime: python311 or runtime: java17 in app.yaml, and Google ships a managed base image with that interpreter plus your requirements.txt or Maven/Gradle dependencies. Standard runtimes are kept patched by Google; you pin minor version drift by re-deploying.

Flexible managed and custom runtimes

Flexible runtimes are friendlier to legacy apps. You can pick a managed runtime (runtime: nodejs, runtime: python) and let App Engine build the container image for you, or you can write runtime: custom and provide your own Dockerfile. The custom runtime is how teams run Rust, .NET, COBOL via GnuCOBOL, or any other stack that runs in a Linux container; the container only needs to listen on the port specified by the PORT environment variable (8080 on Flexible).

For PCD scenarios that mention "we need a Python 3.9 web app with a C++ extension and a system package only available in Debian", the Flexible custom runtime is the right answer. Standard does not let you install arbitrary OS packages; Flexible lets you apt-get install whatever you need in the Dockerfile.

Scaling Modes: Automatic, Basic, Manual

Every App Engine service version declares exactly one scaling block in app.yaml. The three options are not interchangeable, and changing the block requires redeploying as a new version.

Automatic scaling

Automatic scaling targets request-driven apps. You set thresholds and App Engine inserts and removes instances to meet them. Key knobs:

  • target_cpu_utilization: 0.5 to 0.95, default 0.6. The scheduler tries to keep average CPU at this level.
  • target_throughput_utilization: 0.5 to 0.95, default 0.6. Compares concurrent-request capacity per instance.
  • max_concurrent_requests: how many simultaneous requests one instance handles (Standard default 10, max 1000 on second-gen runtimes).
  • min_instances and max_instances: caps on the autoscaler. min_instances: 0 enables scale to zero on Standard.
  • min_idle_instances and max_idle_instances: warm spare capacity for spikes.
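
To make the interaction of these knobs concrete, here is a minimal Python sketch of a simplified sizing model. It assumes the autoscaler looks only at concurrent requests and target_throughput_utilization; the real scheduler also weighs CPU, latency, and idle-instance settings, so treat the result as a rough estimate. The function name estimate_instances is hypothetical and not part of any App Engine API.

```python
import math


def estimate_instances(concurrent_requests, max_concurrent_requests=10,
                       target_throughput_utilization=0.6,
                       min_instances=0, max_instances=None):
    """Rough instance count under a simplified, throughput-only model."""
    if concurrent_requests < 0:
        raise ValueError("concurrent_requests must be non-negative")
    # Assumption: an instance counts as "full" once it carries
    # max_concurrent_requests * target_throughput_utilization requests.
    usable_slots = max_concurrent_requests * target_throughput_utilization
    needed = math.ceil(concurrent_requests / usable_slots)
    # Clamp to the configured floor and ceiling.
    needed = max(needed, min_instances)
    if max_instances is not None:
        needed = min(needed, max_instances)
    return needed
```

With the defaults listed above (10 concurrent requests per instance, 0.6 target utilization), 100 simultaneous requests would call for about 17 instances under this model, and min_instances: 0 lets the estimate fall to zero when traffic stops.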

Basic scaling

Basic scaling is for batch-ish or low-traffic workloads. App Engine spins up an instance when a request arrives and shuts it down after idle_timeout minutes of silence. You set max_instances and idle_timeout (default 5 minutes). Requests wait while a new instance starts, so latency-sensitive apps should not use Basic.

Manual scaling

Manual scaling pins a fixed number of instances. You declare instances: N, and App Engine keeps exactly N instances running until you change the setting or redeploy. Like basic scaling, manual scaling allows requests up to 24 hours long on Standard, but it is the only mode that keeps instances alive regardless of traffic, which makes long-lived in-memory state practical. It is the right pick for scheduled long-runners, memory caches you do not want the autoscaler to evict, or WebSocket bridges (on Flexible only, since Standard does not support WebSockets).

A common exam distractor: "We need an in-memory cache that survives across requests." Automatic scaling on Standard shuts down idle instances after a short window, which wipes the cache. The fix is either Manual scaling (instances pinned) or, more correctly, externalising state to Memorystore for Redis. Building a cache in local instance memory is an anti-pattern even when Manual scaling makes it technically work.

Instance Classes F1, F2, F4, F4_1G and B1, B2, B4, B4_1G, B8

Instance classes determine the vCPU and memory allocated to each Standard instance. The F series is for automatic (front-end) scaling and the B series is for basic and manual (back-end) scaling. You declare instance_class: F2 in app.yaml.

Class  | Memory  | CPU limit | Scaling      | Typical use
F1     | 256 MB  | 600 MHz   | Automatic    | Default; light web apps
F2     | 512 MB  | 1.2 GHz   | Automatic    | Medium APIs
F4     | 1024 MB | 2.4 GHz   | Automatic    | Heavy request handlers
F4_1G  | 2048 MB | 2.4 GHz   | Automatic    | Memory-hungry handlers
B1     | 256 MB  | 600 MHz   | Basic/Manual | Light background work
B2     | 512 MB  | 1.2 GHz   | Basic/Manual | Default for B series
B4     | 1024 MB | 2.4 GHz   | Basic/Manual | Heavier batch jobs
B4_1G  | 2048 MB | 2.4 GHz   | Basic/Manual | Big batch with state
B8     | 2048 MB | 4.8 GHz   | Basic/Manual | Largest backend tier

Billing for Standard is per instance-hour at the class's rate; F1 is the cheapest and B8 is the priciest. Picking too small a class causes OOM kills and 500s; picking too large wastes money on idle headroom. Right-size by watching the Cloud Monitoring appengine.googleapis.com/system/memory/usage and cpu/usage metrics.

F = Front-end = Automatic scaling. B = Back-end = Basic or Manual scaling. This pairing is enforced by App Engine. Trying to declare instance_class: B4 with automatic_scaling: is a deploy error. The mnemonic "F follows fast, B follows batch" sticks.
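
The pairing rule is easy to check before deploying. The sketch below is a hypothetical pre-deploy lint, not something gcloud provides; it takes an app.yaml that has already been parsed into a dict and assumes that a missing scaling block means automatic scaling.

```python
F_CLASSES = {"F1", "F2", "F4", "F4_1G"}
B_CLASSES = {"B1", "B2", "B4", "B4_1G", "B8"}

# Which instance classes each scaling block accepts, per the F/B rule.
ALLOWED = {
    "automatic_scaling": F_CLASSES,
    "basic_scaling": B_CLASSES,
    "manual_scaling": B_CLASSES,
}


def check_instance_class(config):
    """Return a list of problems found in a parsed app.yaml dict."""
    problems = []
    blocks = [name for name in ALLOWED if name in config]
    if len(blocks) > 1:
        problems.append("more than one scaling block: " + ", ".join(blocks))
        return problems
    # Assumption: no scaling block means automatic scaling.
    scaling = blocks[0] if blocks else "automatic_scaling"
    instance_class = config.get("instance_class")
    if instance_class is not None and instance_class not in ALLOWED[scaling]:
        problems.append(
            f"instance_class {instance_class} is not valid with {scaling}"
        )
    return problems
```

Calling check_instance_class({"instance_class": "B4", "automatic_scaling": {}}) reports one problem, mirroring the deploy error described above, while an F2 class with automatic_scaling passes cleanly.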

Traffic Splitting and Version Migration

Versions are immutable

Every gcloud app deploy creates a new immutable version of a service. Versions coexist; only routing decides which version sees user traffic. Two related operations control routing:

Traffic splitting

gcloud app services set-traffic SERVICE --splits=v1=0.9,v2=0.1 sends 90% of requests to v1 and 10% to v2. App Engine supports three splitting algorithms via --split-by:

  • IP (--split-by=ip): hashes the client IP. Same user often lands on the same version, but NATed users share.
  • Cookie (--split-by=cookie): App Engine sets a GOOGAPPUID cookie keyed to a version. The same browser always hits the same version, which is ideal for A/B testing where session continuity matters.
  • Random (--split-by=random): each request is rolled fresh. Best for stateless load tests.

Splits accept up to two decimal places for IP splitting (e.g. 0.05), up to three for cookie splitting (e.g. 0.005), and one decimal for random.
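
The sticky behaviour of IP and cookie splitting comes from hashing a stable client key into a bucket and mapping buckets to versions by weight. The sketch below illustrates that general idea only; it is not App Engine's actual algorithm, and pick_version, the SHA-256 hash, and the 1000-bucket count are all assumptions chosen for the example.

```python
import hashlib


def pick_version(key, splits, buckets=1000):
    """Deterministically map a client key (IP or cookie value) to a version."""
    total = sum(splits.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("split weights must sum to 1")
    # Hash the key so the same client always lands in the same bucket.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % buckets
    # Walk the versions in order, giving each a slice of the bucket range.
    threshold = 0.0
    for version, weight in splits.items():
        threshold += weight * buckets
        if bucket < threshold:
            return version
    return version  # guards against float rounding on the last bucket
```

Because the bucket depends only on the key, repeated calls with the same key return the same version, which is the property an A/B test needs. Random splitting would instead draw a fresh bucket for every request.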

Migration vs splitting

The CLI separately supports gradual traffic migration: gcloud app services set-traffic SERVICE --splits=v2=1 --migrate. App Engine ramps traffic from the current version to v2 over several minutes instead of switching all at once, giving new instances time to warm up. Migration only works for Standard automatic-scaling services with warm-up requests configured.

The --no-promote flag on gcloud app deploy is exam-critical. By default, gcloud app deploy makes the new version receive 100% of traffic immediately. Adding --no-promote deploys the version but leaves the current version serving — so you can run smoke tests against the new URL https://VERSION-dot-SERVICE-dot-PROJECT.appspot.com before flipping the switch. Pair with --no-stop-previous-version so you can roll back instantly.

Custom Domains and Managed SSL

Domain mapping with gcloud

By default, an App Engine app is reachable at https://PROJECT.appspot.com. Production deployments map custom domains via gcloud app domain-mappings create example.com. App Engine returns a list of DNS records (A, AAAA, CNAME) for you to add at your registrar.

Managed SSL and wildcard limitations

App Engine automatically provisions and renews managed SSL certificates (issued by Google Trust Services or Let's Encrypt) for any verified custom domain. Managed certificates do not cover wildcard domains (*.example.com), so those require uploading your own certificate. SSL provisioning takes from minutes up to 24 hours after the DNS records resolve, and certificates auto-renew about 30 days before expiry.

Forcing HTTPS in app.yaml

You can force HTTPS at the handler level inside app.yaml:

handlers:
  - url: /.*
    script: auto
    secure: always

secure: always redirects HTTP to HTTPS at the load balancer, before your code runs.

For multi-region resilience or a single domain in front of many services, point the custom domain at a Global External HTTPS Load Balancer with a Serverless NEG instead of using App Engine domain mapping. The Load Balancer gives you Cloud CDN, Cloud Armor (WAF), and IAP, none of which are wired into the native App Engine domain mapping.

Services and dispatch.yaml

An App Engine application contains one or more services (formerly called "modules"). Each service is a microservice with its own code, runtime, scaling, and instance class. The first service deployed is always default. Services share quotas, datastore, and Cloud Tasks queues at the project level.

dispatch.yaml is a per-application routing table that overrides the default URL pattern (https://SERVICE-dot-PROJECT.appspot.com). It maps URL patterns to services so a single hostname can fan out:

dispatch:
  - url: "*/api/*"
    service: api-service
  - url: "admin.example.com/*"
    service: admin-console
  - url: "*/*"
    service: default

Deploy with gcloud app deploy dispatch.yaml. The first matching rule wins, so order them from most specific to least specific. dispatch.yaml is application-level, not service-level, so it is deployed once and shared across all services.
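
The first-match-wins rule can be illustrated with a small Python sketch. It approximates dispatch patterns with the standard library's fnmatch globbing applied to host plus path; real dispatch.yaml patterns are more restricted than shell globs, so this only models rule ordering. The route function is hypothetical.

```python
from fnmatch import fnmatchcase

# Rules in the same order as the dispatch.yaml example above.
DISPATCH_RULES = [
    ("*/api/*", "api-service"),
    ("admin.example.com/*", "admin-console"),
    ("*/*", "default"),
]


def route(host, path, rules=DISPATCH_RULES):
    """Return the service of the first rule whose pattern matches host+path."""
    target = host + path
    for pattern, service in rules:
        if fnmatchcase(target, pattern):
            return service
    # Assumption: unmatched requests fall through to the default service.
    return "default"
```

Note that route("admin.example.com", "/api/users") returns api-service rather than admin-console, because the */api/* rule is listed first. Swapping the first two rules would change that outcome, which is why ordering matters.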

cron.yaml for Scheduled Tasks

cron.yaml registers HTTP cron jobs that the App Engine cron service invokes on a schedule:

cron:
  - description: "Nightly cleanup"
    url: /tasks/cleanup
    schedule: every 24 hours
    target: worker-service
    timezone: Asia/Taipei
  - description: "Daily 9am report"
    url: /tasks/daily-report
    schedule: every day 09:00

Deploy with gcloud app deploy cron.yaml. The cron service makes an HTTP GET to the URL on the named target service (or default if omitted). For security, you protect the cron URL by checking the X-Appengine-Cron: true header, which only App Engine's cron service can set; external callers cannot forge it. The exam often tests this header-based authentication pattern.
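
The header check described above might look like the following framework-free WSGI sketch. The cron_only wrapper and the cleanup handler are hypothetical names; a Flask or Django app would perform the same check on its own request object.

```python
def cron_only(app):
    """WSGI middleware that rejects requests lacking X-Appengine-Cron: true."""
    def wrapper(environ, start_response):
        # WSGI exposes the X-Appengine-Cron header as HTTP_X_APPENGINE_CRON.
        if environ.get("HTTP_X_APPENGINE_CRON") != "true":
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"cron only\n"]
        return app(environ, start_response)
    return wrapper


def cleanup(environ, start_response):
    # Hypothetical job body; the real cleanup work would go here.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"cleanup done\n"]


# The object a WSGI server would serve for /tasks/cleanup.
application = cron_only(cleanup)
```

The check is only trustworthy when the request arrives through App Engine's front end, which removes this header from external traffic; the same code running elsewhere would need a different authentication scheme.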

Behind the scenes, App Engine Cron is the same machinery as Cloud Scheduler. Google now recommends migrating new schedulers to Cloud Scheduler directly because it works across Cloud Run, Cloud Functions, Pub/Sub, and on-prem HTTP endpoints. cron.yaml still works, but Cloud Scheduler is the forward-compatible answer when the question asks "what scheduler should I use for a new multi-service architecture".

queue.yaml and Cloud Tasks

queue.yaml configured App Engine's classic Task Queue feature: push queues for HTTP fan-out and pull queues for worker pull-leasing. Modern projects use Cloud Tasks instead, which is the rebranded, multi-runtime successor. You still define queue config (rate limits, retry parameters, target service), but you do it with gcloud tasks queues create or the Cloud Tasks API. For reference, the same settings in the legacy queue.yaml format look like this:

queue:
  - name: email-queue
    rate: 10/s
    bucket_size: 100
    max_concurrent_requests: 20
    retry_parameters:
      task_retry_limit: 10
      task_age_limit: 2d
      min_backoff_seconds: 5
      max_backoff_seconds: 300
      max_doublings: 16

Cloud Tasks queues can target App Engine HTTP handlers or arbitrary HTTPS endpoints (Cloud Run, Cloud Functions, external services). Tasks are protected with X-AppEngine-QueueName and X-AppEngine-TaskName headers when targeting App Engine; for non-AE targets, Cloud Tasks attaches an OIDC token.
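
The retry parameters in the queue example interact in a way that is easier to see as numbers. The sketch below assumes the commonly documented behaviour: the interval starts at min_backoff_seconds, doubles max_doublings times, then grows linearly by the last doubled value, and is always capped at max_backoff_seconds. The backoff_schedule function is a hypothetical helper, not part of any Google library.

```python
def backoff_schedule(min_backoff, max_backoff, max_doublings, attempts):
    """Return the wait in seconds before each of the first `attempts` retries."""
    schedule = []
    for attempt in range(attempts):
        if attempt <= max_doublings:
            # Doubling phase: min, 2*min, 4*min, ...
            interval = min_backoff * 2 ** attempt
        else:
            # Linear phase: keep adding the last doubled interval.
            top = min_backoff * 2 ** max_doublings
            interval = top * (attempt - max_doublings + 1)
        # The cap applies in both phases.
        schedule.append(min(interval, max_backoff))
    return schedule
```

For the email-queue settings above (5 s minimum, 300 s maximum, 16 doublings), the cap is reached long before the doublings run out: backoff_schedule(5, 300, 16, 8) gives 5, 10, 20, 40, 80, 160, 300, 300. A max_doublings of 16 is therefore effectively "double until capped" for this queue.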

App Engine vs Cloud Run Decision

Both products are serverless HTTP runtimes; this is the most common confusion. Use this decision matrix:

Choose App Engine Standard when:

  • The app fits one of the supported runtimes (Python, Java, Node, Go, PHP, Ruby) and you do not need a custom Dockerfile.
  • You need built-in features like cron.yaml, dispatch.yaml, free-tier daily instance hours, or the Memcache API (legacy).
  • You want a single product that handles routing, services, scheduling, and queues without wiring multiple components.

Choose Cloud Run when:

  • You want to deploy any container image (any language, any base OS).
  • You need full control over concurrency (1 to 1000 per instance) with billing metered in 100 ms increments.
  • You need request timeouts up to 60 minutes (App Engine Standard caps at 10 minutes for HTTP; Flexible at 60 minutes).
  • You need direct VPC egress (Cloud Run has Direct VPC Egress; App Engine needs a Serverless VPC Connector).
  • You want gRPC, HTTP/2 streaming, or WebSocket support (Cloud Run supports all three; App Engine Standard does not support gRPC and has limited HTTP/2).

Choose App Engine Flexible when:

  • You need a custom Docker image with full OS access AND want App Engine's services/dispatch model.
  • The team is heavily invested in app.yaml conventions and would rather not migrate.

For most greenfield container workloads, Cloud Run is now the default; Flexible is largely a legacy bridge.

PCD questions often phrase Cloud Run scenarios in App Engine vocabulary to test whether you spot the cue. Phrases like "any container image", "gRPC streaming", "Direct VPC Egress", or "request timeout up to 60 minutes" point to Cloud Run even when the question describes a "web app with traffic splitting" (a feature both products share). Read the constraints, not the buzzwords.
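
As a study aid, the decision matrix in this section can be condensed into a few lines of Python. This is a deliberately simplified sketch of the cues listed above, not an official selection tool; suggest_platform and its parameter names are hypothetical, and real decisions involve more factors (VPC egress, pricing, existing tooling).

```python
def suggest_platform(needs_custom_container=False,
                     needs_grpc_or_websockets=False,
                     timeout_minutes=10,
                     wants_app_engine_services=False):
    """Map the constraints from this note's decision matrix to a product."""
    if timeout_minutes > 60:
        raise ValueError("no request timeout in this note exceeds 60 minutes")
    # gRPC, HTTP/2 streaming, and WebSockets point to Cloud Run.
    if needs_grpc_or_websockets:
        return "Cloud Run"
    if needs_custom_container:
        # Flexible only wins when the services/dispatch model matters.
        if wants_app_engine_services:
            return "App Engine Flexible"
        return "Cloud Run"
    # Standard automatic scaling caps HTTP requests at 10 minutes.
    if timeout_minutes > 10:
        return "Cloud Run"
    return "App Engine Standard"
```

The order of the checks mirrors the advice to read constraints rather than buzzwords: a hard requirement such as gRPC decides the answer before softer preferences are considered.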

Configuration File: app.yaml

app.yaml is the heart of every service. A typical Python 3.11 Standard automatic-scaling configuration:

runtime: python311
service: api
instance_class: F2

automatic_scaling:
  target_cpu_utilization: 0.65
  min_instances: 1
  max_instances: 40
  max_concurrent_requests: 80

env_variables:
  DB_HOST: "10.0.0.5"
  LOG_LEVEL: "INFO"

handlers:
  - url: /static
    static_dir: static
    secure: always
  - url: /.*
    script: auto
    secure: always

inbound_services:
  - warmup

vpc_access_connector:
  name: projects/PROJECT/locations/REGION/connectors/CONN_NAME

Notable knobs: inbound_services: [warmup] enables pre-warmed instances by registering a /_ah/warmup request that loads code before the first user hits. vpc_access_connector lets the service reach private IPs in a VPC such as a Cloud SQL private instance or a Memorystore Redis instance.

Logging, Monitoring, and Health Checks

Every App Engine request emits a structured log entry to Cloud Logging under appengine.googleapis.com/request_log. Application stdout/stderr is captured at log levels INFO through ERROR. The Logs Explorer correlates request logs with application logs through a shared trace ID.

Cloud Monitoring auto-creates dashboards for appengine.googleapis.com/http/server/response_count, response_latencies, instance_count, and the per-instance memory/CPU usage. Health checks are split into liveness_check (is the instance still running and healthy?) and readiness_check (is the instance ready to accept traffic?). On Flexible, default endpoints are /liveness_check and /readiness_check; you can change them in app.yaml. On Standard, the platform manages health checks internally and you only see the result.

A warmup request is a GET /_ah/warmup request that App Engine sends to a freshly launched instance before it joins the load-balancer pool. Implement a handler at that URL to pre-load caches, JIT-compile templates, or open database connections, which reduces the latency of the first real request. Enable it with inbound_services: [warmup] in app.yaml. Reference: https://cloud.google.com/appengine/docs/standard/configuring-warmup-requests
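
A warmup handler can be very small. The following WSGI sketch uses only the standard library; CACHE, load_reference_data, and the sample data are hypothetical stand-ins for whatever start-up work an application actually needs. Since warmup requests are not guaranteed to arrive before the first user request, the regular path also loads the data lazily.

```python
CACHE = {}


def load_reference_data():
    # Hypothetical expensive start-up work (config fetch, DB query, ...).
    return {"countries": ["TW", "JP", "US"]}


def ensure_loaded():
    """Populate the cache once, whether triggered by warmup or a real request."""
    if "reference" not in CACHE:
        CACHE["reference"] = load_reference_data()
    return CACHE["reference"]


def application(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path == "/_ah/warmup":
        # App Engine calls this before routing user traffic to the instance.
        ensure_loaded()
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"warmed\n"]
    # Regular requests fall back to lazy loading if warmup never ran.
    data = ensure_loaded()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"{len(data['countries'])} countries loaded\n".encode("utf-8")]
```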

Common Pitfalls

  • Deploying without --no-promote during business hours, then discovering a broken version is now serving 100% of traffic. Always smoke-test first.
  • Picking Flexible to "save money on idle" — Flexible has no scale to zero. Standard or Cloud Run is the cheap idle answer.
  • Storing state on the instance filesystem — instances are ephemeral. Use Cloud Storage, Memorystore, or Firestore.
  • Forgetting that App Engine is regional — choose the region carefully at project initialisation (gcloud app create --region=asia-east1). You cannot change the region later without creating a new project.
  • Misusing manual scaling for short requests — manual instances cost continuously and do not auto-recover from crashes the same way automatic scaling does.

Exam Tips

  • Know the three scaling modes and which scenarios each fits.
  • Memorise the F-class / B-class pairing: F-class for automatic, B-class for basic and manual.
  • Know that Standard scales to zero, Flexible does not.
  • Know --no-promote and traffic splitting flags (--splits, --split-by, --migrate).
  • Know the three companion configs: dispatch.yaml (routing), cron.yaml (schedules), queue.yaml (task queues, now Cloud Tasks).
  • Know that App Engine apps are pinned to one region per project and the region cannot be changed.
  • Know when to suggest Cloud Run instead: any container, gRPC, 60-minute timeouts, Direct VPC Egress.
  • Know secure: always for HTTPS enforcement and the X-Appengine-Cron: true header for protecting cron URLs.

Frequently Asked Questions (FAQ)

How do I roll back an App Engine deploy?

Re-route traffic to the previous version with gcloud app services set-traffic SERVICE --splits=PREVIOUS_VERSION=1. Because every deploy creates an immutable version that stays in the project until you delete it (subject to the limit of 210 versions per application on paid apps), rollback is just a routing flip. There is no separate "rollback" command; sending 100% of traffic back to the prior version is the rollback.
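
For teams that script rollbacks, a small helper can assemble the command from a dict of weights. This sketch only builds the argument list using the flags covered in this note (--splits, --split-by, --migrate); set_traffic_command is a hypothetical name, and actually running the result requires an installed and authenticated gcloud CLI.

```python
def set_traffic_command(service, splits, split_by=None, migrate=False):
    """Build the argv list for `gcloud app services set-traffic`."""
    if not splits:
        raise ValueError("at least one version is required")
    if any(weight <= 0 for weight in splits.values()):
        raise ValueError("weights must be positive")
    if split_by not in (None, "ip", "cookie", "random"):
        raise ValueError("split_by must be ip, cookie, or random")
    # Format weights compactly: 1 -> "1", 0.9 -> "0.9".
    pairs = ",".join(f"{version}={weight:g}" for version, weight in splits.items())
    cmd = ["gcloud", "app", "services", "set-traffic", service, f"--splits={pairs}"]
    if split_by:
        cmd.append(f"--split-by={split_by}")
    if migrate:
        cmd.append("--migrate")
    return cmd
```

A rollback is then set_traffic_command("default", {"v1": 1}), which can be passed to subprocess.run(cmd, check=True). Returning a list rather than a string avoids shell-quoting problems with version names.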

Can App Engine Standard reach a private Cloud SQL instance?

Yes, through a Serverless VPC Access connector. Declare vpc_access_connector: name: projects/.../connectors/... in app.yaml. The Standard service then issues outbound calls through the connector and reaches the private IP of the Cloud SQL instance, Memorystore, or any other resource in the VPC.

What is the maximum request timeout for App Engine Standard?

For HTTP requests, Standard caps at 10 minutes for automatic-scaling services. For Basic and Manual scaling, it is 24 hours but the request must be processed by a single instance. If you need 60-minute timeouts in a serverless model, Cloud Run is a better fit; if you need long-running async work, hand off to Cloud Tasks.

How does traffic splitting with cookies differ from random?

Cookie splitting (--split-by=cookie) sets a GOOGAPPUID cookie on the response. Subsequent requests with that cookie always route to the same version, giving each user a consistent experience — essential for A/B tests that compare metrics across whole sessions. Random splitting decides per request and is best for load testing or when versions are stateless and externally identical.

Why can I not change my App Engine region?

App Engine creates region-specific infrastructure (datastore namespaces, regional load balancers) when the application is first created with gcloud app create --region=REGION. Migrating to a different region requires creating a new project, redeploying, and updating DNS. The lock-in is by design to keep latency and data residency predictable.

When should I prefer Cloud Run Jobs over App Engine for batch work?

Cloud Run Jobs is the right pick for one-off or scheduled non-HTTP batch tasks: ETL runs, image processing, nightly report generation. App Engine is request-driven; even cron jobs there fire HTTP requests to a handler that must complete within the request timeout. Cloud Run Jobs has its own task model with up to 24-hour execution and per-task parallelism.

Does App Engine charge for outbound traffic to Google APIs?

Outbound bytes from App Engine to Google Cloud services in the same region (Cloud Storage, BigQuery, Pub/Sub) are free. Cross-region or to-internet egress is billed at standard network rates. The free daily instance-hours quota for Standard (28 instance-hours/day on the F1 class) applies per project.
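
The free quota is simple arithmetic, sketched below for F1 instances only. The sketch ignores details such as billing that continues briefly after an instance shuts down and the different accounting for larger classes; billable_instance_hours is a hypothetical helper.

```python
def billable_instance_hours(instance_hours_used, free_hours_per_day=28.0):
    """F1 instance-hours in one day that exceed the free daily quota."""
    if instance_hours_used < 0:
        raise ValueError("instance_hours_used must be non-negative")
    return max(0.0, instance_hours_used - free_hours_per_day)
```

One F1 instance running all day uses 24 instance-hours and stays inside the quota, while two always-on F1 instances use 48 and leave 20 billable hours. This is why min_instances: 1 is usually free on a small app but min_instances: 2 is not.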
