Introduction to Cloud Endpoints and API Gateway
Google Cloud ships two distinct API-management products that the PCD exam treats as separate but adjacent answers, plus a third (Apigee) that frequently shows up as a distractor. Cloud Endpoints is the older, proxy-based offering: you deploy the Extensible Service Proxy (ESP or ESPv2) as a sidecar in front of your backend (App Engine flexible, GKE, Compute Engine, Cloud Run) and the proxy enforces auth, quotas, and telemetry by talking to the Service Control API and the Service Management API. API Gateway is the newer, fully managed, serverless gateway purpose-built for Cloud Run, Cloud Functions (1st/2nd gen), and App Engine backends — Google operates the data plane on Envoy, you only upload an OpenAPI v2 spec and let Google provision the gateway endpoint. Apigee is the enterprise full-lifecycle platform (developer portal, monetization, advanced policies) and is the right answer only when the scenario mentions partner monetization, complex policy chains, or a public API product catalog.
For PCD you are expected to recognise which product fits a stated requirement, recall which transport protocols and auth schemes each supports, write a minimal OpenAPI spec that turns on API keys / JWT / Firebase Auth / Google ID tokens, configure quotas to throttle abusive clients, and read logs / metrics that the proxy emits to Cloud Logging and Cloud Monitoring. This note walks through each of those moving parts with concrete gcloud commands, OpenAPI snippets, and the trade-offs that drive the decision tree.
TL;DR: Choose API Gateway when the backend is Cloud Run / Cloud Functions / App Engine and you want zero-ops serverless. Choose Cloud Endpoints with ESPv2 when you need gRPC, gRPC transcoding to REST, or your backend runs on GKE / GCE / on-prem. Choose Apigee when the requirement is a developer portal, monetization, or advanced policy chains. All three integrate with API keys, JWT, and Cloud Logging / Monitoring; only Endpoints handles gRPC; only Apigee ships a full developer portal.
Plain English Explanation
API gateway vocabulary — proxy, sidecar, OpenAPI spec, Service Control, quota — is dense, so three analogies pin the concepts down before we get into the YAML and gcloud commands.
Analogy 1: The Hotel Front Desk vs. Concierge vs. Travel Agent
Picture three different people who can stand between a hotel guest and the hotel's amenities. The Front Desk Receptionist is API Gateway — they sit at one specific desk Google built and maintains, they check your room key (API key) and your reservation (JWT), and they hand you off to the right service: spa, gym, restaurant (Cloud Run, Cloud Functions, App Engine). They do not do gRPC because the hotel's spa never speaks gRPC. The Concierge is Cloud Endpoints with ESPv2 — they can stand at any door of any building (sidecar deploys next to your app on GKE, GCE, on-prem) and they speak both English and the staff's internal radio code (REST and gRPC). They are more flexible but you have to give them a uniform and tell them where to stand. The Travel Agent in the lobby is Apigee — they do not just check IDs, they sell tour packages (monetization), maintain a brochure rack (developer portal), and run loyalty programs (advanced policies). Overkill for "let internal services talk to each other," essential for "sell my public API to 200 partners."
Analogy 2: The Restaurant Menu (OpenAPI Spec)
The OpenAPI specification is the printed menu at a restaurant. The kitchen (your backend service) knows how to cook many dishes, but the menu tells customers which dishes are available, what each one is called (path), what variants exist (parameters), what allergens it contains (auth requirements: api_key, jwt), and how much each costs (quota cost). When you deploy a new dish, you reprint the menu — that is gcloud endpoints services deploy openapi.yaml or gcloud api-gateway api-configs create. The gateway / proxy itself is the waiter: it reads the menu, refuses orders not listed, fills in the table number (request ID) on the order ticket, and walks the order back to the kitchen. The waiter never improvises — if it is not in the spec, it is a 404.
Analogy 3: Tollbooth on the Highway (Quotas and Service Control)
Quotas behave like a tollbooth with a punch card. The OpenAPI spec declares the quota metric ("read-requests", "write-requests") and the limit ("1000 per minute per consumer"). Each call carries an API key, which is the punch card. The proxy (ESP) or gateway calls the Service Control API on every request to ask "does this card have room left?" and to report what was consumed. If the card is full, the proxy returns HTTP 429 immediately without bothering the backend. The backend is your scenic toll road: it gets to focus on the driving experience, not on counting cars. The Service Management API is the separate office in town hall where you register the toll road, declare the rates, and print new punch cards; the gateway data plane never touches that office at runtime.
Use the front desk vs. concierge vs. travel agent analogy when the question forces a choice between API Gateway, Endpoints, and Apigee. Use the menu analogy when the question is about OpenAPI fields, paths, security definitions, or x-google- extensions. Use the tollbooth analogy when the question involves quotas, rate limiting, 429 responses, or the Service Control / Service Management split. Reference: https://cloud.google.com/endpoints/docs/openapi/architecture-overview
Cloud Endpoints Architecture — ESP, ESPv2, and the Service Control Loop
Cloud Endpoints is fundamentally a managed control plane plus an open-source data plane proxy you run yourself. Understanding which half runs where is the single most useful piece of exam mental model.
ESP vs. ESPv2 — which proxy do you actually run?
ESP (Extensible Service Proxy) is the original NGINX-based proxy. It supports OpenAPI 2.0 and gRPC and runs on App Engine flexible, GKE, GCE, and Kubernetes. ESPv2 is the modern replacement built on Envoy; it accepts the same OpenAPI 2.0 input, adds newer features such as gRPC transcoding to JSON/REST and better HTTP/2 handling, and is the only proxy supported on Cloud Run (because Cloud Run already runs containers and Envoy fits the model). New deployments should default to ESPv2; the image gcr.io/endpoints-release/endpoints-runtime-serverless:2 is the Cloud Run build, while GKE and Compute Engine deployments use gcr.io/endpoints-release/endpoints-runtime:2. ESP is in maintenance mode but still appears in older PCD reference architectures.
The Service Control call path
Every inbound request flows through this sequence inside the proxy:
- Authentication: validate the JWT, if one is required, against the configured issuer's JWKS. API keys are validated as part of the Service Control check in the next step, not against Service Management.
- Authorization / quota check: call servicecontrol.googleapis.com/v1/services/{service}:check to confirm the API key is enabled and quota has room.
- Forward to backend: over HTTP/1.1, HTTP/2, or gRPC.
- Report: call servicecontrol.googleapis.com/v1/services/{service}:report asynchronously with metrics (latency, status, bytes, quota consumed).
The check is synchronous and adds ~5–20 ms; the report is batched and effectively free. This is also why running ESPv2 with the wrong service account (missing roles/servicemanagement.serviceController) breaks every request with a 503: the proxy is not authorized to call Service Control.
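The check, forward, report ordering described above can be sketched in a few lines of Python. This is only an in-memory model: FakeServiceControl, handle_request, and the 403 rejection status are hypothetical names and values chosen for illustration, not part of ESPv2 or the Service Control client libraries.

```python
from dataclasses import dataclass, field


@dataclass
class FakeServiceControl:
    """In-memory stand-in for the Service Control check/report calls (hypothetical)."""
    enabled_keys: set
    reports: list = field(default_factory=list)

    def check(self, api_key: str) -> bool:
        # The real proxy makes a synchronous services/{service}:check call here.
        return api_key in self.enabled_keys

    def report(self, api_key: str, status: int) -> None:
        # The real proxy batches these and sends them asynchronously.
        self.reports.append((api_key, status))


def handle_request(api_key: str, control: FakeServiceControl, backend) -> int:
    """Check first, forward only if the check passes, always report."""
    if control.check(api_key):
        status = backend()  # forward to the backend
    else:
        status = 403  # illustrative rejection status; the backend is never called
    control.report(api_key, status)
    return status
```

The point of the sketch is the ordering: a failed check short-circuits before the backend is invoked, yet the rejected request is still reported.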
Service Management API vs. Service Control API
Service Management is the config plane — it stores your OpenAPI spec, API key definitions, quota definitions, and version history. You hit it once per deploy with gcloud endpoints services deploy. Service Control is the runtime plane — the proxy hits it on every request to check and report. The two APIs are commonly conflated in distractor answers; recognising the split is worth at least one exam point.
The ESPv2 container needs a service account with at minimum roles/servicemanagement.serviceController (shown in the console as the Service Controller role). Without it the proxy returns 503 on every request and you will see PERMISSION_DENIED errors against servicecontrol.googleapis.com in Cloud Logging. On Cloud Run, attach the service account with --service-account on the proxy revision, not the backend revision.
OpenAPI Specifications — How You Describe an Endpoints / API Gateway Service
Both Cloud Endpoints and API Gateway consume OpenAPI 2.0 (the spec formerly known as Swagger 2.0). OpenAPI 3.0 is not supported by either product at the time of this writing — be careful with distractors that suggest "upload your OpenAPI 3.0 spec." For gRPC services, Cloud Endpoints accepts a .proto file plus a gRPC service config YAML instead.
Minimum viable spec
A working openapi.yaml for API Gateway pointed at a Cloud Run backend looks like:
swagger: "2.0"
info:
  title: orders-api
  version: 1.0.0
host: orders.gateway.dev
schemes: [https]
produces: [application/json]
x-google-backend:
  address: https://orders-abcdef-uc.a.run.app
paths:
  /orders/{id}:
    get:
      operationId: getOrder
      parameters:
        - name: id
          in: path
          required: true
          type: string
      security:
        - api_key: []
      responses:
        "200": { description: OK }
securityDefinitions:
  api_key:
    type: apiKey
    name: key
    in: query
Note x-google-backend.address: this Google-specific extension tells API Gateway where to forward calls. Endpoints on Cloud Run uses the same field; the sibling field x-google-backend.jwt_audience sets the audience of the Google-signed ID token that the proxy attaches when it calls a private Cloud Run service.
x-google-* extensions you should recognise
- x-google-backend.address: backend URL.
- x-google-backend.deadline: per-route timeout in seconds (default 15, max 300 for API Gateway).
- x-google-backend.path_translation: CONSTANT_ADDRESS vs. APPEND_PATH_TO_ADDRESS.
- x-google-issuer, x-google-jwks_uri, x-google-audiences: JWT validation parameters under securityDefinitions.
- x-google-management.metrics and x-google-management.quota: declare custom quota metrics for rate limiting.
- x-google-allow: all or configured, to control whether unlisted paths are forwarded or rejected.
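The difference between the two path_translation modes is easier to see with a small model. The sketch below is a simplified approximation, assuming that APPEND_PATH_TO_ADDRESS appends the request path to the backend address and that CONSTANT_ADDRESS keeps the address fixed and passes path parameters as query parameters; translate is a hypothetical helper, not a Google API, and the /getOrder address is made up for the example.

```python
from urllib.parse import urlencode


def translate(address: str, request_path: str, path_params: dict, mode: str) -> str:
    """Approximate the backend URL the proxy would call for a given mode."""
    if mode == "APPEND_PATH_TO_ADDRESS":
        # The incoming path is appended to the configured address.
        return address.rstrip("/") + request_path
    if mode == "CONSTANT_ADDRESS":
        # The address stays as configured; path parameters become query parameters.
        query = urlencode(path_params)
        return f"{address}?{query}" if query else address
    raise ValueError(f"unknown path_translation mode: {mode}")


append_url = translate(
    "https://orders-abcdef-uc.a.run.app", "/orders/42", {"id": "42"},
    "APPEND_PATH_TO_ADDRESS",
)
constant_url = translate(
    "https://orders-abcdef-uc.a.run.app/getOrder", "/orders/42", {"id": "42"},
    "CONSTANT_ADDRESS",
)
```

APPEND_PATH_TO_ADDRESS suits a backend that mirrors the public paths; CONSTANT_ADDRESS suits a single-URL target such as one function per operation.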
Deploying the spec
For Endpoints: gcloud endpoints services deploy openapi.yaml produces a config ID; you then deploy your backend with the proxy image referencing --rollout_strategy=managed so it always pulls the newest config. For API Gateway: create the API resource once with gcloud api-gateway apis create orders, then gcloud api-gateway api-configs create v1 --api=orders --openapi-spec=openapi.yaml, then gcloud api-gateway gateways create orders-gw --api=orders --api-config=v1 --location=us-central1. The Gateway URL is regional and looks like https://orders-gw-<hash>.uc.gateway.dev.
gRPC Support — Why Endpoints Wins Here
gRPC support is the cleanest dividing line between the two products and a near-guaranteed exam question.
gRPC pass-through with Endpoints
Cloud Endpoints with ESPv2 is the only Google API-management product that proxies native gRPC end-to-end. You define your service with a Protocol Buffers .proto file, compile it into a descriptor file (api_descriptor.pb), write a gRPC service config YAML, and run gcloud endpoints services deploy api_descriptor.pb api_config.yaml. ESPv2 then accepts inbound application/grpc requests over HTTP/2, enforces auth and quotas exactly as it does for REST, and forwards binary protobuf frames to the backend.
gRPC transcoding — REST in, gRPC out
ESPv2 also supports gRPC transcoding: you annotate your .proto with google.api.http options and the proxy will accept JSON over HTTP/1.1 from clients, translate to gRPC for the backend, and translate the response back to JSON. This is how Google's own public APIs (Pub/Sub, Spanner, BigQuery) expose both REST and gRPC from a single service definition. Transcoding is configured purely in the .proto — no separate OpenAPI spec needed.
Why API Gateway cannot do gRPC
API Gateway does not support gRPC backends and does not perform gRPC transcoding. Its Envoy data plane is configured for HTTP/1.1 and HTTP/2 JSON only. If a PCD question describes "I have a gRPC service on GKE and I want managed auth and quotas," the answer is Cloud Endpoints with ESPv2 sidecar, not API Gateway.
A common distractor: "API Gateway supports HTTP/2, therefore it supports gRPC." HTTP/2 is necessary but not sufficient — gRPC also requires bidirectional streaming framing, trailers, and the application/grpc content type, none of which API Gateway is configured to proxy. For any scenario mentioning .proto files, protobuf, or application/grpc, choose Cloud Endpoints with ESPv2.
Authentication Methods — API Keys, JWT, Firebase, Google ID Tokens, OAuth 2.0
Both products implement the same five authentication patterns, declared as securityDefinitions in the OpenAPI spec and enforced by the proxy / gateway before the request reaches your backend.
API keys
API keys identify the calling project / consumer, not the calling user. They are created under APIs & Services → Credentials and passed as ?key=... query param or x-api-key header. Use them for rate limiting, usage analytics per consumer, and disabling abusive clients quickly. Do not use them for user authentication — keys leak.
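From the client's point of view, the two ways of passing a key look like this. It is a minimal sketch using the standard library; the helper names are hypothetical, and whichever location you use has to match the name and in fields of the api_key entry under securityDefinitions.

```python
from urllib.parse import urlencode
from urllib.request import Request


def with_key_in_query(url: str, api_key: str) -> Request:
    """Attach the key as the ?key=... query parameter."""
    separator = "&" if "?" in url else "?"
    return Request(url + separator + urlencode({"key": api_key}))


def with_key_in_header(url: str, api_key: str) -> Request:
    """Attach the key as the x-api-key request header."""
    return Request(url, headers={"x-api-key": api_key})


# Build (but do not send) both variants against a placeholder gateway URL.
query_request = with_key_in_query("https://orders.gateway.dev/orders/42", "MY_KEY")
header_request = with_key_in_header("https://orders.gateway.dev/orders/42", "MY_KEY")
```

The header form keeps the key out of URLs that end up in access logs and browser history, which is one reason to prefer it when the spec allows it.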
JWT (issuer + JWKS)
Generic JWT validation: declare x-google-issuer (e.g., https://accounts.example.com), x-google-jwks_uri (the public-key endpoint), and x-google-audiences (the expected aud claim). The proxy verifies the signature, expiry, and audience on every request. This is the standard pattern for machine-to-machine auth where you control the identity provider.
Firebase Auth
A specialisation of JWT validation where the issuer is https://securetoken.google.com/<project-id> and the JWKS URI is https://www.googleapis.com/service_accounts/v1/metadata/x509/securetoken@system.gserviceaccount.com. Use this when the client is a mobile app or web app already authenticating users with Firebase Authentication (Google sign-in, email/password, phone OTP, anonymous).
Google ID tokens (service-account auth)
For backend-to-backend calls where the caller is another GCP workload (Cloud Run, Cloud Functions, GKE pod with Workload Identity), the caller obtains a Google-signed ID token from the metadata server with a specific aud claim, and the proxy validates it against https://www.googleapis.com/oauth2/v3/certs. This is the same pattern Cloud Run uses for authenticated service-to-service calls.
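On the caller side, fetching that ID token is a single HTTP GET against the metadata server. The sketch below uses only the standard library; build_identity_request and fetch_id_token are hypothetical helper names, and fetch_id_token only works when run on Google Cloud compute where metadata.google.internal resolves.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def build_identity_request(audience: str) -> Request:
    """Build the metadata-server request for an ID token with the given aud."""
    url = METADATA_IDENTITY_URL + "?" + urlencode({"audience": audience})
    # The metadata server rejects requests that lack this header.
    return Request(url, headers={"Metadata-Flavor": "Google"})


def fetch_id_token(audience: str, timeout: float = 5.0) -> str:
    """Return the raw JWT. Only works from inside a Google Cloud workload."""
    with urlopen(build_identity_request(audience), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

The caller then sends the token as Authorization: Bearer <token>; the audience passed here has to match what the receiving proxy or service expects.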
OAuth 2.0
For user-delegated access (third-party apps acting on a user's behalf), declare an x-google-issuer of https://accounts.google.com and audiences matching your OAuth client ID. The proxy validates the Google-issued ID token (a signed JWT) that the OAuth 2.0 flow returns. For non-Google OAuth providers (Auth0, Okta), use the generic JWT pattern with the provider's JWKS URI.
The aud claim names the intended recipient of a JWT. Cloud Endpoints / API Gateway will reject a token whose aud does not match x-google-audiences. For Cloud Run-to-Cloud Run calls, the convention is to set aud to the fully-qualified URL of the receiving service (e.g., https://orders-abc-uc.a.run.app); for API Gateway-fronted services, set aud to the API name declared in the OpenAPI spec.
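When a token is rejected, the first thing to inspect is the aud claim it actually carries. The helper below decodes a JWT payload without verifying the signature, so it is a debugging aid under that stated limitation and must never be used to make an authorization decision; both function names are hypothetical.

```python
import base64
import json


def jwt_claims_unverified(token: str) -> dict:
    """Decode the payload segment of a JWT. Does NOT verify the signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def audience_allowed(claims: dict, allowed_audiences: list) -> bool:
    """Mimic the audience comparison: aud may be one string or a list of strings."""
    aud = claims.get("aud", [])
    token_audiences = {aud} if isinstance(aud, str) else set(aud)
    return bool(token_audiences & set(allowed_audiences))
```

Comparing the decoded aud against the values listed in x-google-audiences usually explains a rejection faster than reading proxy logs.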
Rate Limiting and Quotas — Service Control in Action
Quotas in Endpoints / API Gateway are declared in the OpenAPI spec, enforced by Service Control, and tied to API keys for per-consumer attribution.
Declaring a quota
You declare a metric and one or more limits:
x-google-management:
  metrics:
    - name: "read-requests"
      displayName: "Read requests"
      valueType: INT64
      metricKind: DELTA
  quota:
    limits:
      - name: "read-limit"
        metric: "read-requests"
        unit: "1/min/{project}"
        values:
          STANDARD: 1000
Then attach the metric cost per operation:
paths:
  /orders/{id}:
    get:
      x-google-quota:
        metricCosts:
          read-requests: 1
A consumer (project) calling GET /orders/{id} more than 1000 times per minute will start receiving HTTP 429 with a RateLimitExceeded error, without the backend being touched.
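To make the per-consumer, per-minute behaviour concrete, here is a toy fixed-window counter. It is an assumption-laden model: FixedWindowQuota is a hypothetical class, and Service Control's real enforcement is a distributed service whose windowing is not necessarily this simple. What the sketch does show is the metric cost being charged per call and the 429 being returned without a backend call.

```python
from collections import defaultdict


class FixedWindowQuota:
    """Toy per-consumer limiter: `limit` units per one-minute window."""

    def __init__(self, limit_per_minute: int):
        self.limit = limit_per_minute
        self.used = defaultdict(int)  # (consumer, minute index) -> units consumed

    def allow(self, consumer: str, now_seconds: float, cost: int = 1) -> int:
        """Return 200 if the call fits in the current window, else 429."""
        window = (consumer, int(now_seconds // 60))
        if self.used[window] + cost > self.limit:
            return 429  # rejected at the proxy; the backend is never called
        self.used[window] += cost
        return 200


quota = FixedWindowQuota(limit_per_minute=3)
first_minute = [quota.allow("project-a", t) for t in (0, 1, 2, 3)]
next_minute = quota.allow("project-a", 61)
other_consumer = quota.allow("project-b", 3)
```

Each consumer has its own counter, which is why one noisy project exhausting its limit does not affect the others.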
Per-method vs. per-consumer limits
Quotas are always per consumer project (identified by API key) — you cannot rate-limit by source IP at the Endpoints / API Gateway layer (that is Cloud Armor's job). To rate-limit anonymous traffic, require API keys on every path and treat the key itself as the rate-limit unit.
Quota overrides
Consumers can request quota overrides for specific projects — useful for giving a premium customer 10× the default. Overrides are managed under APIs & Services → Quotas in the consumer's project, not yours.
Decision Tree — Cloud Endpoints vs. API Gateway vs. Apigee
This is the single most repeated decision on the PCD exam. Memorise the order of questions, not just the products.
Step 1: Is the backend serverless (Cloud Run / Cloud Functions / App Engine) and REST-only?
Yes → API Gateway. Zero ops, regional endpoint, fastest setup, free tier (2M calls/month). Done.
Step 2: Does the service speak gRPC, or run on GKE / GCE / on-prem?
Yes → Cloud Endpoints with ESPv2 sidecar. This is the only path that handles gRPC, gRPC transcoding, or non-serverless backends.
Step 3: Do you need a developer portal, monetization, partner onboarding, mediation between SOAP/REST, or complex policy chains?
Yes → Apigee X or Apigee hybrid. Apigee is the only product with a built-in developer portal, API products, monetization, and a policy chain editor. It is also dramatically more expensive — typical floor is ~USD $500/month for Apigee X eval.
Step 4: Is it internal-only east-west traffic between microservices on GKE?
Consider Cloud Service Mesh (formerly Anthos Service Mesh, Istio-based) instead of any of the above. A service mesh handles mTLS, retries, traffic splitting, and observability natively without an external gateway.
API Gateway = serverless backends, REST only, zero ops. Cloud Endpoints + ESPv2 = gRPC, transcoding, or non-serverless backends. Apigee = developer portal, monetization, or complex policy mediation. If the question mentions gRPC, the answer is Endpoints. If the question mentions a developer portal, the answer is Apigee. If neither, default to API Gateway.
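The summary above can be encoded as a small function for self-testing. This is a sketch, not an official selection rule: the function name, parameter names, and backend labels are hypothetical, and checking the Apigee-only requirements first is an ordering assumption made here so that a serverless backend with a monetization requirement still lands on Apigee.

```python
SERVERLESS_BACKENDS = {"cloud-run", "cloud-functions", "app-engine"}


def choose_product(
    backend: str,
    uses_grpc: bool = False,
    needs_portal_or_monetization: bool = False,
    internal_mesh_only: bool = False,
) -> str:
    """Map a scenario to the product this note recommends."""
    if needs_portal_or_monetization:
        return "Apigee"
    if internal_mesh_only and backend == "gke":
        return "Service mesh"
    if uses_grpc or backend not in SERVERLESS_BACKENDS:
        return "Cloud Endpoints + ESPv2"
    return "API Gateway"
```

For example, choose_product("gke", uses_grpc=True) returns "Cloud Endpoints + ESPv2", while choose_product("cloud-run") falls through to "API Gateway".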
Monitoring, Logging, and Tracing
Both products emit the same telemetry through Service Control, but the resource types and dashboards differ.
Cloud Logging
Endpoints emits to the log name endpoints_log under the resource type api; API Gateway emits to apigateway.googleapis.com/requests under the resource type apigateway.googleapis.com/Gateway. Both capture request method, path, status, latency, consumer API key (hashed), and JWT subject. Filter examples:
resource.type="api" AND severity>=ERROR
resource.type="apigateway.googleapis.com/Gateway" AND httpRequest.status>=500
Cloud Monitoring
Both publish standard metrics under the serviceruntime.googleapis.com/ namespace: api/request_count, api/request_latencies, api/request_sizes, api/error_count. Set alerts on 5xx error rate above 1% and p99 latency above 500 ms as a baseline SRE pattern. Per-consumer slicing is available via the consumer_id label.
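The baseline thresholds mentioned above (5xx rate above 1%, p99 latency above 500 ms) can be checked against a sample of requests locally before you encode them in an alerting policy. The functions below are hypothetical helpers operating on plain lists, use the nearest-rank percentile definition, and do not call Cloud Monitoring.

```python
import math


def error_rate_5xx(statuses: list) -> float:
    """Fraction of responses with a 5xx status code."""
    if not statuses:
        return 0.0
    return sum(1 for s in statuses if 500 <= s <= 599) / len(statuses)


def percentile_nearest_rank(values: list, pct: int) -> float:
    """Nearest-rank percentile; `values` must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def should_alert(statuses, latencies_ms, max_error_rate=0.01, max_p99_ms=500.0) -> bool:
    """True if either baseline threshold is exceeded."""
    return (
        error_rate_5xx(statuses) > max_error_rate
        or percentile_nearest_rank(latencies_ms, 99) > max_p99_ms
    )
```

Running this over an exported log sample is a quick way to see whether a proposed threshold would have fired during a known incident.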
Cloud Trace
ESPv2 and API Gateway both propagate X-Cloud-Trace-Context headers and sample a configurable fraction of requests to Cloud Trace. Spans are emitted under the service name declared in the OpenAPI info.title. End-to-end traces let you see proxy latency vs. backend latency separately — invaluable for proving "the gateway is not the bottleneck."
Endpoints Portal
Endpoints ships a basic generated developer portal at https://endpointsportal.<project-id>.cloud.goog showing the spec, methods, and try-it-out forms. It is not equivalent to Apigee's developer portal — no monetization, no app registration workflow.
The default Service Control check/report quota is 6000 calls per minute per service — and the proxy issues one check per inbound request. A service sustaining more than 100 requests/second per region will start seeing RESOURCE_EXHAUSTED 503s from Service Control itself, not from your backend. The fix is a quota-increase ticket against servicecontrol.googleapis.com referencing the specific service name from gcloud endpoints services list, often raised to 60000/min for production APIs.
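The arithmetic behind the 100 requests/second figure is worth writing out, because it also tells you what a raised quota buys. The sketch assumes one check call per inbound request, as stated above; the function names are hypothetical.

```python
def sustained_rps_ceiling(quota_per_minute: int = 6000, checks_per_request: int = 1) -> float:
    """Highest steady request rate that stays inside the per-minute quota."""
    return quota_per_minute / (60 * checks_per_request)


def needs_quota_increase(expected_rps: float, quota_per_minute: int = 6000) -> bool:
    """True if the expected steady rate exceeds what the quota allows."""
    return expected_rps > sustained_rps_ceiling(quota_per_minute)


default_ceiling = sustained_rps_ceiling()      # 6000 / 60
raised_ceiling = sustained_rps_ceiling(60000)  # 60000 / 60
```

With the default quota the ceiling is 100 requests/second; at 60000/min it is 1000 requests/second, so a service expecting 250 requests/second needs the increase.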
Custom Domains and Regional API Gateway
Out of the box, both products give you a Google-owned hostname (<service>.endpoints.<project>.cloud.goog or <gateway>-<hash>.uc.gateway.dev). Production deployments almost always need a custom domain.
Custom domain on API Gateway
API Gateway does not directly accept custom domains. The supported pattern is to put a Global External HTTP(S) Load Balancer in front of the gateway with a serverless NEG pointing at the gateway, attach your managed SSL certificate, and add the custom domain at the load balancer. Alternatively for simpler setups, front the gateway with Cloud Run acting as a thin reverse proxy on a custom domain mapping.
Custom domain on Cloud Endpoints
Endpoints supports two paths. (1) DNS-verified custom domain — verify ownership in Search Console, then deploy your spec with host: api.example.com. Suitable for App Engine and GKE backends where you control the load balancer. (2) Global External HTTP(S) Load Balancer in front of ESPv2 on Cloud Run / GKE, same pattern as API Gateway.
Regional vs. global
API Gateway is regional — gcloud api-gateway gateways create --location=us-central1. There is no built-in global anycast. For multi-region deployment, create one gateway per region and use a Global External Load Balancer with serverless NEGs in each region for low-latency failover. Cloud Endpoints with ESPv2 on Cloud Run inherits Cloud Run's regional model with the same multi-region load-balancer pattern. Endpoints on App Engine standard is multi-region by virtue of App Engine's global front-end.
TLS and certificates
For custom domains via Load Balancer, use Google-managed SSL certificates (gcloud compute ssl-certificates create CERT_NAME --domains=api.example.com --global, where CERT_NAME is a name you choose). Provisioning typically completes in 15–60 minutes once DNS points at the load balancer. Self-managed certs (--certificate=/path/to/cert.pem --private-key=...) are supported but you take on rotation duty.
Security Policies and Best Practices
Beyond authentication, several gateway-level policies materially reduce attack surface.
Always require API keys on public paths
Even when JWT auth is the primary mechanism, require an API key on top. The key lets you disable abusive consumers in seconds without rotating JWT-signing keys.
Layer Cloud Armor in front for L7 protection
Endpoints and API Gateway do not include WAF rules, geo-blocking, or DDoS shaping. Front the gateway with a Global External HTTP(S) Load Balancer + Cloud Armor policy when the API is internet-exposed. Cloud Armor handles OWASP Top 10 rule sets, bot management, and rate-limiting by IP — the layer the gateway cannot do.
Use HTTPS-only and HSTS at the load balancer
Both gateways accept HTTPS by default, but the load balancer must be configured for HTTP-to-HTTPS redirect and Strict-Transport-Security header injection (via a backend-bucket or Cloud Run proxy adding the header). API Gateway itself does not inject HSTS.
Pin JWT audiences
Never set x-google-audiences: "*" — that effectively disables audience validation and allows any valid Google ID token to call your service. Always pin to the specific API name or backend URL.
Audit logs
Enable Data Access audit logs for servicemanagement.googleapis.com and servicecontrol.googleapis.com if your compliance regime requires evidence of API spec changes and runtime auth decisions. Data Access audit logs are off by default.
Pricing and Quotas (Operational Reality)
- API Gateway: USD $3 per million calls after the 2M/month free tier; egress charged separately.
- Cloud Endpoints: First 2M calls/month free, then USD $3 per million for calls 2M–1B, USD $1.50 per million above 1B. ESP/ESPv2 compute runs on your backend (Cloud Run, GKE) and is billed there.
- Apigee X: Starts around USD $500/month for the eval tier; production "Standard" starts in the low thousands per month.
- Service Control quota: 6000 check + report calls per minute per service by default; high-traffic APIs may need a quota increase ticket.
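The tiers listed above translate into a short estimator. The rates are taken from this note as written and may not match current list prices; the function names are hypothetical, and egress and proxy compute are excluded.

```python
FREE_CALLS = 2_000_000
TIER_BOUNDARY = 1_000_000_000


def endpoints_monthly_cost_usd(calls: int) -> float:
    """Cloud Endpoints call charges using the tiers quoted in this note."""
    mid_tier_calls = max(0, min(calls, TIER_BOUNDARY) - FREE_CALLS)
    high_tier_calls = max(0, calls - TIER_BOUNDARY)
    return mid_tier_calls / 1e6 * 3.00 + high_tier_calls / 1e6 * 1.50


def api_gateway_monthly_cost_usd(calls: int) -> float:
    """API Gateway call charges: USD 3 per million after the free 2M."""
    return max(0, calls - FREE_CALLS) / 1e6 * 3.00
```

For 12 million calls in a month both functions return 30.0: the first 2 million are free and the remaining 10 million are billed at USD 3 per million.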
Frequently Asked Questions (FAQs)
Q1: Can API Gateway proxy to a backend outside Google Cloud (on-prem or AWS)?
A1: Not directly. x-google-backend.address must be a publicly reachable HTTPS URL, and the gateway has no native way to reach a private on-prem service. The supported pattern is to put a Cloud Run service or HTTP(S) Load Balancer with hybrid connectivity NEGs in between, then point API Gateway at that. Cloud Endpoints with ESPv2 deployed on the on-prem cluster itself is the more direct answer.
Q2: How do I securely call a private Cloud Run service from API Gateway?
A2: Set x-google-backend.jwt_audience to the Cloud Run service URL and grant roles/run.invoker on the target service to the service account the API config uses for backend calls (set with --backend-auth-service-account when you create the api-config). API Gateway will mint a Google-signed ID token with that audience and pass it as Authorization: Bearer ... to the backend. The Cloud Run service must have --no-allow-unauthenticated set.
Q3: How do I version a public API without breaking existing clients?
A3: Use path-based versioning in the OpenAPI spec (/v1/orders, /v2/orders) and deploy both versions as separate API configs (or in the same spec). For Endpoints, deploy a new config ID; proxies started with --rollout_strategy=managed pick up the newest config automatically. For API Gateway, create a new api-config and update the gateway to point at it. There is no built-in traffic-splitting between configs (use a load balancer in front for canary).
Q4: Does Cloud Endpoints support WebSockets?
A4: ESP (NGINX) supports WebSocket pass-through; ESPv2 (Envoy) supports WebSockets only since version 2.41+, and only in HTTP/1.1 mode. API Gateway does not support WebSockets. For long-lived bidirectional streams, prefer gRPC streaming over Endpoints or a direct Cloud Run service without a gateway.
Q5: How do I protect against API abuse beyond what quotas can do?
A5: Combine three layers — (1) Cloud Armor in front for IP-based rate limiting, geo-blocking, and OWASP rules; (2) API Gateway / Endpoints quotas for per-consumer throttling; (3) Cloud Logging + alerting on anomalous patterns (sudden 4xx spike, new consumer IDs). For credential-stuffing protection on auth endpoints, also enable reCAPTCHA Enterprise at the client.
Q6: Can I use these services to manage APIs outside of Google Cloud?
A6: Cloud Endpoints (via ESP/ESPv2) is a container image, so it can run anywhere — GKE on-prem, EKS, bare-metal Kubernetes, even a plain Docker host — and still report to Service Control over the public internet. API Gateway is a managed Google Cloud service with no portable form; its backends can technically be any HTTPS endpoint but the gateway itself only runs on Google Cloud. Apigee hybrid is the explicit hybrid offering for on-prem / multi-cloud control planes.
Q7: How do I secure my API from being overloaded?
A7: Two-layer defence — declare per-consumer quotas in the OpenAPI spec (e.g., 1000 requests/minute per project) so abusive clients hit HTTP 429 at the proxy without reaching the backend, then layer Cloud Armor rate-limiting rules at the load balancer for IP-level throttling on anonymous or unauthenticated paths. For burst protection on the backend itself, set Cloud Run --max-instances and --concurrency so even if quotas fail open, the backend cannot exceed a known capacity.