Introduction to Cloud SQL and Cloud Spanner
Cloud SQL and Cloud Spanner are Google Cloud's two flagship relational data platforms, and the PCD exam expects developers to know exactly when each is the right answer. Cloud SQL is a fully managed service for MySQL 5.7/8.0, PostgreSQL 12-16, and SQL Server 2017/2019/2022 — drop-in compatible with the open-source engines your app already speaks. Cloud Spanner is a horizontally scalable, externally consistent relational database with synchronous global replication and a 99.999% multi-region SLA.
This study note walks the full developer-facing surface: Cloud SQL HA topology and failover behaviour, the Cloud SQL Auth Proxy and IAM database authentication, in-region and cross-region read replicas, when AlloyDB's columnar engine beats stock PostgreSQL, Spanner regional vs multi-region instance configurations, interleaved tables and STORING indexes, querying Spanner from BigQuery via external datasets, Spanner change streams for CDC, backup plus PITR semantics, and finally Cloud SQL maintenance windows with denylist days for change-freeze periods.
白話文解釋(Plain English Explanation)
Three analogies make the Cloud SQL / Spanner decision tree click before you face it on the exam.
The Local Coffee Shop vs the Global Coffee Chain
Cloud SQL is a single beloved coffee shop on the corner. One barista (the primary instance) takes every order, and there's a backup barista in the stockroom (the HA standby in another zone) who can step in within ninety seconds if the first one faints. The shop is fast, familiar, and can serve a few thousand customers a day — but you can only make one drink at a time at the counter. Cloud Spanner is Starbucks worldwide: thousands of stores in dozens of countries, every store knows the same recipe to the millisecond, and you can order a venti latte in Taipei that automatically debits the same loyalty balance you used twenty minutes ago in Seattle. The price per cup is higher, but if you need globally consistent ordering and unlimited counter throughput, no single shop can replace the chain.
The Apartment Lease vs the Skyscraper Lease
Running on Cloud SQL is like leasing an apartment: the landlord (Google) handles plumbing, electricity, and the building maintenance schedule. You can call them at midnight if the boiler breaks (automatic failover), but if your family grows to fifty people, you cannot just "add a wing" — you have to move to a bigger unit (vertical scaling, with a brief downtime). Spanner is leasing floors in a skyscraper. When you need more space, the elevator simply gives you another floor (add a node) — same address, same kitchen recipes, same locks, just more square footage. The skyscraper never has to close for renovations because every floor was designed to be hot-swapped from day one.
The Backup Photocopier vs the Time Machine
Cloud SQL automated backups and point-in-time recovery feel like a backup photocopier — a snapshot of the documents you had at 2:00 a.m., plus a stack of binary logs you can replay to reconstruct any state between snapshots. Spanner's PITR is more like a time machine: every read can specify "show me the table as it looked at 14:32:08 UTC yesterday" and the database serves that consistent snapshot without restoring anything, because all writes are versioned by TrueTime timestamps. Same goal (recover from human mistakes), very different mechanics — and on the exam, the difference between "restore from backup" (Cloud SQL) and "stale read with READ_TIMESTAMP" (Spanner) is a frequent distractor.
Cloud SQL High Availability and Failover
Cloud Spanner's strongest consistency guarantee: every transaction is assigned a globally ordered TrueTime commit timestamp, and any subsequent read sees the effects of all transactions that committed before its read timestamp — regardless of which region or replica serves the read. Stronger than serializability because the order matches real wall-clock time, which is why Spanner can offer strongly consistent reads across regions without sacrificing the 99.999% SLA. See Spanner TrueTime and external consistency.
A Cloud SQL HA configuration is the production baseline. It is opt-in (you must select Multiple zones (highly available) at creation or run gcloud sql instances patch ... --availability-type=REGIONAL) and roughly doubles the instance cost.
Synchronous Standby Topology
Under the hood, Cloud SQL provisions a primary in one zone and a synchronous standby in a second zone of the same region. Writes commit on both nodes before the client receives an ACK, which is what backs the 99.95% per-instance SLA. The standby is invisible to clients; it does not accept connections of its own, and you cannot offload reads to it — that is the job of read replicas.
Automatic Failover Triggers
Cloud SQL initiates failover when health checks against the primary fail for roughly sixty seconds. The standby is promoted, the floating IP / DNS hostname is repointed, and existing TCP connections are killed. Failover typically completes in 60-120 seconds for MySQL and PostgreSQL; SQL Server is comparable. Applications must reconnect — there is no transparent connection migration.
Manual Failover for Drills
You can force failover with gcloud sql instances failover INSTANCE. Regulated workloads run this monthly as a chaos drill so that ops teams trust the runbook before a 3 a.m. incident. Failovers count against the mysql.failoverCount metric, useful for alerting if production is failing over unexpectedly.
Cloud SQL HA is regional only — the synchronous standby always lives in the same region as the primary. To survive a full regional outage you need a cross-region read replica that you can promote, plus an RTO that allows for asynchronous replication lag (typically seconds to minutes). See Cloud SQL high availability.
Cloud SQL Auth Proxy and IAM Database Authentication
Connecting securely from Cloud Run, GKE, App Engine, or a developer laptop is where most PCD exam questions on Cloud SQL live.
Cloud SQL Auth Proxy Mechanics
The Cloud SQL Auth Proxy is a small binary (or sidecar container at gcr.io/cloud-sql-connectors/cloud-sql-proxy) that authenticates to the Cloud SQL Admin API using Application Default Credentials, fetches a short-lived TLS certificate, and exposes a local TCP socket or Unix domain socket. Your app connects to 127.0.0.1:5432 (PostgreSQL) or /cloudsql/PROJECT:REGION:INSTANCE (Unix) with no IP allowlist, no static credentials beyond the database username/password, and no public IP exposure on the instance itself.
Required IAM Roles
The service account running the proxy needs roles/cloudsql.client (basic) or roles/cloudsql.instanceUser (for IAM database authentication). Granting at project level is acceptable for single-tenant projects; for shared projects use conditional IAM bindings targeting specific instance names.
IAM Database Authentication
Instead of managing a password in Secret Manager, you can enable cloudsql.iam_authentication=on on the instance and grant roles/cloudsql.instanceUser to a service account. The proxy then issues an OAuth access token as the database password, and the database verifies it against IAM. Benefits: no rotating passwords, full audit trail in Cloud Audit Logs, and revoking access is a single IAM binding change. Supported on MySQL 8 and PostgreSQL.
Cloud SQL Auth Proxy as a Cloud Run Sidecar
In Cloud Run, the modern pattern is to either pass --add-cloudsql-instances (Cloud Run runs the proxy for you) or attach the proxy as a sidecar in a multi-container revision. Sidecar mode gives you control over proxy version, structured logging, and IAM auth flags.
Read Replicas: In-Region and Cross-Region
Read replicas offload read traffic, support read-after-write at the expense of consistency, and double as disaster recovery primitives.
Asynchronous Replication Model
A Cloud SQL read replica receives the primary's binary log (MySQL), WAL (PostgreSQL), or transaction log (SQL Server) asynchronously. Replicas lag the primary by milliseconds under healthy load but can drift to minutes under write spikes or replica-side CPU saturation. The replication_lag metric in Cloud Monitoring is essential to watch.
In-Region Read Replicas
Create with gcloud sql instances create REPLICA --master-instance-name=PRIMARY --tier=... in any zone of the primary's region. Use cases: scaling read-heavy dashboards, isolating analytical queries from transactional traffic, supporting BI tools without blocking writes.
Cross-Region Read Replicas
By specifying a different --region, you get a cross-region replica that doubles as disaster recovery. On a regional outage you can promote it (gcloud sql instances promote-replica) — once promoted, it becomes a standalone primary and cannot rejoin as a replica. Typical RTO is the time to detect, decide, and promote (often 15-30 minutes for a human-driven flow).
Replica Limitations
Replicas do not have HA standbys of their own unless you explicitly enable them. Replicas cannot have their own replicas in MySQL 5.7. Schema changes must be made on the primary and flow through replication — DDL on a replica is rejected.
AlloyDB and the Columnar Engine
For PostgreSQL workloads that mix OLTP with heavy analytical scans, AlloyDB for PostgreSQL is the upgrade path beyond Cloud SQL for PostgreSQL.
Hybrid Transactional/Analytical Architecture
AlloyDB separates compute from storage, replicates writes via an intelligent storage layer, and adds machine-learning-driven indexing. Marketing claims include "4x faster than standard PostgreSQL for transactional workloads" and "up to 100x faster for analytical queries" — those numbers come from the next bullet.
The Columnar Engine
The AlloyDB columnar engine maintains an automatic, in-memory columnar representation of frequently scanned tables alongside the row store. The optimiser transparently routes analytical scans (large WHERE, GROUP BY, aggregate queries) to the columnar copy. Enable it with google_columnar_engine.enabled=on; let the auto-recommend feature pick which columns to materialise based on workload telemetry.
When AlloyDB Beats Cloud SQL
If your team is committed to PostgreSQL, runs queries that scan millions of rows for aggregations, and needs HTAP characteristics without dual-stack overhead (Cloud SQL plus BigQuery), AlloyDB is the right answer. For straightforward CRUD with point lookups, plain Cloud SQL is cheaper.
AlloyDB is the right escalation when a Cloud SQL for PostgreSQL workload starts mixing 10ms point lookups with multi-second analytic scans on the same tables. The columnar engine eliminates the dual-stack pattern of replicating into BigQuery for ad hoc analytics. See AlloyDB columnar engine.
Cloud Spanner Regional vs Multi-Region Configurations
Spanner instances are defined by an instance-config, which determines the replication topology and the SLA.
Regional Configurations (99.99% SLA)
Regional configs (e.g., regional-us-central1, regional-asia-east1) place three voting replicas across three zones in one region. Reads and writes are served locally; the SLA is 99.99% monthly availability. Cheaper than multi-region by roughly half and the default choice when all readers live in one region.
Multi-Region Configurations (99.999% SLA)
Multi-region configs (e.g., nam-eur-asia1, nam6, eur3) place voting replicas in three regions plus one or more read-only replicas. They use Paxos to commit writes across regions and publish a 99.999% monthly SLA — roughly five and a quarter minutes of allowed downtime per year. Pick nam6 for North America-only customers, eur3 for European data-residency requirements, and nam-eur-asia1 for genuinely global workloads.
Read-Only and Witness Replicas
Multi-region configs include read-only replicas that can serve stale reads with no Paxos round trip, and witness replicas that vote in Paxos but do not store table data — they exist to break ties when an entire region drops out.
Compute Capacity: Nodes and Processing Units
Spanner capacity is sold in processing units (PUs); 1,000 PU equals 1 node. Each node delivers approximately 10,000 QPS of reads / 2,000 QPS of writes / 2TB of storage as a rule of thumb. You can scale capacity online with no downtime and no schema changes.
Spanner's 99.999% multi-region SLA only applies to multi-region instance configurations (e.g., nam-eur-asia1). Regional configurations carry a 99.99% SLA. Picking the wrong config because "multi-region sounds safer" can quadruple your monthly bill for an SLA you do not need. See Spanner instance configurations.
Spanner Schema Design: Interleaved Tables and STORING Indexes
Spanner schema design is dominated by two physical-layout decisions: how to co-locate related rows, and how to make indexes self-sufficient for the read path.
Interleaved Tables
An interleaved table physically stores child rows inside the parent row's split. Declared with INTERLEAVE IN PARENT Parent ON DELETE CASCADE, an interleaved OrderItems table places every OrderItem row for a given OrderId on the same disk page as its parent Order row. The win is huge: joining Orders to OrderItems becomes a local scan, not a network round trip.
Hotspotting Avoidance
Never use a monotonically increasing column (timestamp, sequence number, auto-increment) as the leading primary key — every new row hits the same split and one node bottlenecks. Use UUIDs, bit-reversed sequences (BIT_REVERSE(seq, true)), or hash a tenant ID into the leading key column.
Secondary Indexes and the STORING Clause
A secondary index in Spanner is itself a table behind the scenes. By default, an index covers only the indexed columns plus the primary key. A query that needs additional columns must back-join to the base table. The STORING clause includes extra columns directly in the index entry, eliminating the back-join: CREATE INDEX OrdersByCustomer ON Orders(CustomerId) STORING (TotalAmount, Status). Storage cost goes up; read latency goes down.
NULL_FILTERED Indexes
Use CREATE NULL_FILTERED INDEX for sparse columns where most rows are NULL — the index only stores rows where the indexed column is non-NULL, saving significant storage. Useful for soft-delete flags, optional foreign keys, or feature gates.
A common Spanner anti-pattern is creating secondary indexes without the STORING clause and then complaining about read latency. Every read becomes an index probe plus a base-table seek — effectively doubling the IO. Always add STORING for the columns your hot queries return. See Spanner secondary indexes.
Querying Spanner from BigQuery via External Datasets
Spanner is built for operational workloads, but analytics teams want SQL access without standing up a CDC pipeline. BigQuery external datasets bridge the gap.
External Dataset Mechanics
In BigQuery, create an external dataset that points at a Spanner database: bq mk --connection ... followed by CREATE EXTERNAL TABLE referencing the connection. Queries against the external dataset are federated — BigQuery's SQL engine pushes filters and projections down to Spanner over a Cloud Spanner connection, executes the read, and returns results to the analyst as if querying a native BigQuery table.
Use Cases
Mix transactional Spanner data with BigQuery analytics tables in a single SQL query. Build dashboards over fresh operational data without scheduling a copy job. Avoid the lag, cost, and operational burden of a dedicated Datastream-to-BigQuery pipeline for low-volume analytics.
Limits and Costs
External dataset queries are subject to the Spanner instance's CPU budget — heavy analytical scans can starve transactional traffic. For high-volume analytics, prefer dedicated CDC to BigQuery via change streams (see below) and reserve the external dataset path for occasional ad hoc joins.
Cloud Spanner Change Streams
Change streams capture every insert, update, and delete on watched tables, exposed as a queryable feed.
Defining a Change Stream
DDL: CREATE CHANGE STREAM AllOrdersStream FOR Orders, OrderItems OPTIONS (retention_period = '7d');. Spanner then tracks mutations on those tables in an internal change record store with the specified retention (1 to 7 days).
Consumption via the Dataflow Connector
The supported consumption pattern is the Apache Beam SpannerIO connector running on Dataflow. The connector reads change records, hands them to your pipeline, and lets you sink the changes to BigQuery, Pub/Sub, Cloud Storage, or another Spanner instance. There is also a Google-provided Dataflow template for Spanner → BigQuery.
Common Use Cases
Build a near-real-time replica of Spanner data in BigQuery for analytics. Feed audit logs into a SIEM. Propagate cache invalidations to Memorystore. Trigger downstream workflows via Pub/Sub when specific rows change.
Differences from CDC in Cloud SQL
Cloud SQL exposes binary logs (MySQL) or logical replication slots (PostgreSQL) that can be consumed by Datastream to land changes in BigQuery or Cloud Storage. The mental model is the same; the implementation surface is different — Spanner change streams are first-class DDL objects, while Cloud SQL CDC is a Datastream-level concern.
Spanner Backup and Point-in-Time Recovery
Spanner's backup story is conceptually different from Cloud SQL's because the storage layer already keeps versioned data.
On-Demand and Scheduled Backups
gcloud spanner backups create creates a consistent backup of a database at a specific timestamp. Backups are stored in the same instance and have a retention of up to one year. Use a backup schedule (full or incremental) to automate hourly, daily, or weekly snapshots without writing custom code.
Point-in-Time Recovery
Each Spanner database has a version_retention_period (default 1 hour, configurable up to 7 days). Within that window, any read can specify a READ_TIMESTAMP or MIN_READ_TIMESTAMP and get a consistent view of the database at that moment — no restore required. To recover from accidental deletion, copy the affected rows out at an earlier timestamp into a recovery table.
Restoring a Backup
Restore creates a new database from the backup; you cannot restore into an existing database. The restore typically takes minutes to hours depending on size and runs concurrently with traffic on other databases in the instance.
Backup Encryption
Backups inherit the database's CMEK configuration. Cross-region copies (where supported) require the destination region to have the same key access, an important detail for data-residency audits.
Cloud SQL Maintenance Windows and Denylist Days
Cloud SQL maintenance is unavoidable — Google patches the underlying OS, database engine, and security fixes — but you control when it happens.
Configuring a Maintenance Window
Set a four-hour weekly window via gcloud sql instances patch INSTANCE --maintenance-window-day=SUN --maintenance-window-hour=3. Pick a low-traffic period in your tenants' time zone. Maintenance during the window causes one short restart of the primary (single-zone) or a failover-style restart (HA).
Maintenance Release Channels
Choose between production (default; receives stable updates after they bake in preview) and preview (early access, useful for dev/test). Production-critical workloads stay on production unless you need a specific patch.
Deny Maintenance Periods (Denylist Days)
For change-freeze events — Black Friday, end of quarter, an active incident response — declare a deny maintenance period of up to ninety days where Google will not perform any maintenance: gcloud sql instances patch INSTANCE --deny-maintenance-period-start-date=2026-11-25 --deny-maintenance-period-end-date=2026-12-02 --deny-maintenance-period-time=00:00:00. Stack multiple periods across the year (max 90 days total) to cover all your code-freeze windows.
Maintenance Notifications
Cloud SQL emits maintenance notifications a week before scheduled events to the project's notification email and to the Cloud Monitoring cloudsql.googleapis.com/database/maintenance log. Wire those into PagerDuty so an on-call engineer is aware before the restart fires.
Cloud SQL maintenance: 4-hour weekly window, configurable day/hour, with deny maintenance periods up to 90 days total for change freezes. Spanner has no equivalent — its rolling updates are invisible because the multi-region quorum tolerates per-replica restarts without downtime. See Cloud SQL maintenance overview.
Choosing Between Cloud SQL, AlloyDB, and Spanner
A decision framework for the exam and for real architecture reviews.
Pick Cloud SQL When
You need MySQL, PostgreSQL, or SQL Server compatibility, your workload fits within a single region, your data is under 64 TB, and your write throughput peaks below roughly 4,000 QPS. This covers the vast majority of SaaS backends, internal tools, and modernised legacy apps.
Pick AlloyDB When
You are committed to PostgreSQL, your workload mixes OLTP with analytical scans on the same tables, and you want to avoid maintaining a separate analytics stack. AlloyDB also wins for vector workloads via the pgvector extension and for very large PostgreSQL databases (up to 128 TB) where Cloud SQL's storage ceiling becomes a concern.
Pick Spanner When
You need horizontal write scale (tens of thousands of QPS sustained), global strong consistency, the 99.999% multi-region SLA, or you are building a system where the database itself must outlive any single region. Financial ledgers, global inventory, multi-tenant SaaS at hyperscale, and any application that today uses sharded MySQL with home-grown consistency are textbook Spanner workloads.
Exam Tips and Common Traps
The PCD exam loves to test the edges of these services.
Trap: Treating HA as DR
Cloud SQL HA protects against zonal failure within one region. It is not disaster recovery. A regional outage takes both the primary and the standby down. The DR primitive is a cross-region read replica that you can promote.
Trap: Choosing Spanner for a Blog
Spanner's minimum spend (one 1,000-PU instance running 24/7) is hundreds of dollars per month — overkill for any workload under a few thousand QPS. Default to Cloud SQL unless the requirements explicitly demand Spanner.
Trap: Forgetting STORING on Indexes
A Spanner secondary index without STORING doubles the read IO for any query that needs columns beyond the index key. Always model your hot read paths and add STORING clauses for the columns those queries return.
Tip: Reach for Connection Pooling Early
Cloud Run can scale to thousands of instances, each opening database connections. Pair Cloud SQL with PgBouncer (sidecar or hosted) or with cloud-sql-python-connector's built-in pooling. Spanner uses gRPC sessions with built-in pooling — the client library handles it.
Tip: Set version_retention_period Once
Bumping a Spanner database's version_retention_period from 1 hour to 7 days takes effect immediately and gives you a week of PITR for free. Do this on day one for production databases.
Frequently Asked Questions
Q1: Can I take Cloud SQL HA replicas across regions?
A1: No. HA standbys are always in a second zone of the same region. For cross-region resilience, create a cross-region read replica and promote it during a regional outage. Promotion is irreversible — the replica becomes a new primary.
Q2: Does Spanner support PostgreSQL syntax?
A2: Yes. Spanner offers a PostgreSQL interface dialect alongside its native GoogleSQL dialect. Pick the dialect at database creation; you cannot switch later. PostgreSQL dialect supports most pgwire clients, ORMs, and standard SQL features but is not 100% feature-compatible with stock PostgreSQL — verify extension and function support before migrating.
Q3: Can the Cloud SQL Auth Proxy be used without IAM authentication?
A3: Yes. The proxy authenticates itself to the Cloud SQL Admin API with IAM, but the actual database login still uses the database's username and password by default. Enabling cloudsql.iam_authentication=on plus roles/cloudsql.instanceUser lets the proxy use an OAuth token as the database password, eliminating password management entirely.
Q4: What is the maximum length of a Spanner change stream retention?
A4: Seven days. Set via OPTIONS (retention_period = '7d') on the CREATE CHANGE STREAM DDL. Records older than the retention period are garbage-collected and cannot be replayed.
Q5: How long can a Cloud SQL deny maintenance period last?
A5: Up to ninety days per single period, and you can stack multiple periods across the year. Use them for Black Friday, fiscal year-end, regulatory audits, or any window where an unplanned restart would be costly.
Q6: Are AlloyDB and Cloud SQL for PostgreSQL wire-compatible?
A6: Both speak the PostgreSQL wire protocol, so the same drivers (psycopg2, pgx, JDBC) work. AlloyDB tracks a specific PostgreSQL major version and ships Google-specific extensions like google_columnar_engine that Cloud SQL does not have. Migration in either direction usually works via pg_dump/pg_restore plus extension audits.