Managing Data Lifecycle Policies

Introduction to Managing Data Lifecycle Policies

Managing data lifecycle policies on Google Cloud is the discipline of deciding, in advance, when data should be moved to a cheaper storage class, when it should be deleted, and when it absolutely must not be touched. The Professional Data Engineer exam treats this as a first-class skill because every storage product on GCP — Cloud Storage, BigQuery, Bigtable, Spanner, Pub/Sub, even Cloud Audit Logs — exposes its own retention dial, and the wrong setting either burns money or breaks compliance.

This study note walks through every retention surface a data engineer touches in production, including the regulatory traps around GDPR right-to-be-forgotten and the immutability story of Bucket Lock. The goal is to give you a working mental model so that when an exam question describes a seven-year SOX retention requirement on a 50 TB dataset, you already know which knob to turn.

Plain English Explanation

Before we open the documentation, three analogies. Each one captures a different facet of managing data lifecycle policies — cost tiering, immutable retention, and time-based deletion.

Think of it like a kitchen fridge, freezer, and pantry

Fresh food sits on the front shelf of the fridge because you eat it today. Tomorrow's leftovers move to the back. Bulk meat goes into the freezer for months. Canned goods live in the pantry for years. You do not throw out beans just because you bought them in 2024 — but you also do not keep last week's salad.

Cloud Storage works the same way. Standard class is the front shelf: hot, fast, expensive per GB-month but free to read. Nearline is the back of the fridge for monthly access. Coldline is the freezer for quarterly access. Archive is the pantry for yearly compliance reads. An object lifecycle policy is the rule that says "after 30 days on the shelf, move it to the back; after 90 days, freeze it; after 7 years, throw it away." You write the rule once and Cloud Storage executes it on every object, every day, with no Cloud Function required.

Think of it like a bank safety deposit box with a court order

When you put documents in a bank safety deposit box, you can come back any time and take them out. But if a court orders the box sealed for the duration of an investigation, the bank physically refuses to open it — even if you, the owner, beg them to. The seal is non-negotiable, time-bound, and audited.

Bucket Lock works the same way. Once you lock a retention policy at "seven years," nobody (not the bucket owner, not the project owner, not even the organization admin) can delete those objects until the clock runs out. This is the mechanism auditors want to see for SOX, HIPAA, FINRA, and SEC Rule 17a-4. The cost of getting it wrong is not just a bug; it is a fine.

Think of it like a self-cleaning whiteboard

Imagine a conference room whiteboard that automatically erases anything older than 24 hours. You write today's standup notes; tomorrow they are gone. You never have to remember to wipe it. If somebody insists on keeping a sketch, they have to take a photo — the whiteboard itself will not preserve it.

Pub/Sub message retention, BigQuery partition expiration, and Bigtable garbage collection all behave like this whiteboard. A Pub/Sub subscription holds unacknowledged messages for 7 days by default (also the maximum for a subscription) and then drops them. BigQuery partitions configured with partitionExpirationMs vanish on their birthday. Bigtable garbage collection sweeps cells older than the configured max-age or beyond the max-versions cap. The infrastructure does the cleanup; your application just has to know it will happen.

Core Concepts of Managing Data Lifecycle Policies

A few vocabulary items recur across every GCP storage product, and the exam loves to mix them up.

Time-to-live (TTL) is the wall-clock duration after which a piece of data becomes eligible for deletion. Pub/Sub, Bigtable, BigQuery partitions, and Memorystore all use TTL semantics, though they spell it differently.

Storage class transition is the act of rewriting an object's metadata so that it is billed at a different per-GB rate. Cloud Storage transitions are conceptually free of egress because the bytes never leave the region, but they do incur a Class A operation charge per object.

Retention policy is a floor, not a ceiling. It says "this object cannot be deleted before date X." It does not say "delete it on date X." A retention policy combined with a lifecycle delete rule is how you express "keep for exactly seven years, then erase."
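A minimal Python sketch of the floor-versus-action distinction, assuming a toy model with day granularity rather than the real Cloud Storage API; every function name here is hypothetical. It also checks that the 220752000-second value passed to gsutil retention set in the Bucket Lock section is exactly 2,555 days.

```python
from typing import Optional

SECONDS_PER_DAY = 86_400
RETENTION_DAYS = 7 * 365                       # 2,555 days, this note's "seven years"
RETENTION_SECONDS = RETENTION_DAYS * SECONDS_PER_DAY


def delete_allowed(age_days: int, retention_days: int) -> bool:
    """Retention policy = floor: any delete before this age is refused."""
    return age_days >= retention_days


def delete_requested(age_days: int, delete_age_days: Optional[int]) -> bool:
    """Lifecycle Delete rule = action: without a rule, nothing ever asks to delete."""
    return delete_age_days is not None and age_days >= delete_age_days


def still_stored(age_days: int, retention_days: int, delete_age_days: Optional[int]) -> bool:
    """Toy model: an object goes away only when a delete is both requested and allowed."""
    return not (delete_requested(age_days, delete_age_days)
                and delete_allowed(age_days, retention_days))


print(RETENTION_SECONDS)                                   # 220752000
# Retention alone: the object is still stored (and billed) after 20 years.
print(still_stored(20 * 365, RETENTION_DAYS, None))        # True
# In this toy model, a Delete rule younger than the retention period is held back by the floor.
print(still_stored(100, RETENTION_DAYS, 30))               # True
# Retention plus a Delete rule at 2,555 days: kept through day 2,554, gone on day 2,555.
print(still_stored(2554, RETENTION_DAYS, 2555), still_stored(2555, RETENTION_DAYS, 2555))  # True False
```

The point of splitting the two predicates is that they live in different configuration objects in Cloud Storage, so forgetting either one changes the outcome.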

Hold is a per-object override that pauses deletion regardless of any retention policy. Event-based holds and temporary holds are the two flavors. Holds are the GCP equivalent of a litigation hold in legal e-discovery.

Soft delete is a relatively recent Cloud Storage feature that automatically retains deleted objects for a configurable window (default 7 days, up to 90 days) so that accidental deletions can be recovered. It runs in parallel with lifecycle rules.

Garbage collection in Bigtable is the background process that physically removes cells that have exceeded their max-age or max-versions configuration. Crucially, GC is eventual; reads may still see soon-to-be-deleted cells until compaction runs.

Architecture and Design Patterns

Production data platforms rarely use one retention rule. They layer several to express business intent precisely.

The hot-warm-cold-frozen pattern is the canonical one. New events land in BigQuery (or Cloud Storage Standard) for 30 to 90 days of analytics. After that, BigQuery long-term storage pricing kicks in automatically (no action required) and the per-GB price drops by 50 percent for any partition not modified in 90 days. Mirror copies in Cloud Storage transition Standard to Nearline at 30 days, Nearline to Coldline at 365 days, Coldline to Archive at 1095 days, and finally delete at 2555 days (seven years). The mirrored object moves through four storage classes, each cheaper than the last, and is then deleted, all without anybody writing code.
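The storage arithmetic behind that schedule, as a hedged Python sketch: it prices one GB through the Cloud Storage tiers above using the approximate per-GB-month rates from the storage-class table later in this note, and assumes a 30-day month. It ignores retrieval fees, the Class A operation charged per transition, and the BigQuery copy, so treat the result as a rough comparison rather than a quote.

```python
# Approximate $/GB-month from the storage-class table in this note (assumption: US pricing).
PRICE_PER_GB_MONTH = {"STANDARD": 0.020, "NEARLINE": 0.010, "COLDLINE": 0.004, "ARCHIVE": 0.0012}
DAYS_PER_MONTH = 30  # simplifying assumption

# (storage class, object age in days when it enters that class)
TIERED = [("STANDARD", 0), ("NEARLINE", 30), ("COLDLINE", 365), ("ARCHIVE", 1095)]
STANDARD_ONLY = [("STANDARD", 0)]
DELETE_AT_DAYS = 2555


def lifetime_storage_cost_per_gb(schedule, delete_at_days):
    """Sum the storage cost of every class the object passes through until deletion."""
    total = 0.0
    for i, (storage_class, entered) in enumerate(schedule):
        left = schedule[i + 1][1] if i + 1 < len(schedule) else delete_at_days
        total += (left - entered) / DAYS_PER_MONTH * PRICE_PER_GB_MONTH[storage_class]
    return total


tiered = lifetime_storage_cost_per_gb(TIERED, DELETE_AT_DAYS)
flat = lifetime_storage_cost_per_gb(STANDARD_ONLY, DELETE_AT_DAYS)
print(f"tiered ${tiered:.4f}/GB vs Standard-only ${flat:.4f}/GB over seven years")
```

Under these assumptions the tiered schedule comes to about $0.29 per GB over seven years versus about $1.70 for keeping the same byte in Standard the whole time.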

The WORM compliance pattern (Write Once Read Many) layers Bucket Lock on top of a Standard or Archive bucket. Lifecycle rules can still transition objects, but no human or service account can delete or overwrite them until they age past the retention period. Because every object stays exactly as written for the whole window, you get tamper-evident retention that satisfies most financial regulators.

The right-to-be-forgotten pattern is harder. GDPR Article 17 requires that personal data be erased on request, typically within 30 days, across primary stores, backups, analytics warehouses, and logs. The clean way to architect this is to keep PII in a single dedicated Spanner or BigQuery table keyed by a stable user ID, and store everything else by ID reference only. Erasure becomes a single DELETE plus a backup-rotation guarantee, instead of a scavenger hunt across 40 datasets.

The partition-then-expire pattern is BigQuery-specific. Partition the table by ingestion date or event date, set partitionExpirationMs to the retention window in milliseconds, and BigQuery silently drops whole partitions on schedule. You never run a DELETE, which is important because DELETE on BigQuery has DML quotas and costs scanned bytes, while partition expiration is free.

A retention policy on Cloud Storage is independent of the lifecycle rule that deletes objects. The retention policy says "you may not delete before X." The lifecycle rule says "delete after Y." Both must be configured to express "keep exactly N days then erase." Forgetting the lifecycle rule means objects stay forever and your bill grows. See https://cloud.google.com/storage/docs/bucket-lock for the canonical explanation.

GCP Service Deep Dive

Each storage product has its own retention surface, with its own gotchas. The exam tests fluency across all of them.

Cloud Storage Object Lifecycle Management

Cloud Storage lifecycle rules are JSON documents attached to a bucket. Each rule has a condition and an action. Conditions include age (days since object creation), createdBefore (an absolute date), isLive, numNewerVersions, matchesStorageClass, matchesPrefix, matchesSuffix, daysSinceCustomTime, customTimeBefore, daysSinceNoncurrentTime, and noncurrentTimeBefore. The actions are Delete, SetStorageClass, and AbortIncompleteMultipartUpload (for abandoned XML API multipart uploads).

A typical cost-optimization configuration looks like this:

{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
        "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}
      },
      {
        "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
        "condition": {"age": 90, "matchesStorageClass": ["NEARLINE"]}
      },
      {
        "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
        "condition": {"age": 365, "matchesStorageClass": ["COLDLINE"]}
      },
      {
        "action": {"type": "Delete"},
        "condition": {"age": 2555}
      }
    ]
  }
}

Lifecycle rules run asynchronously. Google does not guarantee that an object is transitioned or deleted the moment it meets a condition: there is usually some lag, and a new or changed lifecycle configuration can take up to 24 hours to take effect. Treat transition dates as approximate in cost forecasts, and never build a downstream process that depends on an exact transition time.

The four storage classes have minimum storage durations: Standard has none, Nearline 30 days, Coldline 90 days, Archive 365 days. Deleting or transitioning an object before its minimum duration triggers an early-deletion fee equal to the storage cost for the remaining days. This is why a rule that transitions Standard to Coldline at 7 days is usually a red flag on the exam: data that fresh is often still being read, so Coldline retrieval fees can cancel the storage savings, and an object deleted or rewritten soon after arriving in Coldline is still billed for the 90-day minimum. The safer default is to let objects age in Standard (or Nearline) until access really drops off, then move them to Coldline.
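To make the minimum-duration check mechanical, here is a Python sketch that walks an age-based lifecycle policy (the same JSON shape as the example earlier in this section) and reports every class an object leaves before its minimum. It is a simplified model, not a Google tool: it assumes the rules form one chain of transitions and counts time in each class from the moment the object entered it, the same reading this section uses. The helper chain() is hypothetical.

```python
import json

# Minimum storage durations in days, as listed above.
MIN_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}

# The cost-optimization policy from the JSON example earlier in this section.
POLICY = json.loads("""
{"lifecycle": {"rule": [
  {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
   "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}},
  {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
   "condition": {"age": 90, "matchesStorageClass": ["NEARLINE"]}},
  {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
   "condition": {"age": 365, "matchesStorageClass": ["COLDLINE"]}},
  {"action": {"type": "Delete"}, "condition": {"age": 2555}}
]}}
""")


def early_exits(policy, initial_class="STANDARD"):
    """Return (class, days_spent, minimum) for every class the object leaves too early.

    Simplifications: rules are keyed on 'age' (optionally filtered by matchesStorageClass),
    they form one chain of transitions, and time in a class is counted from entry.
    """
    rules = sorted(policy["lifecycle"]["rule"], key=lambda r: r["condition"]["age"])
    current, entered = initial_class, 0
    problems = []
    for rule in rules:
        allowed = rule["condition"].get("matchesStorageClass")
        if allowed is not None and current not in allowed:
            continue  # this rule never fires for an object in the current class
        age = rule["condition"]["age"]
        if age - entered < MIN_DAYS[current]:
            problems.append((current, age - entered, MIN_DAYS[current]))
        if rule["action"]["type"] == "Delete":
            break
        current, entered = rule["action"]["storageClass"], age
    return problems


def chain(*transitions, delete_at):
    """Hypothetical helper: build a policy from (storage_class, age) steps plus a final Delete."""
    rules = [{"action": {"type": "SetStorageClass", "storageClass": cls}, "condition": {"age": age}}
             for cls, age in transitions]
    rules.append({"action": {"type": "Delete"}, "condition": {"age": delete_at}})
    return {"lifecycle": {"rule": rules}}


print(early_exits(POLICY))                                                     # []
print(early_exits(chain(("NEARLINE", 5), ("COLDLINE", 25), delete_at=2555)))  # [('NEARLINE', 20, 30)]
```

The second call reproduces the 5-day/25-day stacking mistake described under Common Pitfalls: the object spends only 20 of Nearline's 30 minimum days there.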

BigQuery Table, Dataset, and Partition Expiration

BigQuery exposes three layers of expiration. Dataset default table expiration sets a TTL applied to every new table created in the dataset. Table expiration is per-table and overrides the dataset default. Partition expiration is per-partition and is the most useful for time-series workloads.

-- Create a partitioned table that auto-deletes partitions after 90 days
CREATE TABLE analytics.events (
  event_id STRING,
  user_id STRING,
  event_time TIMESTAMP,
  payload JSON
)
PARTITION BY DATE(event_time)
OPTIONS (
  partition_expiration_days = 90,
  require_partition_filter = TRUE
);

The partition expiration clock starts at the partition boundary in UTC, not at row insert time. For daily partitions the boundary is midnight UTC at the end of the partition's day, so a partition for 2026-01-01 configured with 90-day expiration becomes eligible for deletion at 2026-04-02 00:00 UTC, regardless of when the rows were actually loaded. This matters when backfilling historical data: loading old data into a partition that is already past its expiration makes it eligible for deletion immediately.
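A small Python sketch of that date arithmetic, useful before a backfill. It assumes, as the BigQuery documentation describes partition boundaries, that a daily partition's boundary is midnight UTC at the end of its day and that expiry is boundary plus the configured window. The function names are illustrative, not part of any BigQuery client library.

```python
from datetime import date, datetime, time, timedelta, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000


def partition_expiration_ms(days: int) -> int:
    """Retention window in days expressed as the partitionExpirationMs table option."""
    return days * MS_PER_DAY


def daily_partition_expires_at(partition_day: date, expiration_days: int) -> datetime:
    """Expiry of a daily partition, assuming boundary = midnight UTC at the end of that day."""
    boundary = datetime.combine(partition_day + timedelta(days=1), time(0), tzinfo=timezone.utc)
    return boundary + timedelta(days=expiration_days)


def expired_on_arrival(partition_day: date, expiration_days: int, load_time: datetime) -> bool:
    """True when a backfill lands in a partition that is already past its expiry."""
    return load_time >= daily_partition_expires_at(partition_day, expiration_days)


print(partition_expiration_ms(90))                                    # 7776000000
print(daily_partition_expires_at(date(2026, 1, 1), 90))               # 2026-04-02 00:00:00+00:00
print(expired_on_arrival(date(2025, 6, 1), 90,
                         datetime(2026, 1, 15, tzinfo=timezone.utc)))  # True
```

Running a check like expired_on_arrival over the date range of a planned backfill shows which partitions would be dropped almost as soon as they are written.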

BigQuery also has long-term storage, which is automatic. Any partition not modified for 90 consecutive days drops from the active storage rate (about $0.02 per GB-month) to the long-term rate (about $0.01 per GB-month). No configuration, no transition fee. Reads do not reset the timer; only writes do. The exam loves to ask whether streaming inserts into a partition reset the long-term clock — they do, because they modify the partition.
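A hedged Python model of the long-term timer: only operations that modify the partition reset the 90-day clock, while reads are ignored. The rates are the approximate figures from the paragraph above, the event history is hypothetical, and treating exactly 90 idle days as the switch-over point is a simplification of how BigQuery meters it.

```python
from typing import List, Tuple

ACTIVE_RATE = 0.02      # approximate $/GB-month, active storage (from this note)
LONG_TERM_RATE = 0.01   # approximate $/GB-month, long-term storage
LONG_TERM_AFTER_DAYS = 90

# Operations that modify a partition; anything else (SELECT, dashboard reads) is ignored.
WRITES = {"INSERT", "UPDATE", "DELETE", "MERGE", "STREAMING_INSERT", "LOAD", "COPY"}


def rate_on(day: int, history: List[Tuple[int, str]]) -> float:
    """Storage rate for a partition on `day`, given (day, operation) events."""
    writes = [d for d, op in history if op in WRITES and d <= day]
    if not writes:
        raise ValueError("a partition must be written before it exists")
    idle = day - max(writes)
    return LONG_TERM_RATE if idle >= LONG_TERM_AFTER_DAYS else ACTIVE_RATE


reads_only = [(0, "LOAD"), (30, "SELECT"), (95, "SELECT")]
late_stream = [(0, "LOAD"), (50, "STREAMING_INSERT")]
print(rate_on(100, reads_only))    # 0.01: reads never reset the clock
print(rate_on(100, late_stream))   # 0.02: the day-50 streaming insert restarted it
print(rate_on(140, late_stream))   # 0.01: 90 idle days after the last write
```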

Bigtable Garbage Collection

Bigtable retains every cell version unless told otherwise. Garbage collection policies are configured per column family and have two modes that can be combined.

Max age keeps a cell only if its timestamp is within the configured duration of the current time. maxAge=7d keeps the last week of data per cell.

Max versions keeps only the N newest cell versions. maxVersions=1 makes the column family behave like a traditional key-value store.

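You can combine the two rules as a union or an intersection. Union(maxAge=30d, maxVersions=10) deletes a cell as soon as either rule marks it, so a cell survives only if it is younger than 30 days and among the 10 newest versions. An intersection deletes a cell only when both rules mark it, so a cell survives if it is either younger than 30 days or among the 10 newest versions. Garbage collection runs during compactions, which are scheduled by Bigtable, not on demand. Cells eligible for collection may still be readable for hours or longer; the Bigtable documentation warns that actual deletion can take up to a week. For deterministic deletion, use a Bigtable client API delete call instead of relying on GC.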
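The union-versus-intersection semantics are easy to invert under exam pressure, so here is a toy Python model of them for a single column, with cell ages in days. It is not the Bigtable client API and the helper names are hypothetical; real garbage collection applies these rules lazily during compaction, so reads may still return the cells marked for deletion for a while.

```python
from typing import List


def max_age_marks(ages_days: List[float], max_age_days: float) -> List[bool]:
    """True = the max-age rule marks this cell for deletion (it is older than max age)."""
    return [age > max_age_days for age in ages_days]


def max_versions_marks(ages_days: List[float], max_versions: int) -> List[bool]:
    """True = the cell is not among the `max_versions` newest (youngest) versions."""
    newest_first = sorted(range(len(ages_days)), key=lambda i: ages_days[i])
    rank = {cell: position for position, cell in enumerate(newest_first)}
    return [rank[i] >= max_versions for i in range(len(ages_days))]


def survivors(ages_days, max_age_days, max_versions, mode):
    """Cells left after GC eventually runs, under a union or intersection policy."""
    by_age = max_age_marks(ages_days, max_age_days)
    by_versions = max_versions_marks(ages_days, max_versions)
    if mode == "union":            # delete if ANY rule marks the cell
        doomed = [a or v for a, v in zip(by_age, by_versions)]
    elif mode == "intersection":   # delete only if ALL rules mark the cell
        doomed = [a and v for a, v in zip(by_age, by_versions)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [age for age, gone in zip(ages_days, doomed) if not gone]


cells = [1, 40, 50]  # ages in days of three versions of one column
print(survivors(cells, max_age_days=30, max_versions=2, mode="union"))         # [1]
print(survivors(cells, max_age_days=30, max_versions=2, mode="intersection"))  # [1, 40]
```

The 40-day-old cell shows the difference: it is too old for maxAge but still one of the two newest versions, so only the intersection keeps it.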

Bigtable garbage collection is eventual, not immediate. A cell whose maxAge expired five minutes ago may still be returned by a read if compaction has not yet run. Do not rely on GC for compliance-grade deletion. For GDPR or legal erasure, issue an explicit DeleteFromRow mutation. See https://cloud.google.com/bigtable/docs/garbage-collection for compaction semantics.

Spanner Backup Retention

Cloud Spanner backups have a configurable expiration time set at backup creation, with a maximum of one year. Backups are stored separately from the database and continue to incur charges until they expire. There is no built-in tiering — Spanner backups are billed at a single rate per GB-month.

For longer retention, export the database to Cloud Storage (Spanner exports run as a Dataflow job based on the Cloud Spanner to Cloud Storage Avro template) and apply a lifecycle policy to the destination bucket. This is the standard pattern for SOX-grade seven-year retention of Spanner data.

Point-in-time recovery (PITR) is a separate retention surface. Spanner can be configured to retain a continuous version history from 1 hour up to 7 days, allowing stale reads at any timestamp in that window. PITR retention is independent of backup retention and adds storage cost proportional to the write rate of the database.

Cloud Storage Coldline and Archive

Coldline and Archive are functionally identical to Standard for read and write APIs — same latency, same throughput, same consistency. The differences are pricing and minimum storage duration.

Class	Storage $/GB-mo	Retrieval $/GB	Min. duration
Standard	~$0.020	$0.000	none
Nearline	~$0.010	$0.010	30 days
Coldline	~$0.004	$0.020	90 days
Archive	~$0.0012	$0.050	365 days

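Archive class is designed for data you read at most once a year. Reading 1 TB from Archive costs about $50 in retrieval fees, plus standard egress if you pull it out of GCP. Run the numbers before switching away from Archive: with the approximate prices in the table above, Archive saves about $0.034 per GB-year in storage over Coldline but costs $0.03 more per GB read, so on retrieval fees alone Coldline only wins once you read back more than about 112 percent of the bucket per year (Nearline only past about 264 percent). Archive's higher per-operation charges and 365-day minimum pull that break-even lower, especially for workloads that touch many small objects.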
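A quick way to sanity-check the Archive trade-off is to compute the break-even read fraction directly from the approximate prices in the table above. This Python sketch deliberately ignores operation charges, egress, and minimum durations, all of which matter in practice; the function names are illustrative.

```python
# (storage $/GB-month, retrieval $/GB): approximate values from the table above.
PRICES = {
    "STANDARD": (0.020, 0.000),
    "NEARLINE": (0.010, 0.010),
    "COLDLINE": (0.004, 0.020),
    "ARCHIVE": (0.0012, 0.050),
}


def annual_cost_per_gb(storage_class: str, fraction_read_per_year: float) -> float:
    """Storage for 12 months plus retrieval of the given fraction of the stored bytes."""
    storage, retrieval = PRICES[storage_class]
    return 12 * storage + retrieval * fraction_read_per_year


def break_even_fraction(colder: str, warmer: str) -> float:
    """Fraction of the stored bytes read per year at which both classes cost the same."""
    colder_storage, colder_retrieval = PRICES[colder]
    warmer_storage, warmer_retrieval = PRICES[warmer]
    return 12 * (warmer_storage - colder_storage) / (colder_retrieval - warmer_retrieval)


for warmer in ("COLDLINE", "NEARLINE"):
    fraction = break_even_fraction("ARCHIVE", warmer)
    print(f"ARCHIVE vs {warmer}: break-even at {fraction:.2f}x the bucket per year")
```

With the table's prices, Archive stays cheaper than Coldline until you read back about 1.12 times the bucket per year, and cheaper than Nearline until about 2.64 times, before operation charges are added.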

Bucket Lock and Retention Policies

A retention policy on a Cloud Storage bucket sets a minimum age before any object can be deleted or overwritten. Setting a policy is reversible — you can lower or remove it. Locking the policy via gsutil retention lock makes it permanent: it can never be reduced or removed, only extended. This is the immutability guarantee auditors want.

# Set a 7-year retention policy (in seconds)
gsutil retention set 220752000s gs://compliance-bucket

# Lock it permanently (irreversible)
gsutil retention lock gs://compliance-bucket

Once locked, there is no way to remove an object before it reaches its retention age. Even deleting the bucket requires that every object in it has aged past the retention period. There is no escape hatch, no organization-admin override, no support ticket. Treat a locked retention policy with the same respect as a rm -rf in production.

Locking a retention policy is irreversible. Test your retention duration on a non-production bucket first. A 100-year locked policy on a production bucket means you are paying storage for 100 years, full stop. Refer to https://cloud.google.com/storage/docs/bucket-lock before running retention lock in any environment.

GDPR Article 17 grants data subjects the right to demand erasure of their personal data, typically within 30 days. On GCP, a defensible workflow looks like this:

Centralize PII: keep personally identifiable fields in one BigQuery table or Spanner table keyed by a stable pseudonymous user ID. Everywhere else in the data platform, reference the user only by ID. This converts a multi-system erasure into a single-table DELETE.
Identify data: when a request arrives, look up the user by email or phone in the PII table and capture the pseudonymous ID. Use that ID to enumerate downstream artifacts: export jobs, model training datasets, BigQuery tables, Pub/Sub backlog, Cloud Storage objects.
Delete primary records: issue DELETE statements against BigQuery and Spanner. For Cloud Storage objects, list the objects whose names (or custom metadata) carry the pseudonymous ID and delete them.
Handle backups: backups are the hardest part. Two viable strategies are time-bound backup retention (e.g., 30-day backup window, so erased data falls out of backups within the GDPR response window) and crypto-shredding (see below).
Crypto-shredding: encrypt each user's data with a dedicated per-user key managed in Cloud KMS. Erasing the user becomes "destroy the key version in Cloud KMS," which renders all ciphertext, including ciphertext on backup tapes, unrecoverable. This is the only practical way to erase data from immutable backups.
Audit the erasure: log the deletion in a tamper-evident audit trail (Cloud Audit Logs with Bucket Lock on the log-export bucket).
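The centralize, identify, delete, and audit steps above can be sketched in Python with in-memory dictionaries standing in for the PII table, downstream data, and audit trail. Everything here is hypothetical and deliberately simplified; backups are not modeled because the crypto-shredding step handles them.

```python
import uuid
from typing import Dict, List

# Hypothetical in-memory stand-ins: one PII table, plus downstream data keyed only by pseudonymous ID.
pii_by_id: Dict[str, Dict[str, str]] = {}
id_by_email: Dict[str, str] = {}
transactions: List[Dict[str, object]] = []


def register(email: str, name: str) -> str:
    """Centralize PII: mint a pseudonymous ID and store identifying fields in one place only."""
    pseudonymous_id = str(uuid.uuid4())
    pii_by_id[pseudonymous_id] = {"email": email, "name": name}
    id_by_email[email] = pseudonymous_id
    return pseudonymous_id


def erase(email: str, audit_trail: List[Dict[str, str]]) -> str:
    """Identify the user, delete the single PII row, and record the erasure."""
    pseudonymous_id = id_by_email.pop(email)
    del pii_by_id[pseudonymous_id]
    audit_trail.append({"action": "erasure", "subject": pseudonymous_id})
    return pseudonymous_id


user = register("ada@example.com", "Ada")
transactions.append({"user": user, "amount_eur": 42})
trail: List[Dict[str, str]] = []
erase("ada@example.com", trail)
print(user in pii_by_id, transactions[0]["user"] == user, len(trail))  # False True 1
```

The transaction row survives untouched but now points at an ID that no longer resolves to a person, which is the property the centralize-PII step is designed to buy.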

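For GDPR-heavy workloads, design with crypto-shredding from day one. Provision a per-tenant CMEK key in Cloud KMS, encrypt all of that tenant's BigQuery tables and Cloud Storage objects with it, and retain the key only as long as the tenant relationship lasts. Erasure becomes a single key-destroy operation that covers every backup at once. Note that Cloud KMS does not destroy key material immediately: a key version you destroy first enters a scheduled-for-destruction period (30 days by default, configurable), so size that period against your erasure deadline. See https://cloud.google.com/kms/docs/key-management-service for KMS lifecycle.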
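A toy Python illustration of why crypto-shredding works: data is encrypted under a per-user key, a copy of the ciphertext is kept as if it sat in an immutable backup, and destroying the key makes that copy undecryptable. Loud assumptions: ToyKeyStore is a hypothetical stand-in for Cloud KMS, and the SHA-256 counter-mode keystream is an unauthenticated toy cipher used only so the example runs on the standard library; a real system would use a vetted AEAD scheme with keys held in or wrapped by Cloud KMS.

```python
import hashlib
import secrets
from typing import Dict


class ToyKeyStore:
    """Hypothetical stand-in for Cloud KMS: one 256-bit key per user, destroyable."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def key_for(self, user_id: str) -> bytes:
        if user_id not in self._keys:
            self._keys[user_id] = secrets.token_bytes(32)
        return self._keys[user_id]

    def existing_key(self, user_id: str) -> bytes:
        return self._keys[user_id]  # raises KeyError after destroy()

    def destroy(self, user_id: str) -> None:
        self._keys.pop(user_id, None)


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """TOY keystream (SHA-256 in counter mode). No authentication; not for production."""
    blocks = bytearray()
    counter = 0
    while len(blocks) < length:
        blocks += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(blocks[:length])


def encrypt(store: ToyKeyStore, user_id: str, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(store.key_for(user_id), nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, stream))


def decrypt(store: ToyKeyStore, user_id: str, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(store.existing_key(user_id), nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, stream))


store = ToyKeyStore()
backup_copy = encrypt(store, "user-123", b"ada@example.com")  # imagine this sits in an immutable backup
print(decrypt(store, "user-123", backup_copy))                 # b'ada@example.com'
store.destroy("user-123")                                      # the erasure request
try:
    decrypt(store, "user-123", backup_copy)
except KeyError:
    print("key destroyed: backup ciphertext can no longer be decrypted")
```

Nothing in the backup changed; the erasure happened entirely in the key store, which is why the pattern survives immutable or append-only backups.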

Pub/Sub Message Retention

Pub/Sub has two independent retention windows. Subscription retention controls how long unacknowledged messages are kept on a subscription, configurable from 10 minutes to 7 days (default 7 days). Acknowledged messages are dropped immediately unless retain_acked_messages is set, in which case they remain for the same window and can be replayed via seek.

Topic retention is configurable from 10 minutes to 31 days. When set, Pub/Sub stores all published messages at the topic level, allowing any subscription — including subscriptions created after the message was published — to replay the backlog up to the topic retention duration. Topic retention is the GCP equivalent of Kafka's retention model.

Topic retention adds storage cost. A high-volume topic (say, 10 MB/s) with 7-day retention stores roughly 6 TB of messages, billed at the message-storage rate.
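The sizing estimate above is simply throughput multiplied by retention; a Python sketch with the documented 10-minute to 31-day bounds makes that explicit. It counts payload bytes only (decimal units) and does not model attributes or any per-message overhead in how Pub/Sub meters storage.

```python
SECONDS_PER_DAY = 86_400
MIN_TOPIC_RETENTION_S = 10 * 60               # 10 minutes
MAX_TOPIC_RETENTION_S = 31 * SECONDS_PER_DAY  # 31 days


def retained_bytes(publish_bytes_per_s: int, retention_s: int) -> int:
    """Steady-state payload bytes held by topic retention."""
    if not MIN_TOPIC_RETENTION_S <= retention_s <= MAX_TOPIC_RETENTION_S:
        raise ValueError("topic retention must be between 10 minutes and 31 days")
    return publish_bytes_per_s * retention_s


seven_days = retained_bytes(10_000_000, 7 * SECONDS_PER_DAY)  # 10 MB/s
print(f"{seven_days / 1e12:.2f} TB")  # 6.05 TB
```

Stretching the same topic to the 31-day maximum would hold about 26.8 TB, which is worth knowing before turning the dial all the way up.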

Cloud Audit Logs Retention

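Cloud Audit Logs come in four flavors with different default retention periods. Admin Activity logs and System Event logs are stored in the _Required log bucket for 400 days at no cost. Data Access logs (disabled by default for most services other than BigQuery) and Policy Denied logs are stored in the _Default log bucket, which keeps them for 30 days by default; they count toward Cloud Logging ingestion charges.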

For longer retention, route logs to a sink: a Cloud Storage bucket (cheapest, supports lifecycle rules), a BigQuery dataset (queryable, supports partition expiration), or a Pub/Sub topic (for streaming to a SIEM). The sink-bucket pattern with Bucket Lock is the standard for SOX seven-year retention of Admin Activity logs.

A Log Sink is a Cloud Logging configuration that exports matching log entries to an external destination — Cloud Storage, BigQuery, Pub/Sub, or another Cloud Logging bucket — for long-term retention, analysis, or alerting. Sinks use a filter expression to select which logs to export.

Common Pitfalls and Trade-offs

Several lifecycle traps come up repeatedly in production postmortems and on the exam.

The early-deletion fee surprise: transitioning Standard to Nearline at 5 days, then Coldline at 25 days, charges you the full 30-day Nearline minimum even though the object only stayed there 20 days. Always count out the minimums before stacking transitions.

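Soft delete double-billing: Cloud Storage soft delete keeps deleted objects for 7 days by default, and you pay storage for them during that window. If your workload churns through millions of files that live about a day, each one is billed for roughly 8 days instead of 1, so you may be paying for 8x the storage you think you have. Because 7 days is both the default and the shortest non-zero window, the usual fix for high-churn buckets is to disable soft delete by setting its retention duration to 0.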

Retention vs. lifecycle confusion: a 7-year locked retention policy on a bucket without a delete lifecycle rule means the data lives forever. The retention policy only sets the floor; you still need an action to delete on schedule.

BigQuery DELETE quotas: trying to "expire" old data with DELETE FROM table WHERE event_time < ... runs into BigQuery DML quotas (1,500 DML statements per table per day), scans the entire table, and incurs query cost. Partition expiration is free, instant, and unmetered. Always partition for time-series.

Bigtable GC eventual semantics: GC sweeps are tied to compactions, which Bigtable schedules opportunistically. Reads can return cells that should have been collected an hour ago. For compliance, use explicit deletes.

Pub/Sub message loss on missed deadlines: a subscription with the default 7-day retention will drop unacknowledged messages on day 8. If a downstream consumer has been down longer than that, you have data loss. Monitor oldest_unacked_message_age aggressively.
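A hedged Python sketch of the monitoring logic: feed it the value of the subscription/oldest_unacked_message_age metric from Cloud Monitoring (assumed here to be in seconds) and the subscription's retention, and it reports the remaining headroom and whether to page. The 50 percent paging threshold is an illustrative choice, not a Google recommendation.

```python
SECONDS_PER_HOUR = 3_600


def hours_until_loss(oldest_unacked_age_s: float, retention_s: float) -> float:
    """Hours left before the oldest unacknowledged message falls out of retention."""
    return max(0.0, (retention_s - oldest_unacked_age_s) / SECONDS_PER_HOUR)


def should_page(oldest_unacked_age_s: float, retention_s: float, page_at_fraction: float = 0.5) -> bool:
    """Page once the backlog has used up the given fraction of the retention window."""
    return oldest_unacked_age_s >= page_at_fraction * retention_s


seven_days_s = 7 * 24 * SECONDS_PER_HOUR
four_days_s = 4 * 24 * SECONDS_PER_HOUR
print(hours_until_loss(four_days_s, seven_days_s), should_page(four_days_s, seven_days_s))  # 72.0 True
```

Paging on a fraction of the window, rather than on a fixed age, keeps the alert meaningful if someone later shortens the subscription's retention.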

Cross-region backup retention: Spanner backups are region-bound. If your region experiences a multi-zone outage, your backups in that region may be unreachable. Use Spanner backup copying or Cloud Storage exports for cross-region redundancy.

Setting a Cloud Storage retention policy of 100 years on a bucket and then locking it is unrecoverable. The bucket and every object in it must persist for a century, and you will pay storage costs for that century. Always test retention durations on a throwaway bucket with retention set (without lock) first. https://cloud.google.com/storage/docs/bucket-lock#considerations

Best Practices

A short list of habits that keep lifecycle policies healthy in production:

Partition every BigQuery time-series table by ingestion date or event date and set partitionExpirationMs from day one. Retrofitting partitioning later is painful.
Use customTime metadata on Cloud Storage objects to drive transitions based on a business event (contract end date, fiscal year close) rather than upload time.
Audit lifecycle configurations quarterly with gsutil lifecycle get across every bucket in the org. Drift accumulates faster than you expect.
For compliance-grade retention, lock the policy and document the unlock-impossibility in your runbook so on-call engineers do not waste hours trying.
Pair every retention policy with a corresponding delete lifecycle rule. Retention without deletion is a one-way ratchet on your storage bill.
Use crypto-shredding for tenants subject to GDPR, CCPA, or similar erasure regimes. It is the only mechanism that survives immutable backups.
Export Audit Logs to a Cloud Storage sink with Bucket Lock for SOX, HIPAA, and PCI-DSS audit trails. The 400-day default is rarely enough.
Tag your buckets, datasets, and instances with a data-classification label so cost-allocation and compliance reports can filter on them.

Real-World Use Case

A mid-size European fintech with about 800 employees runs a transaction-processing platform on GCP. Their data engineering team manages roughly 80 TB of raw event data per month flowing through Pub/Sub into Dataflow into BigQuery, with parallel writes to Cloud Storage for archival.

Their lifecycle architecture has four layers.

Hot analytics: the last 90 days of transactions live in BigQuery partitioned by transaction date with partitionExpirationMs = 7776000000 (90 days). Analysts query these partitions for fraud detection and weekly reporting.

Warm archive: every Dataflow job mirrors the same data into a Cloud Storage bucket in Standard class. A lifecycle rule transitions Standard to Nearline at 30 days, Nearline to Coldline at 365 days, Coldline to Archive at 1095 days, and deletes at 2555 days (seven years, the GDPR-PSD2-AML composite requirement for transaction records in the EU).

Compliance vault: the same bucket has a locked retention policy of 2555 days. Even the platform-team admins cannot delete a transaction record before its seventh birthday, satisfying their auditors and the local financial regulator.

GDPR erasure: PII (name, email, national ID) is stored in a separate Spanner table keyed by a pseudonymous customer ID, encrypted with a per-customer CMEK key in Cloud KMS. Transaction records reference the customer only by pseudonymous ID. When an erasure request arrives, the team destroys the customer's CMEK key version, which renders the PII permanently unreadable while leaving the (pseudonymous) transaction records intact and audit-compliant. The whole erasure runbook takes about 20 minutes per request and is fully documented for regulator inspection.

This architecture costs about 38 percent of what a "keep everything in BigQuery Standard" setup would, while passing both the regulator's seven-year retention test and GDPR's 30-day erasure deadline.

Exam Tips

The PDE exam tests lifecycle policies in scenario form. A few patterns worth memorizing:

If a question mentions "must not be deletable for N years," the answer involves Cloud Storage Bucket Lock with a locked retention policy. Lifecycle rules alone are not sufficient because they can be modified.

If a question mentions "minimize storage cost for data accessed less than once per year," the answer is Archive class. For "less than once per quarter" it is Coldline. For "less than once per month" it is Nearline.

If a question mentions "automatically delete old data in BigQuery," the answer is partition expiration, not a scheduled DELETE query. The exam considers DML for retention an anti-pattern.

If a question mentions "GDPR right to be forgotten across backups," the answer is crypto-shredding via per-user CMEK keys, not iterating through backup tapes.

If a question mentions "replay messages published before subscription was created," the answer is topic-level message retention combined with seek. Subscription-only retention does not cover messages published before the subscription existed.

If a question describes Bigtable cells lingering after their max-age expired, the cause is eventual garbage collection running during compactions; this is expected, not a bug.

If a question mentions "audit log retention longer than 400 days," the answer is a Log Sink to Cloud Storage with a lifecycle rule and Bucket Lock for tamper evidence.

Cloud Storage minimum storage durations: Standard 0 days, Nearline 30 days, Coldline 90 days, Archive 365 days. Transitioning or deleting before the minimum incurs an early-deletion fee equal to the remaining storage cost. Memorize this table — the exam will give you a transition schedule and ask whether it incurs penalties. https://cloud.google.com/storage/docs/storage-classes

Frequently Asked Questions (FAQ)

What is the difference between a Cloud Storage retention policy and a lifecycle delete rule?

A retention policy sets a minimum age before objects can be deleted; it is a floor. A lifecycle delete rule actually removes objects after a specified age; it is the action. To express "keep for exactly 7 years then erase," you need both: a 7-year retention policy (locked, if compliance requires immutability) and a lifecycle rule with action Delete at age 2555 days. Using only the retention policy means objects live forever.

Does reading data from a BigQuery long-term storage partition reset its long-term timer?

No. Only writes (INSERT, UPDATE, DELETE, MERGE, streaming inserts, load jobs targeting the partition, copy jobs) reset the 90-day modification clock. Pure reads, including SELECT queries and BI tool dashboards, never affect long-term pricing. Because long-term pricing is applied automatically and reads never reset it, read-heavy analytics workloads get the lower rate without any configuration or architectural changes.

Can I delete a Cloud Storage bucket with a locked retention policy?

Only if every object in the bucket has aged past the locked retention duration. If even one object is still under retention, the bucket cannot be deleted. There is no admin override, no support escalation that removes the lock, and no API to shorten or remove a locked policy. The lock is permanent by design — that is the property auditors require.

How does Bigtable garbage collection differ from explicit DELETE mutations?

Garbage collection is eventual and runs opportunistically during compactions. A cell whose max-age expired may still be readable for minutes, hours, or even days afterward. An explicit delete mutation issued via the Bigtable API takes effect immediately for subsequent reads from the cluster that received it; with replication, other clusters converge eventually. For compliance-grade deletion (GDPR, CCPA), use explicit deletes; for cost-driven cleanup, garbage collection is fine.

What happens to Pub/Sub messages that exceed the subscription retention window?

They are dropped permanently. A subscription with the default 7-day retention drops any unacknowledged message on day 8, and there is no way to recover it. To prevent message loss during prolonged consumer outages, enable topic-level message retention (up to 31 days) which preserves messages independently of subscription state and allows replay via seek to a timestamp within the topic retention window.

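How do I implement GDPR right-to-be-forgotten when backups are immutable?

Use crypto-shredding. Encrypt each user's personal data with a dedicated per-user encryption key managed in Cloud KMS. When an erasure request arrives, destroy the key version in KMS. Cloud KMS schedules destruction rather than performing it instantly (30 days by default, configurable), and the key version cannot be used while it is scheduled for destruction. Once the key material is destroyed, the ciphertext on disk and on every backup remains, but it is computationally unrecoverable without the key. This is the most practical pattern for erasing data from append-only backup systems and is widely cited as a way to meet Article 17 erasure obligations.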
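The key-lifecycle bookkeeping behind crypto-shredding can be sketched without calling Cloud KMS. The model below is a hypothetical in-memory registry; its state names mirror the Cloud KMS key-version states `ENABLED`, `DESTROY_SCHEDULED`, and `DESTROYED`, and it assumes the default 30-day scheduled-destruction period. It shows two things: decryption stops as soon as destruction is scheduled, and the erasure only becomes irreversible once the scheduled time passes. Encryption itself is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class KeyState(Enum):
    ENABLED = "ENABLED"
    DESTROY_SCHEDULED = "DESTROY_SCHEDULED"
    DESTROYED = "DESTROYED"


@dataclass
class UserKey:
    state: KeyState = KeyState.ENABLED
    destroy_at: Optional[datetime] = None


class KeyRegistry:
    """Hypothetical per-user key registry modelling Cloud KMS destruction semantics."""

    def __init__(self, destroy_delay: timedelta = timedelta(days=30)):
        self.destroy_delay = destroy_delay
        self.keys: dict[str, UserKey] = {}

    def create(self, user_id: str) -> None:
        self.keys[user_id] = UserKey()

    def request_erasure(self, user_id: str, now: datetime) -> datetime:
        """Schedule destruction of the user's key; return when the material is destroyed."""
        key = self.keys[user_id]
        key.state = KeyState.DESTROY_SCHEDULED
        key.destroy_at = now + self.destroy_delay
        return key.destroy_at

    def state(self, user_id: str, now: datetime) -> KeyState:
        key = self.keys[user_id]
        if key.state is KeyState.DESTROY_SCHEDULED and now >= key.destroy_at:
            key.state = KeyState.DESTROYED  # past the scheduled time: irreversible
        return key.state

    def can_decrypt(self, user_id: str, now: datetime) -> bool:
        # Only an enabled key can decrypt; a scheduled key is already unusable.
        return self.state(user_id, now) is KeyState.ENABLED


registry = KeyRegistry()
registry.create("user-123")
t0 = datetime(2024, 1, 10, tzinfo=timezone.utc)
done = registry.request_erasure("user-123", t0)
print(registry.can_decrypt("user-123", t0), registry.state("user-123", t0).value, done)
```

The gap between `request_erasure` and `done` is the window in which an accidental erasure can still be reversed, and it is also the delay to account for in any erasure deadline.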

Are Cloud Audit Logs deleted automatically when the retention period ends?

Yes, by default. Admin Activity and System Event audit logs are stored in the _Required log bucket and retained for 400 days; that bucket's retention cannot be changed. Data Access and Policy Denied audit logs go to the _Default log bucket and are retained for 30 days unless you change that bucket's retention setting. After the retention period, log entries are deleted. To retain longer, configure a log sink to a Cloud Storage bucket (with lifecycle rules and optionally Bucket Lock) or to a BigQuery dataset (with partition expiration matching your retention requirement).
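A log sink needs a filter that selects the audit logs to export. The sketch below builds one in the Cloud Logging query language from the standard audit log IDs (`cloudaudit.googleapis.com%2Factivity`, `%2Fsystem_event`, `%2Fdata_access`, and `%2Fpolicy`). The helper name and the project ID are placeholders, and creating the sink itself (for example with `gcloud logging sinks create` and its `--log-filter` flag) is not shown. The usage line selects Data Access and Policy Denied logs, the types with the short 30-day default.

```python
# Audit log types mapped to the log IDs Cloud Audit Logs writes to ("/" is URL-encoded).
AUDIT_LOG_IDS = {
    "admin_activity": "cloudaudit.googleapis.com%2Factivity",
    "system_event": "cloudaudit.googleapis.com%2Fsystem_event",
    "data_access": "cloudaudit.googleapis.com%2Fdata_access",
    "policy_denied": "cloudaudit.googleapis.com%2Fpolicy",
}


def audit_sink_filter(project_id: str, log_types: list[str]) -> str:
    """Build a Logging query that matches the selected audit logs in one project."""
    clauses = [
        f'logName="projects/{project_id}/logs/{AUDIT_LOG_IDS[t]}"' for t in log_types
    ]
    return " OR ".join(clauses)


# "example-project" is a placeholder project ID.
print(audit_sink_filter("example-project", ["data_access", "policy_denied"]))
```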

Related Topics

Cloud Storage Data Lake Design: bucket layout and partitioning patterns that interact with lifecycle rules.
Cost Optimization Architectures: broader cost-management patterns, including storage tiering economics.
Data Sovereignty and Compliance Design: region pinning, Bucket Lock, and CMEK patterns for regulated workloads.
