
S3 Storage Classes, Lifecycle Policies, and Data Lake Foundations

4,000 words · ≈ 20 min read

DEA-C01 Domain 2 Tasks 2.1/2.3 S3 storage classes + lifecycle: Standard vs IA vs Intelligent-Tiering vs Glacier Instant/Flexible/Deep Archive trade-offs, lifecycle transition min-storage-duration rules, versioning + replication prereqs, access points, event notifications, lifecycle traps.


Amazon S3 is the foundational storage layer underneath every AWS data architecture, and on the DEA-C01 exam it shows up across Domains 1, 2, and 3 in scenarios that hinge on choosing the right storage class, designing the right lifecycle policy, and configuring the right replication and event-notification wiring. Community study guides from Tutorials Dojo, Digital Cloud Training, and ExamCert.App all flag the same pain point: candidates pick the cheapest storage class without checking the minimum-storage-duration rules, configure Intelligent-Tiering for tiny files and pay more in monitoring fees than they save in storage, and conflate Glacier Instant Retrieval with the older Glacier Flexible and Deep Archive classes that still need hours-long thaw operations. The wrong choice on the exam is the wrong choice in production: pick One Zone-IA for the only backup copy and lose the data when an Availability Zone fails, pick Glacier Deep Archive for monthly compliance reports and wait twelve hours every time auditors ask for a file.

This guide is built for the data engineer perspective. It covers what S3 storage classes are, why the lifecycle engine exists, the seven storage classes and their cost and retrieval profiles, the lifecycle transition rules and minimum-storage-duration math, versioning and replication prerequisites, event notifications wiring into EventBridge and Lambda, S3 Access Points for shared data lake buckets, Object Lock for WORM compliance, S3 Select for query-time filtering, and the canonical S3 lifecycle exam traps that catch most candidates. By the end the storage-class decision tree should feel as natural as choosing a pantry shelf for ingredients you cook every day versus a basement freezer for ingredients you might never thaw.

What Are S3 Storage Classes And Why The Lifecycle Engine Exists

S3 storage classes are seven distinct backend storage tiers with different cost, durability, availability, and retrieval-latency trade-offs. Every object in S3 lives in exactly one storage class at any moment, and the storage class determines what the object costs per gigabyte per month, what it costs to retrieve, and how fast it can be retrieved. The lifecycle engine is the rule-based scheduler that automatically moves objects between storage classes and eventually expires them based on age. Together they are the cost-control layer of every S3-backed data lake.

Why Lifecycle Matters For Data Engineers

A typical raw-data S3 bucket grows by terabytes per day and most of those terabytes are queried heavily in the first week, occasionally in the next month, and almost never after ninety days. Storing all of it in S3 Standard at twenty-three dollars per terabyte per month is wasteful when ninety-day-old data could sit in Glacier Deep Archive at one dollar per terabyte per month. Lifecycle policies do the moving automatically — write the rule once, S3 enforces it forever, and the data engineer's storage bill drops by an order of magnitude without any application changes.

The Seven-Class Hierarchy

The storage classes in increasing access-latency order are S3 Standard (millisecond access, highest cost), S3 Intelligent-Tiering (millisecond access for hot data, automatic tiering), S3 Standard-Infrequent Access and S3 One Zone-Infrequent Access (millisecond access, retrieval fee, lower storage cost), S3 Glacier Instant Retrieval (millisecond access, even lower cost, higher retrieval fee), S3 Glacier Flexible Retrieval (minutes-to-hours access, archival cost), and S3 Glacier Deep Archive (twelve-hour access, deepest archive cost). Lifecycle policies can transition objects through these classes in order — but transitions only move down this waterfall, never back to a more expensive class, and the minimum-storage-duration rules at each step are the source of the exam's favorite trap.

Plain-Language Explanation: S3 Storage Classes And Lifecycle

The storage-class decision is not intuitive from the names alone. Three concrete analogies make the cost-vs-latency trade-off stick.

Analogy 1 — The Restaurant Pantry, Walk-In Fridge, and Basement Freezer

Picture a restaurant kitchen with three storage areas. The line pantry at arm's reach holds today's mise en place — chopped onions, sauces in squeeze bottles, ingredients used dozens of times per shift. Cost per cubic foot is highest because the line pantry is climate-controlled prime real estate, but access is instant. The walk-in fridge behind the kitchen holds tomorrow's ingredients and any prep that did not get used today — slower to access (the chef must walk back), cheaper per cubic foot. The basement freezer holds emergency stock and seasonal ingredients that are used a few times a year — cheapest per cubic foot, but it takes the dishwasher twenty minutes to thaw a frozen stock block before the chef can use it.

The line pantry is S3 Standard — instant access, premium price, used for hot data. The walk-in fridge is S3 Standard-IA or Glacier Instant Retrieval — still instant when you reach for it, lower base cost, but a small retrieval fee per access. The basement freezer is Glacier Flexible Retrieval or Deep Archive — cheapest per gigabyte but you wait minutes to hours to thaw the data before you can use it. A smart kitchen rotates ingredients from line to walk-in to freezer based on how often they are used, exactly the way an S3 lifecycle policy rotates objects from Standard to IA to Glacier based on object age.

Analogy 2 — The Library Reading Room, Stacks, and Off-Site Archive

Picture a research library. The reading room at the front holds the current periodicals and most-borrowed books — instant access, expensive shelf space, supports dozens of patrons per hour. The stacks in the basement hold the past five years of journals and monographs — slower (you fill out a request slip and a librarian retrieves), cheaper per square foot. The off-site archive holds anything older than five years — kept in a temperature-controlled warehouse twenty miles away, retrieval takes a day, cheapest storage rate.

The reading room is S3 Standard, the stacks are S3 Standard-IA, and the off-site archive is S3 Glacier Deep Archive. The library's collection-development policy is the lifecycle policy: new books go to the reading room, after a year they move to the stacks, after five years they move to the off-site archive, after twenty-five years they are weeded from the catalog (lifecycle expiration). A patron who needs a fifty-year-old paper waits a day; a patron who needs today's New York Times grabs it from the front rack.

Analogy 3 — The Bank Safe Deposit System

Picture a bank with three vaults. The teller drawer holds the day's working cash — instant access, used hundreds of times per shift, monitored constantly. The branch vault holds the branch's reserve cash — accessible in minutes when the teller drawer needs replenishment, costs less to secure per dollar held. The regional vault holds long-term reserves — armored truck transport required to move money in or out, takes a day to retrieve, cheapest per dollar to maintain.

The teller drawer is S3 Standard, the branch vault is Standard-IA or Glacier Instant Retrieval, and the regional vault is Glacier Flexible or Deep Archive. The minimum-storage-duration rule maps naturally: the bank charges a penalty if you remove cash from the regional vault within the first ninety days because the armored-truck dispatch was already paid for, exactly the way S3 charges you the full thirty-day or ninety-day or one-hundred-eighty-day storage charge if you delete or transition an object before that minimum has elapsed in IA or Glacier classes. The penalty is not a bug; it is how AWS prices the underlying tape and disk hardware that those tiers run on.

The Seven S3 Storage Classes In Detail

Each storage class has a specific cost profile, durability guarantee, availability SLA, and use case the exam expects you to know.

S3 Standard

The default class, eleven nines of durability, four nines of availability across at least three Availability Zones, millisecond first-byte latency. No retrieval fee, no minimum storage duration, no minimum object size. Use for hot data, active data lake landing zones, and content that is read multiple times per month. The most expensive per-GB-month class, but free from retrieval charges and minimum-duration penalties.

S3 Intelligent-Tiering

A single storage class that automatically moves objects between three access tiers — Frequent Access, Infrequent Access (after thirty consecutive days without access), and Archive Instant Access (after ninety days without access) — plus two optional tiers, Archive Access and Deep Archive Access, that must be explicitly opted into. No retrieval fee in the millisecond-access tiers. Charges a small per-object monitoring fee. Designed for unpredictable access patterns where you cannot tell in advance how often an object will be read. The exam trap: the monitoring fee is per-object, and objects under 128 KB are never auto-tiered out of the Frequent Access tier, so for buckets with millions of tiny objects Intelligent-Tiering adds overhead without delivering the storage savings it promises.

S3 Standard-Infrequent Access (Standard-IA)

Same eleven nines durability as Standard, three nines availability, millisecond access. Lower storage cost, but charges a per-GB retrieval fee. Minimum storage duration of thirty days — delete or transition before thirty days and you pay the full thirty-day storage anyway. Minimum billable object size of 128 KB — smaller objects are billed as if they were 128 KB. Use for backups and old logs that are queried occasionally but must be available instantly when needed.

S3 One Zone-Infrequent Access (One Zone-IA)

Same as Standard-IA except data is stored in a single Availability Zone instead of three. Twenty percent cheaper than Standard-IA, but if the AZ goes down or is destroyed, the data is lost. Use only for reproducible derived data — secondary copies, transcoded media, ephemeral analytics intermediates — never for primary copies or backup-of-record. Same thirty-day minimum and 128 KB minimum.

S3 Glacier Instant Retrieval

A class introduced in late 2021 that combines Glacier-tier storage cost (cheaper than Standard-IA) with millisecond retrieval latency (the same as Standard). Higher per-GB retrieval fee than Standard-IA. Minimum storage duration of ninety days. 128 KB minimum object size. Use for archival data that is queried rarely but must be available instantly when accessed — medical imaging archives, news media archives, regulatory document repositories.

S3 Glacier Flexible Retrieval

The classic Glacier class (renamed from "Glacier" in 2021). Three retrieval tiers: Expedited (one to five minutes, highest fee), Standard (three to five hours, default), and Bulk (five to twelve hours, free of charge). Minimum storage duration ninety days. 40 KB minimum billable object size. Use for archives that can wait minutes to hours for retrieval and where the storage cost saving justifies the wait.

S3 Glacier Deep Archive

The deepest archive tier. Two retrieval tiers: Standard (twelve hours) and Bulk (forty-eight hours). Minimum storage duration of one hundred eighty days. Cheapest storage in S3, often less than one dollar per terabyte per month. Use for compliance archives that must be retained for seven to ten years and are practically never read — financial records, regulatory submissions, healthcare records past their active period.

S3 storage classes are seven backend tiers with distinct cost, durability, retrieval-latency, minimum-storage-duration, and minimum-billable-object-size profiles. The seven are Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval, and Glacier Deep Archive. Picking the wrong class for the access pattern is one of the most expensive mistakes in data engineering — One Zone-IA for primary backups risks total data loss on AZ failure, Glacier Deep Archive for monthly reports forces a twelve-hour wait every time the report is requested, and Intelligent-Tiering for tiny-file buckets adds overhead without any tiering benefit. On DEA-C01 the exam plants a scenario describing access pattern, durability requirement, and retrieval-latency tolerance, and the correct answer is the cheapest class that satisfies all three constraints simultaneously.

Lifecycle Policies — Transitions, Expirations, and Abort Rules

Lifecycle policies are the rule-based engine that automates storage-class transitions and object deletion.

Lifecycle Rule Structure

A lifecycle rule has a filter (prefix, tag, object size), a list of transitions (after N days, move to class X), and optional expiration actions (after N days, delete the object). Rules apply to current versions, noncurrent versions (when versioning is enabled), and incomplete multipart uploads. A bucket can have up to one thousand lifecycle rules, but in practice most buckets have under ten.
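A minimal sketch of what such a rule looks like when set through boto3 follows; the bucket name, prefix, and day counts are illustrative assumptions, not values from this guide.

```python
# Sketch: one lifecycle rule with a prefix filter, transitions, and expiration actions.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-raw-data-bucket",           # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-raw-landing-zone",
                "Status": "Enabled",
                "Filter": {"Prefix": "raw/"},   # filter: prefix, tags, or object size
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},       # Glacier Flexible Retrieval
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
                "Expiration": {"Days": 2555},   # delete current versions after ~7 years
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            }
        ]
    },
)
```

Note that this call replaces the bucket's entire lifecycle configuration, so every rule the bucket needs must be submitted together in the same request.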

Transition Rules — The 30-90-180 Math

Transitions from Standard to Standard-IA or One Zone-IA require the object to be at least thirty days old in Standard — S3 rejects lifecycle rules that try to move objects into the IA classes earlier than that. Transitions to Glacier Flexible or Deep Archive can happen at any age, but the destination class enforces its own minimum storage duration. The classic exam trap: leaving a class before its minimum duration does not avoid the charge — a rule that transitions to Standard-IA at day thirty and on to Glacier at day thirty-one still pays the full thirty-day Standard-IA storage charge for every object that takes that path, even though each object spent only one day in the class.
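As a back-of-envelope illustration of that billing rule, the sketch below charges an object for at least its class's minimum duration; the per-GB prices are rough assumptions for illustration, not quoted AWS rates.

```python
# Illustrative per-GB-month prices (assumptions, not quoted AWS rates).
PRICE_PER_GB_MONTH = {"STANDARD": 0.023, "STANDARD_IA": 0.0125}
MIN_BILLABLE_DAYS = {"STANDARD": 0, "STANDARD_IA": 30}

def storage_cost(storage_class: str, days_resident: int, gib: float) -> float:
    """Bill at least the class's minimum storage duration (the 0/30/90/180 rule)."""
    billed_days = max(days_resident, MIN_BILLABLE_DAYS[storage_class])
    return PRICE_PER_GB_MONTH[storage_class] * gib * billed_days / 30

# A 1 TiB batch that leaves Standard-IA after one day is still billed for thirty days:
print(f"Standard-IA, resident 1 day: ${storage_cost('STANDARD_IA', 1, 1024):.2f}")  # ~ $12.80
print(f"Standard,    resident 1 day: ${storage_cost('STANDARD', 1, 1024):.2f}")     # ~ $0.79
```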

Expiration Rules

Expiration deletes the object after N days from creation (or N days after becoming a noncurrent version when versioning is on). For non-versioned buckets the object is gone forever. For versioned buckets, expiration of the current version creates a delete marker — the noncurrent versions still exist and continue to incur cost until a separate noncurrent-version expiration rule deletes them.

Abort Incomplete Multipart Upload

Multipart uploads that are started but never completed leave behind orphaned parts that incur storage charges and never appear in object listings. Every production bucket should have an "abort incomplete multipart upload after 7 days" lifecycle rule. The exam loves this — a question of the form "the bucket bill keeps growing but the object count is constant" is answered by configuring the abort multipart lifecycle action.
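A sketch of that rule, using the same (hypothetical) bucket as above; in practice it would be added alongside the bucket's other lifecycle rules, since the call replaces the whole configuration.

```python
# Sketch: abort incomplete multipart uploads that are older than seven days.
import boto3

boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="example-raw-data-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-stale-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},   # empty prefix: apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```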

S3 lifecycle minimum-storage-duration rules are billed regardless of whether the object stayed in the class — Standard-IA charges thirty days even if you transition out after one day, Glacier Flexible charges ninety days, and Glacier Deep Archive charges one hundred eighty days. Engineers designing aggressive lifecycle policies frequently overlook this and end up with bills higher than if they had stayed in Standard. The rule of thumb: do not transition objects to IA classes until they are likely to live in that class for at least the minimum duration, and do not transition to Glacier Deep Archive until objects are unlikely to be read for at least six months. The DEA-C01 exam tests this with scenarios where a "cost-optimized" policy actually costs more than a naive Standard-only policy because of minimum-duration penalties.

S3 Intelligent-Tiering — The Automatic Tiering Class

Intelligent-Tiering is a special storage class because it does the lifecycle decision automatically.

How Intelligent-Tiering Works

When you write an object to Intelligent-Tiering, S3 places it in the Frequent Access tier. After thirty consecutive days without access, S3 moves it to Infrequent Access (lower cost, no retrieval fee). After ninety days without access, S3 moves it to Archive Instant Access (lower still). Optional Archive Access and Deep Archive Access tiers add minutes-to-hours retrieval at deeper discounts, but those tiers must be explicitly opted into per bucket.
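The opt-in for the archive tiers is a bucket-level Intelligent-Tiering configuration. A minimal sketch, assuming hypothetical bucket and configuration names and the default day thresholds:

```python
# Sketch: opt objects into the optional Archive Access tiers of Intelligent-Tiering.
import boto3

boto3.client("s3").put_bucket_intelligent_tiering_configuration(
    Bucket="example-analytics-bucket",
    Id="archive-tier-opt-in",
    IntelligentTieringConfiguration={
        "Id": "archive-tier-opt-in",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},         # minutes-to-hours retrieval
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},   # deepest discount
        ],
    },
)
```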

When To Use Intelligent-Tiering

Use Intelligent-Tiering when access patterns are unpredictable — you do not know whether an object will be hot or cold next month. The class makes the decision for you and never charges retrieval fees in the millisecond-access tiers. Stop using it when access patterns are known: predictable hot data belongs in Standard (no monitoring fee), predictable cold data belongs in Glacier (lower per-GB cost without per-object monitoring overhead).

The Monitoring Fee Trap

Intelligent-Tiering charges a per-object monitoring fee on objects large enough to be auto-tiered, and objects under 128 KB never leave the Frequent Access tier at all, so a bucket dominated by millions of tiny objects gets little or no storage savings to offset the overhead. The exam asks: "a bucket with one billion 50-KB objects is in Intelligent-Tiering and the bill is unexpectedly high — what should the data engineer do?" The right answer is to move the small objects out of Intelligent-Tiering, either to Standard (no monitoring fee) or by aggregating the small files into larger objects through compaction.

S3 Versioning, Replication, and MFA Delete

Versioning and replication are two independent features that work together for data protection.

Versioning

Versioning stores every version of every object — overwrites and deletes do not actually remove data, they create a new version or a delete marker. Versioning is the prerequisite for replication, lifecycle rules on noncurrent versions, and most data-protection patterns. Once enabled, versioning can be suspended but not turned off — the existing versions remain forever unless explicitly deleted.

MFA Delete

MFA Delete is a versioning option that requires multi-factor authentication to permanently delete a version or to suspend versioning. It is configurable only via the bucket-owner root account using the AWS CLI — the console cannot set it. Use for buckets holding compliance-critical data where accidental or malicious deletion must be prevented.

Cross-Region Replication (CRR) and Same-Region Replication (SRR)

Replication asynchronously copies objects from a source bucket to a destination bucket — across regions (CRR) or within the same region (SRR). Both require versioning enabled on source and destination, an IAM role with replication permissions, and a replication configuration on the source bucket. CRR is for disaster recovery and geographic redundancy; SRR is for compliance separation, log aggregation, or replicating between accounts in the same region. Replication does not retroactively copy existing objects — only new writes after the replication rule is created replicate. Use S3 Batch Replication for the backfill.
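A sketch of the prerequisites and the replication rule through boto3 follows; the bucket names, role ARN, and account ID are placeholders, not real resources.

```python
# Sketch: enable versioning on both buckets, then attach a CRR rule to the source.
import boto3

s3 = boto3.client("s3")

# 1. Versioning must be enabled on BOTH source and destination buckets first.
for bucket in ("example-source-bucket", "example-dr-bucket"):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

# 2. The replication configuration names an IAM role S3 assumes to copy objects.
s3.put_bucket_replication(
    Bucket="example-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/example-s3-replication-role",
        "Rules": [
            {
                "ID": "crr-to-dr-region",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},           # replicate everything
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-dr-bucket",
                    "StorageClass": "STANDARD_IA",  # replicas can land in a cheaper class
                    # Adding "ReplicationTime" and "Metrics" here enables RTC (next section).
                },
            }
        ],
    },
)
# Existing objects are not copied; S3 Batch Replication handles the backfill.
```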

S3 Replication Time Control (RTC)

For RPO-sensitive workloads, S3 RTC provides a fifteen-minute replication SLA with CloudWatch metrics. Costs more than standard replication. Use when the data engineer must demonstrate a hard recovery point objective for compliance.

S3 Event Notifications, Access Points, Object Lock, and S3 Select

Several S3 features power data engineering pipelines beyond the storage and lifecycle layer.

S3 Event Notifications

S3 can publish events on object creation, deletion, restore, and replication-failure to SQS queues, SNS topics, Lambda functions, or EventBridge. The most common pipeline trigger pattern: a producer drops a Parquet file in a landing prefix, S3 fires an s3:ObjectCreated:* event to EventBridge, EventBridge invokes a Step Functions workflow that runs a Glue job to transform and load the file. EventBridge integration is preferred over direct SQS or Lambda targets because EventBridge supports filtering, multiple targets per event, and cross-account routing.
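Turning on EventBridge delivery for a bucket is a single notification-configuration call; a minimal sketch with a placeholder bucket name:

```python
# Sketch: route all S3 events for the bucket to the account's default event bus,
# where EventBridge rules filter and fan them out to targets.
import boto3

boto3.client("s3").put_bucket_notification_configuration(
    Bucket="example-landing-bucket",
    NotificationConfiguration={"EventBridgeConfiguration": {}},
)
```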

S3 Access Points

Access Points are named network endpoints with their own access policy, attached to a bucket. They simplify per-application access management for large shared buckets — instead of one bucket policy that grants permissions for every consumer, each consumer gets its own access point with a focused policy. Multi-Region Access Points add automatic failover routing across regions for active-active workloads.
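A sketch of a per-application access point on a shared bucket; the account ID, names, role ARN, and policy are illustrative placeholders.

```python
# Sketch: create an access point for one consuming team and attach a scoped policy.
import json
import boto3

s3control = boto3.client("s3control")

s3control.create_access_point(
    AccountId="111122223333",
    Name="ml-team-features",
    Bucket="example-shared-datalake-bucket",
)

s3control.put_access_point_policy(
    AccountId="111122223333",
    Name="ml-team-features",
    Policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/example-ml-team-role"},
            "Action": ["s3:GetObject"],
            # Access point object ARNs use the .../object/<prefix> form.
            "Resource": "arn:aws:s3:us-east-1:111122223333:accesspoint/ml-team-features/object/features/*",
        }],
    }),
)
```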

S3 Object Lock

Object Lock enables WORM (Write Once Read Many) protection on individual objects. Two retention modes exist: Governance (allows certain privileged users to override the lock) and Compliance (no one, including the root account, can delete or modify the object until the retention period expires). Use for regulatory compliance archives — SEC 17a-4, FINRA, HIPAA — where data must be tamper-proof for a defined retention period.
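A sketch of a default Compliance-mode retention rule; the bucket name and retention period are placeholders, and Object Lock is normally enabled when the bucket is created (which also turns on versioning).

```python
# Sketch: create a bucket with Object Lock enabled, then set a default retention rule.
import boto3

s3 = boto3.client("s3")

s3.create_bucket(
    Bucket="example-compliance-archive",
    ObjectLockEnabledForBucket=True,   # region/CreateBucketConfiguration omitted for brevity
)

s3.put_object_lock_configuration(
    Bucket="example-compliance-archive",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 7}},
    },
)
```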

S3 Select And Glacier Select

S3 Select runs simple SQL filters against a single object (CSV, JSON, Parquet) at retrieval time and returns only the matching subset. Glacier Select does the same against Glacier Flexible objects. Use to reduce downstream compute when an application reads ten percent of rows from a one-gigabyte object — instead of pulling the whole gigabyte then filtering in Lambda, push the filter to S3. The exam contrasts S3 Select (single-object filter) with Athena (multi-object SQL with joins) — they solve different problems.
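A sketch of pushing a row filter down to S3 with S3 Select; the bucket, key, and column name are assumptions about the object's schema.

```python
# Sketch: return only the rows of one CSV object where region = 'us-west-2'.
import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="example-landing-bucket",
    Key="exports/orders-2024-01.csv",
    ExpressionType="SQL",
    Expression="SELECT * FROM s3object s WHERE s.region = 'us-west-2'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; Records events carry the filtered bytes.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```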

For data lake buckets where access patterns are unknown or change over time, use S3 Intelligent-Tiering as the default class with an explicit object-size filter to exclude objects under 128 KB. Intelligent-Tiering automatically moves objects to lower-cost tiers after thirty and ninety days of no access while preserving millisecond-latency reads when accessed, eliminating manual lifecycle planning. The 128-KB filter keeps tiny files out of the class, since they are never auto-tiered and gain nothing from it. Combine with a separate lifecycle rule that transitions truly cold data (older than two years) to Glacier Deep Archive for long-term retention. This pattern delivers near-optimal cost without forecasting access patterns.
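A sketch of that pattern as a lifecycle configuration; the bucket name and the two-year threshold are placeholders.

```python
# Sketch: send eligible objects (128 KB and larger) straight to Intelligent-Tiering,
# and push data older than two years to Glacier Deep Archive.
import boto3

boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="example-datalake-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "default-to-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"ObjectSizeGreaterThan": 131072},   # bytes; skip tiny objects
                "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
            },
            {
                "ID": "deep-archive-after-two-years",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [{"Days": 730, "StorageClass": "DEEP_ARCHIVE"}],
            },
        ]
    },
)
```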

S3 As The Data Lake Foundation

Every modern AWS data architecture sits on S3 because S3 is the only AWS service that combines unlimited capacity, eleven nines of durability, regional availability, and integration with every analytics service.

Why S3 Underpins Every Data Architecture

S3 has no inherent compute coupling — you can store data once and query it from Athena, Redshift Spectrum, EMR Spark, Glue ETL, SageMaker, QuickSight, and any custom application without copying. This decoupling of storage from compute is what enables the data lake pattern, which is the foundation of every modern analytics architecture. Lock-in to a vendor-specific format like Redshift's internal blocks would prevent this; storing in open formats like Parquet on S3 enables it.

Bronze, Silver, Gold Zoning

A typical data lake organizes data into three S3 prefixes (or buckets): bronze for raw landing data exactly as ingested, silver for cleaned and partitioned data after first-pass ETL, and gold for aggregated business-ready data for analytics. Lifecycle policies differ per zone — bronze ages aggressively to Glacier within ninety days, silver stays in Standard-IA for a year, gold stays in Standard indefinitely.

S3 With Lake Formation Governance

Lake Formation registers S3 paths as governed locations and applies fine-grained access at the table, column, row, and cell level. Lake Formation does not replace bucket policies — both must allow access for an operation to succeed. The two-layer model is intentional: bucket policies enforce coarse network and account boundaries, Lake Formation enforces fine-grained data governance.

One Zone-IA stores data in only one Availability Zone, so a single AZ failure or destruction destroys the data permanently — never use One Zone-IA for primary copies, backups of record, or any data that cannot be regenerated from another source. The exam plants a scenario describing "we want the cheapest IA class for our backups" and lists One Zone-IA as the cheapest option. Choosing it is wrong because backups must survive AZ failure by definition. Use One Zone-IA only for derived secondary copies — transcoded media, analytics intermediates, cached datasets — that can be regenerated cheaply if the AZ is lost. The DEA-C01 exam treats this as a single-question disqualifier, and many candidates lose points on this exact distinction.

Common Exam Traps For S3 Storage Classes And Lifecycle

The DEA-C01 exam plants a consistent set of traps around S3. Memorize all seven.

Trap 1 — Glacier Instant Retrieval Confused With Glacier Flexible

A scenario asks "we need data archive cheaper than Standard-IA but with millisecond access." Right answer: Glacier Instant Retrieval. Wrong answer: Glacier Flexible Retrieval, which is even cheaper but takes minutes to hours to retrieve.

Trap 2 — Aggressive Transition Increases Cost

A scenario describes a lifecycle policy that transitions objects through Standard, Standard-IA, Glacier Flexible, Glacier Deep Archive in rapid succession. Wrong intuition: this saves money. Right answer: minimum-storage-duration penalties stack at each tier, so the bill can end up higher than staying in Standard, because each tier charges its full minimum duration regardless of how briefly the object stayed there.

Trap 3 — Intelligent-Tiering For Tiny Objects

A bucket has one billion 10-KB objects in Intelligent-Tiering. The bill is unexpectedly high because the 128-KB eligibility threshold means the tiny objects never transition out of the Frequent Access tier, so the class produces no savings to offset its costs. Right answer: aggregate the small objects into larger files through compaction so they become eligible for tiering, or move them out of Intelligent-Tiering entirely.

Trap 4 — One Zone-IA For Primary Backups

The wrong answer trap covered above. One Zone-IA is acceptable only for regenerable derived data, never for backups of record.

Trap 5 — Replication Without Versioning

A scenario asks "configure CRR for the bucket" and the candidate forgets that versioning must be enabled on both source and destination first. Replication configuration without versioning fails. The exam lists the prerequisites in order and the answer that omits versioning is wrong.

Trap 6 — Glacier Deep Archive Twelve-Hour Retrieval For Frequent Reports

A scenario asks "monthly compliance reports must be readable on demand and we want lowest cost." Wrong answer: Deep Archive (twelve-hour retrieval makes "on demand" impossible). Right answer: Glacier Instant Retrieval (millisecond access, lower cost than Standard-IA).

Trap 7 — Forgetting Abort Incomplete Multipart Upload

A bucket bill grows steadily but the visible object count is flat. The cause is incomplete multipart uploads accumulating in invisible state. The fix is a lifecycle rule with the abort-incomplete-multipart-upload action set to seven days.

S3 storage classes have minimum-storage-duration billing — Standard has none, IA classes (Standard-IA, One Zone-IA, Glacier Instant Retrieval) charge thirty or ninety days minimum, Glacier Flexible charges ninety days, Glacier Deep Archive charges one hundred eighty days. Memorize 0/30/90/180. Lifecycle transitions that move objects out of these classes before the minimum still pay the full minimum-duration storage charge. This is the single most-tested S3 cost concept on DEA-C01 — every "cost-optimized lifecycle policy" question hinges on whether the candidate knows the minimum-duration rules. Combine with the rule that Glacier Instant Retrieval is the only Glacier class with millisecond access (Flexible and Deep Archive are minutes-to-hours), and you can answer the majority of S3 storage-class scenarios on the exam.

Key Numbers And Must-Memorize S3 Facts

Storage Class Minimums

  • S3 Standard: no minimum duration, no minimum object size
  • S3 Intelligent-Tiering: no minimum duration, but per-object monitoring fee for objects 128 KB and larger
  • S3 Standard-IA, One Zone-IA: thirty-day minimum duration, 128 KB minimum billable size
  • S3 Glacier Instant Retrieval: ninety-day minimum duration, 128 KB minimum
  • S3 Glacier Flexible Retrieval: ninety-day minimum duration, 40 KB minimum
  • S3 Glacier Deep Archive: one hundred eighty-day minimum duration, 40 KB minimum

Retrieval Latency

  • Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier Instant Retrieval: milliseconds
  • Glacier Flexible Retrieval: Expedited 1-5 minutes, Standard 3-5 hours, Bulk 5-12 hours
  • Glacier Deep Archive: Standard 12 hours, Bulk 48 hours

Durability And Availability

  • Eleven nines durability across all classes
  • Standard: four nines availability across at least three AZs
  • One Zone-IA: three nines availability in a single AZ

Lifecycle Policy Limits

  • Up to 1000 rules per bucket
  • Rules apply to current versions, noncurrent versions, and incomplete multipart uploads
  • Abort incomplete multipart upload action recommended in every production bucket

Versioning And Replication

  • Versioning is a bucket-level setting, prerequisite for replication
  • MFA Delete configurable only via root account CLI
  • CRR and SRR require versioning on both source and destination
  • Replication does not backfill existing objects — use S3 Batch Replication

DEA-C01 exam priority — S3 Storage Classes and Lifecycle Policies. This topic carries weight on the DEA-C01 exam. Master the trade-offs, decision boundaries, and the cost/performance triggers each AWS service exposes — the exam will test scenarios that hinge on knowing which service is the wrong answer, not just which is right.

FAQ — S3 Storage Classes And Lifecycle Top Questions

Q1 — How do I choose between Standard-IA, Glacier Instant Retrieval, and Intelligent-Tiering?

Use Standard-IA when you know access is infrequent (under once per month) but predictable, and millisecond access matters. Use Glacier Instant Retrieval when access is even rarer (a few times per year), still requires millisecond response, and the storage savings outweigh the higher per-GB retrieval fee. Use Intelligent-Tiering when access patterns are unknown or change over time — the class auto-tiers without retrieval fees in the millisecond-latency tiers, at the cost of a per-object monitoring fee. The decision matrix: predictable hot equals Standard, predictable cold-but-instant equals Standard-IA or Glacier Instant Retrieval, unpredictable equals Intelligent-Tiering. The DEA-C01 exam tests this with scenarios describing access patterns and asking for the cheapest class that meets the latency requirement.

Q2 — Why does my "aggressive" lifecycle policy cost more than just keeping objects in S3 Standard?

Because of minimum-storage-duration rules. Standard-IA charges thirty days minimum, Glacier Flexible ninety days, Glacier Deep Archive one hundred eighty days. If your policy pushes objects from Standard into Standard-IA, Glacier Flexible, and Deep Archive in rapid succession, you pay the full thirty-day Standard-IA minimum plus the full ninety-day Glacier Flexible minimum on top of the Deep Archive storage — stacked charges for classes each object barely used, which for short-lived data easily exceeds what Standard alone would have cost. The fix is to space the transitions so each object spends at least the minimum duration in the class it is leaving: transition to Standard-IA at day thirty, to Glacier Flexible at day sixty or later (thirty days in Standard-IA), to Deep Archive at day one hundred fifty or later (ninety days in Glacier Flexible), and do not expire objects until they have spent one hundred eighty days in Deep Archive.

Q3 — When should I use One Zone-IA versus Standard-IA?

Use One Zone-IA only when the data can be regenerated cheaply from another source — transcoded media, analytics intermediates, cached datasets, derived computed features. Never use One Zone-IA for primary backups or any data that loses business value when an Availability Zone fails. The cost saving is twenty percent over Standard-IA, which is real but not worth the durability risk for irreplaceable data. The exam treats this as a single-question disqualifier — the trap pattern is "cheapest IA class for backups" with One Zone-IA listed as the cheapest. Choose Standard-IA instead.

Q4 — How do I configure S3 to trigger a Glue ETL job when a new file lands?

Configure an S3 Event Notification on s3:ObjectCreated:* for the landing prefix and route it to EventBridge. Create an EventBridge rule that matches the S3 event and targets a Step Functions state machine. The state machine starts a Glue job, waits for completion, and triggers downstream tasks. EventBridge is preferred over direct Lambda or SQS targets because it supports filtering, fan-out to multiple targets, and cross-account event routing. The legacy direct-Lambda pattern still works but does not scale to multiple downstream consumers and lacks the central rule visibility that EventBridge provides.
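A sketch of the EventBridge side of that pattern; the bucket name, prefix, state machine ARN, and role ARN are placeholders, and the bucket is assumed to already have EventBridge notifications enabled.

```python
# Sketch: match Object Created events under a prefix and start a Step Functions workflow.
import json
import boto3

events = boto3.client("events")

events.put_rule(
    Name="parquet-landed-in-raw-prefix",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": ["example-landing-bucket"]},
            "object": {"key": [{"prefix": "raw/"}]},
        },
    }),
    State="ENABLED",
)

events.put_targets(
    Rule="parquet-landed-in-raw-prefix",
    Targets=[{
        "Id": "start-etl-state-machine",
        "Arn": "arn:aws:states:us-east-1:111122223333:stateMachine:example-etl-workflow",
        "RoleArn": "arn:aws:iam::111122223333:role/example-eventbridge-invoke-sfn-role",
    }],
)
```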

Q5 — What is the right lifecycle policy for a typical raw-data ingestion bucket?

Three rules: transition to S3 Standard-IA at thirty days, transition to S3 Glacier Flexible Retrieval or Deep Archive at ninety days (depending on retrieval-latency tolerance), and abort incomplete multipart uploads at seven days. Add a fourth expiration rule if compliance allows automatic deletion past a retention horizon (typically seven or ten years for most regulated industries). For unpredictable access patterns, replace the first transition with a transition to Intelligent-Tiering at zero days for objects 128 KB and larger, and let Intelligent-Tiering handle further movement.

Q6 — How do I prevent accidental deletion of compliance archives?

Three layers. First, enable versioning on the bucket so deletes create delete markers rather than removing the data. Second, configure MFA Delete via the root account CLI to require multi-factor authentication for permanent version deletion. Third, apply S3 Object Lock in Compliance mode with a retention period equal to the regulatory requirement — Compliance mode prevents anyone, including the root account, from deleting the object before the retention expires. Add a bucket policy denying s3:DeleteBucket and s3:DeleteObjectVersion to all principals except specific compliance officers. Together these layers make accidental and malicious deletion functionally impossible during the retention window.

Q7 — When should I use S3 Select versus Athena?

Use S3 Select when you need to filter rows from a single object — for example, an application reads a one-gigabyte CSV but only needs rows where region equals "us-west-2" — pushing the filter to S3 reduces network transfer and downstream compute. Use Athena when you need to query across many objects with SQL joins, aggregations, and partition pruning — Athena is the standard data lake query engine for multi-object SQL. S3 Select cannot join across objects and does not understand partitions. They solve different problems and are not substitutes for one another.

Further Reading — Official AWS Documentation For S3

The authoritative AWS sources are the S3 User Guide chapters on storage classes (overview and per-class detail), object lifecycle management (rule structure, transition considerations), Intelligent-Tiering (tier behavior, monitoring fee), versioning (configuration, MFA Delete), replication (CRR, SRR, RTC, prerequisites), event notifications (EventBridge, SQS, SNS, Lambda integration), Access Points (single-region and multi-region), Object Lock (Governance versus Compliance modes), and S3 Select (SQL filtering on objects).

The Amazon S3 Storage Lens dashboard provides cost-and-usage analytics across all S3 storage classes for the data engineer who wants to validate that lifecycle policies are working as intended. The S3 pricing page documents exact per-class rates by region. The AWS Cost Explorer S3 storage class report breaks down cost by class for any time period — use it to identify the trap patterns described above (excessive monitoring fees, suboptimal transition timing, abandoned multipart uploads). Finally, the AWS Storage Blog has multiple deep-dive posts on lifecycle policy design for data lakes that complement the official documentation.
