Data migration is the bridge between the old world and the new — moving on-premises file servers, relational databases, SaaS data, and mainframe records into the AWS data lake or analytics warehouse. AWS offers four canonical migration services that the DEA-C01 exam tests as a service-selection matrix — AppFlow for SaaS sources, DataSync for file systems, Database Migration Service (DMS) for relational databases, and Snowball Edge for petabyte-scale offline transfer — plus Transfer Family for SFTP/FTPS endpoints and the Schema Conversion Tool (SCT) for heterogeneous database migrations. Community study guides from Tutorials Dojo, Digital Cloud Training, and ExamCert.App all flag the same pain points: candidates pick DataSync for database migrations (wrong tool — it moves files only), DMS for converting schema (DMS does not convert; SCT does), or AppFlow for arbitrary HTTP APIs (AppFlow is SaaS-connector based).
This guide is built for the data engineer perspective. It covers what each migration service does, when to pick each, the AppFlow flow configuration model, DataSync task configuration including bandwidth throttling and verification, DMS full-load and CDC mechanics with Schema Conversion Tool for heterogeneous migrations, Snowball Edge variants and the petabyte offline transfer pattern, Transfer Family for managed file transfer, the service selection matrix, and the canonical exam traps that catch most data engineers. By the end, the migration service surface should feel as straightforward as picking the right shipping carrier for a parcel.
What Is AWS Data Migration?
Data migration on AWS is the practice of moving data from a source — on-premises servers, partner SaaS, third-party databases, mainframes — into AWS managed storage or analytics services. The DEA-C01 exam treats migration as a Task 1.1 (perform data ingestion) topic because the boundary between "migration" and "ongoing ingestion" is blurred — DMS replicates continuously via CDC, AppFlow runs on schedule, DataSync syncs on schedule. The key insight: pick the service whose source/target/protocol/scale matches your specific use case, not a one-size-fits-all answer.
The Five Source Categories
AWS migration services target five distinct source categories: SaaS applications (Salesforce, ServiceNow, Slack, Marketo, Google Analytics) → AppFlow. File systems (NFS, SMB, HDFS, Azure Blob, Google Cloud Storage) → DataSync. Relational and certain NoSQL databases (Oracle, SQL Server, PostgreSQL, MySQL, MariaDB, MongoDB) → DMS. Petabyte offline data (data centers with poor network bandwidth, disaster recovery) → Snowball Edge or Snowmobile. Managed file transfer protocols (SFTP, FTPS, FTP, AS2 from external partners) → Transfer Family. The exam plants service-selection scenarios that hinge on matching source category to service.
Online vs Offline Migration
Online migration uses network bandwidth — DataSync over public internet or VPN, DMS over private connections, AppFlow over HTTPS. Offline migration uses physical devices — Snowball Edge ships hard drives via FedEx, Snowmobile is a literal truck for exabyte-scale transfers. The decision hinges on data volume divided by available bandwidth: at 100 Mbps, transferring 100 TB takes about 90 days (vs a Snowball Edge round-trip in about 1 week). Calculate before designing.
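To make the arithmetic concrete, here is a minimal Python sketch of that break-even calculation, using the rough 1-week Snowball round trip cited throughout this guide; the sample volumes and bandwidths are illustrative, not prescriptive.

```python
def online_transfer_days(data_tb: float, bandwidth_mbps: float) -> float:
    """Days to move data_tb terabytes at a sustained bandwidth_mbps megabits/s."""
    bits = data_tb * 1e12 * 8                 # terabytes -> bits
    seconds = bits / (bandwidth_mbps * 1e6)   # bits / (bits per second)
    return seconds / 86_400

SNOWBALL_ROUND_TRIP_DAYS = 7  # rough figure used in this guide

for tb, mbps in [(5, 100), (100, 100), (500, 1000)]:
    days = online_transfer_days(tb, mbps)
    winner = "online (DataSync)" if days < SNOWBALL_ROUND_TRIP_DAYS else "offline (Snowball Edge)"
    print(f"{tb:>5} TB @ {mbps:>5} Mbps -> {days:7.1f} days online -> {winner}")
```

Running it reproduces the guide's figures: 100 TB at 100 Mbps is about 93 days online, while 5 TB at the same bandwidth clears in under 5 days and stays online.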
Plain-Language Explanation: AWS Data Migration
Migration service selection is the kind of system where naming alone does not convey the trade-offs. Three concrete analogies make the structure stick.
Analogy 1 — The Moving Company With Specialty Crews
Picture a moving company. AppFlow is the office relocation crew specializing in commercial setups — they know the SaaS office tenants (Salesforce, ServiceNow, SAP) by name, have pre-built furniture-disassembly templates for each tenant, and can move office contents to your new building with no custom planning. DataSync is the household movers with hand trucks for file boxes — they handle anything packed in standard boxes (NFS, SMB, HDFS file systems), schedule pickup at your old place, deliver to S3 or EFS, and check that every box arrived intact (data integrity verification). DMS is the specialty crew for delicate antiques (relational databases) — they unpack each piece (full load), then keep moving the pieces still in your old home as you live there (CDC), until you are ready to switch addresses. Snowball Edge is the shipping container for when you have so much stuff that household movers would take months — load up the container at your old home, ship it via FedEx, AWS unloads at the data center.
The catch: the office relocation crew (AppFlow) cannot move antique china (databases) — they only know offices. The household movers (DataSync) cannot move a database that is still being written to — only files at rest. The antique crew (DMS) does not pack file boxes — only relational items. Picking the wrong crew is the canonical exam trap. The Schema Conversion Tool (SCT) is the consultant who comes before the antique crew to translate your antique china (Oracle PL/SQL) into the new home's display style (PostgreSQL syntax) — DMS moves the data, SCT translates the structure.
Analogy 2 — The Library Acquisitions Department With Different Sources
Picture a research library acquiring new collections from different donors. AppFlow is the electronic journal subscription processor — handles digital subscriptions to specific publishers (the SaaS connectors), authenticates with each publisher's portal, and downloads new issues on schedule. DataSync is the physical book delivery handler — receives boxes from estates and other libraries, verifies counts and conditions, and shelves them. DMS is the catalog migration specialist — when the library switches from one cataloging system to another, this team transfers card-catalog records one by one (full load) and keeps adding any new cards filed during the transfer (CDC). Snowball Edge is the estate of a deceased professor with a 10,000-book private library — too many books for the regular delivery handler to process; the library sends a U-Haul (Snowball) to the estate, fills it, drives back, and unloads.
The library has rules: the journal processor (AppFlow) cannot process a U-Haul of books, the book delivery handler (DataSync) cannot handle live catalog records being updated in real time, and the catalog specialist (DMS) does not deal with physical books. Transfer Family is the interlibrary loan endpoint — exposes a standard protocol (SFTP) so external libraries can drop off shipments without knowing AWS; underneath, it lands them in S3.
Analogy 3 — The International Shipping Company With Multiple Modes
Picture an international shipping company. AppFlow is the integration partnership desk — pre-negotiated agreements with major carriers (Salesforce-Amazon, ServiceNow-AWS) so customer parcels move automatically when ordered. DataSync is the air cargo for general freight — fast network transfer of files, schedules pickup and delivery, throttles bandwidth so it does not saturate the customer's circuit. DMS is the diplomatic courier service for sensitive documents (databases) — handles customs (schema mapping), maintains chain of custody (transactional consistency), and supports both one-time delivery (full load) and ongoing courier service (CDC replication). Snowball Edge is the container ship for bulk cargo — when air cargo would take months, fill a container, ship by sea, unload at destination.
The shipping company has a calculator: divide volume by bandwidth, compare to Snowball round-trip time, pick the cheaper option. Schema Conversion Tool (SCT) is the customs translator who converts between document standards (Oracle PL/SQL → PostgreSQL syntax) so the diplomatic courier can deliver to a country with different paperwork requirements. Transfer Family is the public drop box — anyone with the address can drop a parcel via standard protocol; the shipping company picks it up and routes it internally.
AWS AppFlow — SaaS Data Ingestion
AppFlow is the no-code SaaS-to-AWS integration service.
What AppFlow Does
AppFlow connects supported SaaS applications to AWS storage and analytics services without custom integration code. Salesforce records flow to S3, ServiceNow incidents flow to Redshift, SAP order data flows to S3 — all configured through the AppFlow console as flows with source, target, mapping, filter, and trigger.
Supported SaaS Sources And Targets
AppFlow supports 50+ SaaS sources including Salesforce, ServiceNow, SAP OData, Slack, Marketo, Zendesk, Google Analytics, Snowflake (as both source and target), Datadog, and more. Targets include S3, Redshift, Snowflake, EventBridge, Salesforce (for write-back), and Honeycode. The set is curated — you cannot add an arbitrary HTTP API as an AppFlow source. For arbitrary APIs, Lambda or Glue with custom code is the right answer.
Flow Configuration — Trigger, Mapping, Filter
A flow's trigger is on-demand (run once via API), scheduled (cron-based recurring), or event-based (driven by source events for sources that support push). Mapping defines source-field-to-target-field correspondence including type conversion, concatenation, and basic transformations like string truncation. Filters drop records before they reach the target — for example, only Salesforce opportunities in the "Closed Won" stage.
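As a sketch of how that configuration looks in code, the boto3 call below creates a scheduled Salesforce-to-S3 flow with a field projection and per-field mappings. The flow name, connector profile (`sfdc-prod`), and bucket are hypothetical, and the task and operator shapes are simplified — exact schedule syntax and task properties vary by connector, so treat this as an outline rather than a copy-paste recipe.

```python
import boto3

appflow = boto3.client("appflow")  # region/credentials come from your environment

appflow.create_flow(
    flowName="salesforce-opportunities-daily",       # hypothetical name
    triggerConfig={
        "triggerType": "Scheduled",
        "triggerProperties": {
            "Scheduled": {
                "scheduleExpression": "rate(1day)",  # daily; check AppFlow docs for exact rate()/cron() syntax
                "dataPullMode": "Incremental",       # only records changed since last run
            }
        },
    },
    sourceFlowConfig={
        "connectorType": "Salesforce",
        "connectorProfileName": "sfdc-prod",         # pre-created connector profile (placeholder)
        "sourceConnectorProperties": {"Salesforce": {"object": "Opportunity"}},
    },
    destinationFlowConfigList=[{
        "connectorType": "S3",
        "destinationConnectorProperties": {"S3": {
            "bucketName": "my-data-lake-raw",        # placeholder bucket
            "bucketPrefix": "salesforce/opportunity",
            "s3OutputFormatConfig": {"fileType": "PARQUET"},
        }},
    }],
    tasks=[
        # Projection: the set of source fields the flow reads at all.
        {"taskType": "Filter", "sourceFields": ["Id", "StageName", "Amount"],
         "connectorOperator": {"Salesforce": "PROJECTION"}, "taskProperties": {}},
        # One Map task per field carried through to the target.
        {"taskType": "Map", "sourceFields": ["Id"], "destinationField": "Id", "taskProperties": {}},
        {"taskType": "Map", "sourceFields": ["StageName"], "destinationField": "StageName", "taskProperties": {}},
        {"taskType": "Map", "sourceFields": ["Amount"], "destinationField": "Amount", "taskProperties": {}},
    ],
)
appflow.start_flow(flowName="salesforce-opportunities-daily")
```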
AppFlow PrivateLink Support
For sources and targets that support it, AppFlow uses AWS PrivateLink to keep traffic on the AWS backbone, so it never traverses the public internet. This matters for compliance use cases where data leaving the AWS network is a concern. Salesforce, Slack, Snowflake, and ServiceNow all support PrivateLink with AppFlow.
When To Use AppFlow
The exam plants AppFlow as the answer when the scenario mentions a specific named SaaS source — Salesforce, ServiceNow, SAP, Slack, Marketo, Snowflake. Anything else (custom REST API, internal HTTP service, arbitrary database) is not AppFlow — it is Lambda, Glue, or another tool. AppFlow's value is the pre-built connector with authentication, throttling, and field mapping handled.
AWS AppFlow is a no-code, fully managed integration service for moving data between specific named SaaS applications (Salesforce, ServiceNow, SAP, Slack, Marketo, Snowflake, and 50+ others) and AWS storage and analytics services. Flows configure source connector, target, field mapping, filters, and triggers (on-demand, scheduled, event-based) without any code. AppFlow does not support arbitrary HTTP APIs — it requires a pre-built SaaS connector. PrivateLink integration keeps traffic on the AWS backbone for supported connectors. The service is the right answer when a DEA-C01 scenario names a SaaS source from the supported list; it is the wrong answer for custom REST APIs, internal application integrations, or relational database migrations.
AWS DataSync — File System Migration
DataSync is the managed file-transfer service for migrating to S3, EFS, FSx, and back.
Supported Sources And Targets
DataSync sources include on-premises NFS, SMB, HDFS, self-managed object storage (S3-compatible), Azure Blob Storage, Google Cloud Storage, and AWS file services. Targets include S3 (any storage class), EFS, FSx for Windows, FSx for Lustre, FSx for OpenZFS, and FSx for NetApp ONTAP. Cross-region and cross-account transfers are supported.
DataSync Agent And Tasks
For on-premises sources, you deploy a DataSync agent as a VM (VMware, KVM, Hyper-V, or AWS Outposts) on your network — the agent reads from the source storage and pushes to AWS over the public internet, VPN, or Direct Connect. Tasks define source location, destination location, schedule, bandwidth limit, and filter rules. Each task transfers data and reports detailed metrics including files transferred, bytes transferred, and verification status.
Verification And Integrity
DataSync verifies file integrity at multiple points — checksum at the source, checksum after transfer, comparison to ensure no silent corruption. The verification mode is configurable: POINT_IN_TIME_CONSISTENT (default), ONLY_FILES_TRANSFERRED (faster, only verifies new files), or NONE (fastest, no verification). For most migrations, the default is correct.
Bandwidth Throttling And Scheduling
Tasks can be throttled to a maximum bandwidth (in MiB/s) to avoid saturating the on-premises network during business hours. Scheduling lets you run tasks on cron expressions — for example, a full sync at midnight and throttled hourly incrementals during business hours.
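A minimal boto3 sketch of such a task, combining the default verification mode from the previous section with a bandwidth cap and a nightly schedule; the location ARNs, names, and numbers are placeholders.

```python
import boto3

datasync = boto3.client("datasync")

# Placeholder ARNs for locations created earlier (e.g. an NFS source
# behind an on-premises agent and an S3 destination).
task = datasync.create_task(
    SourceLocationArn="arn:aws:datasync:us-east-1:111122223333:location/loc-src",
    DestinationLocationArn="arn:aws:datasync:us-east-1:111122223333:location/loc-dst",
    Name="nightly-nfs-to-s3",
    Options={
        "VerifyMode": "POINT_IN_TIME_CONSISTENT",  # default: full post-transfer verification
        "BytesPerSecond": 50 * 1024 * 1024,        # throttle to ~50 MiB/s
        "OverwriteMode": "ALWAYS",
    },
    Schedule={"ScheduleExpression": "cron(0 0 * * ? *)"},      # midnight UTC daily
    Excludes=[{"FilterType": "SIMPLE_PATTERN", "Value": "*.tmp"}],  # skip scratch files
)
print(task["TaskArn"])
```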
When To Use DataSync
DataSync is the answer when the source is a file system or object storage and you need an online managed transfer with verification, scheduling, and bandwidth control. It is the wrong answer for relational database migration (use DMS), for SaaS data ingestion (use AppFlow), or for petabyte-scale offline transfer (use Snowball Edge).
AWS Database Migration Service (DMS)
DMS is the canonical service for migrating relational and certain NoSQL databases to AWS.
What DMS Does
DMS migrates data from a source database to a target database with minimal downtime. Two operational modes: full load copies the source data once, and change data capture (CDC) continuously replicates ongoing changes from source to target. The combined mode (full load + ongoing CDC) lets you migrate while the source is still in use, then cut over with seconds of downtime.
Supported Sources And Targets
Sources include Oracle, SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, Aurora, RDS for any of those, IBM Db2, SAP ASE, Azure SQL, and more. Targets include all those plus Redshift, S3 (as Parquet/CSV), DynamoDB, OpenSearch, Kinesis Data Streams, Kafka, Neptune, and DocumentDB. The matrix is large and tested — homogeneous (Oracle to Oracle) and heterogeneous (Oracle to PostgreSQL) migrations are both supported.
DMS Replication Instance
A replication instance is the EC2-backed compute that runs the DMS engine. Sized like RDS instances (T3, R5, C5 families, various sizes). Multi-AZ deployment provides high availability for production migrations. The instance acts as a stateful pipeline between source and target — it reads from source, transforms if needed, and writes to target.
DMS Tasks And Endpoints
A task describes one migration: source endpoint, target endpoint, migration type (full load, CDC, or full load + CDC), table mappings (which tables to migrate, with optional filters and column transformations), and replication settings. Endpoints describe how to connect to a source or target including credentials (referenced from Secrets Manager) and connection attributes.
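In code, a task is a single boto3 call; the sketch below assumes the endpoints and replication instance already exist (their ARNs are placeholders) and selects every table in a hypothetical SALES schema for a full load plus ongoing CDC.

```python
import json

import boto3

dms = boto3.client("dms")

# Table mappings: select which tables to migrate; "%" is a wildcard.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-sales-schema",
        "object-locator": {"schema-name": "SALES", "table-name": "%"},
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-to-postgres-cutover",  # hypothetical name
    SourceEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:SRC",
    TargetEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:TGT",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:111122223333:rep:INST",
    MigrationType="full-load-and-cdc",  # copy once, then replicate ongoing changes
    TableMappings=json.dumps(table_mappings),
)
```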
Schema Conversion Tool (SCT) For Heterogeneous Migrations
DMS itself does not convert schema between heterogeneous engines — Oracle PL/SQL stored procedures do not run on PostgreSQL without translation. The AWS Schema Conversion Tool is a separate desktop or container application that analyzes the source schema, generates equivalent target schema and stored procedures, flags items requiring manual review, and applies the converted schema to the target. SCT runs before DMS — convert schema first, then run DMS to migrate data.
DMS CDC Mechanics
CDC reads the source database's transaction log (Oracle redo log, SQL Server transaction log, PostgreSQL WAL, MySQL binlog, MongoDB oplog) and applies the same operations to the target. Latency is typically seconds. The source database must be configured for log retention long enough for CDC to catch up after any pause. CDC can run indefinitely as a continuous replication or terminate after the source is decommissioned.
DMS To S3 As Parquet
A common DEA-C01 pattern: DMS to S3 with Parquet format and cdc_path configuration writes ongoing CDC events as Parquet files in S3, partitioned by date. This is the canonical pattern for replicating a relational source into a data lake — DMS handles the source-side log reading, S3 stores the changes, Glue catalog and Athena query the result. Combined with S3 + Glue + Athena, DMS-to-S3 builds a full operational data lake fed from production databases.
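A sketch of the S3 target endpoint for this pattern. The bucket, folder, and role ARN are placeholders, and the role must allow DMS to write to the bucket; note that `CdcPath` interacts with DMS's transaction-ordering settings, so verify the CDC layout options against the current S3 target settings reference before relying on this exact shape.

```python
import boto3

dms = boto3.client("dms")

dms.create_endpoint(
    EndpointIdentifier="datalake-s3-target",  # hypothetical name
    EndpointType="target",
    EngineName="s3",
    S3Settings={
        "BucketName": "my-data-lake-raw",                                  # placeholder
        "BucketFolder": "dms/sales",
        "ServiceAccessRoleArn": "arn:aws:iam::111122223333:role/dms-s3-access",
        "DataFormat": "parquet",        # default is csv; set parquet explicitly
        "CdcPath": "cdc",               # folder for ongoing CDC output
        "DatePartitionEnabled": True,   # partition CDC output by date
    },
)
```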
DMS migrates data but does not convert schema between heterogeneous engines — use AWS Schema Conversion Tool (SCT) to translate Oracle PL/SQL to PostgreSQL syntax, SQL Server T-SQL to PostgreSQL functions, or any other cross-engine translation, then run DMS to move the data. This separation is the most common DEA-C01 trap in the migration service area. SCT analyzes the source schema, generates equivalent target schema and stored procedures, flags items requiring manual review (some Oracle features have no PostgreSQL equivalent), and produces an assessment report quantifying conversion completeness. DMS picks up after SCT — full load copies the data into the converted target schema, and CDC keeps the target in sync until cutover. Skipping SCT and trying to use DMS alone for heterogeneous migrations leaves the target with no usable schema and no stored procedures.
AWS Snowball Edge — Petabyte Offline Transfer
Snowball Edge is the physical device for offline data transfer.
When Snowball Beats Network
Calculate transfer time as data volume divided by sustained network bandwidth. At 100 Mbps sustained, 100 TB takes about 90 days; a Snowball Edge round trip is about 1 week. The break-even is roughly 10 TB per 100 Mbps; below that, online transfer wins; above that, Snowball wins. Snowmobile (a 45-foot truck holding 100 PB) is for exabyte-scale data center evacuations.
Snowball Edge Variants
Two main variants. Snowball Edge Storage Optimized (210 TB usable) is for pure storage transfer — lots of capacity, modest compute. Snowball Edge Compute Optimized trades storage capacity for stronger compute and optional GPUs, for edge use cases where you run Lambda or EC2 on-device to process data before transfer. For pure migration, Storage Optimized is the typical choice.
The Snowball Workflow
Order via console; AWS ships the device. Receive, plug into power and network, unlock with keys from the console, and use the Snowball client or NFS endpoint to copy data onto the device. When done, ship back via the prepaid shipping label. AWS receives, verifies, and copies to the target S3 bucket. Total round trip is typically 1-2 weeks for one device, parallelizable by ordering multiple devices.
Encryption And Compliance
Data on Snowball Edge is encrypted at rest with AES-256. The encryption key is held by KMS in your account; the device itself never has the key. After AWS uploads to S3, the device is wiped to NIST standards. The chain of custody is auditable via CloudTrail.
When To Use Snowball
The exam plants Snowball as the answer when the scenario mentions petabyte-scale, limited bandwidth, on-premises data center evacuation, or "weeks of transfer time at current bandwidth." For smaller transfers (a few TB), DataSync over network is faster and cheaper.
AWS Transfer Family — Managed File Transfer Endpoints
Transfer Family exposes managed SFTP, FTPS, FTP, and AS2 endpoints landing data directly into S3 or EFS.
Supported Protocols
SFTP (SSH File Transfer Protocol, the default) for partners using SSH-based file transfer. FTPS (FTP over TLS) for legacy FTP clients with TLS encryption. FTP (plaintext) for internal-network-only use. AS2 (Applicability Statement 2) for B2B EDI document exchange.
Identity Providers
Transfer Family supports identity through service-managed users (stored in AWS), AWS Directory Service, or a custom identity provider backed by Lambda or API Gateway. The identity provider determines who can authenticate; each authenticated user maps to an IAM role that scopes which S3 bucket or EFS path they can access.
Use Case For Data Engineering
The canonical pattern: an external partner (a vendor, a customer, a regulator) sends daily files via SFTP. Transfer Family exposes an SFTP endpoint that lands those files directly in S3, where downstream Lambda or Glue picks them up. Without Transfer Family, you would run an EC2 SFTP server with all the operational cost; Transfer Family is a managed alternative.
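A minimal boto3 sketch of that pattern: one SFTP server backed by S3 with service-managed users, and one user locked to a partner-specific prefix. The IAM role, bucket, and key material are placeholders.

```python
import boto3

transfer = boto3.client("transfer")

# SFTP endpoint backed by S3, with users managed inside Transfer Family.
server = transfer.create_server(
    Protocols=["SFTP"],
    IdentityProviderType="SERVICE_MANAGED",
    Domain="S3",
    EndpointType="PUBLIC",
)

# One user per partner, chrooted to its own prefix so partners cannot see each other.
transfer.create_user(
    ServerId=server["ServerId"],
    UserName="vendor-a",
    Role="arn:aws:iam::111122223333:role/transfer-s3-write",  # placeholder role granting S3 access
    HomeDirectoryType="LOGICAL",
    HomeDirectoryMappings=[{"Entry": "/", "Target": "/my-data-lake-raw/inbound/vendor-a"}],
    SshPublicKeyBody="ssh-ed25519 AAAA... vendor-a-key",      # partner's public key (truncated placeholder)
)
```

From here, an S3 event notification on the `inbound/` prefix triggers the downstream Lambda or Glue job described above.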
When To Use Transfer Family
The exam plants Transfer Family as the answer when the scenario mentions external partners using standard file-transfer protocols. For internal AWS-to-AWS or AWS-to-on-premises, DataSync is usually a better fit. For S3 access by AWS-native clients, the S3 API directly is appropriate.
Migration Service Selection Matrix
The DEA-C01 exam regularly tests service selection.
Match Source To Service
SaaS named source (Salesforce, ServiceNow, SAP, Slack, Marketo, Snowflake) → AppFlow. File system or object storage (NFS, SMB, HDFS, Azure Blob, Google Cloud Storage) → DataSync. Relational or NoSQL database (Oracle, SQL Server, MySQL, MongoDB, etc.) → DMS. Petabyte-scale, limited bandwidth offline transfer → Snowball Edge or Snowmobile. External partner with SFTP/FTPS/FTP/AS2 protocol → Transfer Family.
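The same matrix expressed as an illustrative lookup table — the category labels are this guide's shorthand, not an AWS API:

```python
# Illustrative only: the selection matrix from this section as a dict.
SERVICE_BY_SOURCE_CATEGORY = {
    "saas":             "AWS AppFlow",                      # Salesforce, ServiceNow, SAP, ...
    "file_system":      "AWS DataSync",                     # NFS, SMB, HDFS, Azure Blob, GCS
    "database":         "AWS DMS (+ SCT if heterogeneous)", # Oracle, SQL Server, MySQL, MongoDB, ...
    "petabyte_offline": "Snowball Edge / Snowmobile",       # limited bandwidth, huge volume
    "partner_protocol": "AWS Transfer Family",              # SFTP, FTPS, FTP, AS2
}
```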
Schema Conversion vs Data Migration
DMS does data migration — moves rows from source to target. Schema Conversion Tool (SCT) does schema conversion — translates DDL and stored procedures between heterogeneous engines. They run in sequence: SCT first, then DMS.
Online vs Offline
Online — DataSync, DMS, AppFlow, Transfer Family — uses network bandwidth, suitable when bandwidth divided by data volume gives a reasonable timeline. Offline — Snowball Edge, Snowmobile — uses physical devices, suitable for petabyte scale with limited bandwidth.
Calculate online transfer time as data volume divided by sustained bandwidth, then compare to a Snowball Edge round-trip of about 1 week — break-even is roughly 10 TB per 100 Mbps of sustained bandwidth. Below 10 TB at 100 Mbps, online via DataSync wins; above that, Snowball Edge wins. For a 1 PB data center evacuation at 100 Mbps, online would take about 2.5 years; ten Snowball Edge Storage Optimized devices in parallel finish in about 2 weeks. The exam plants this calculation as "we need to transfer 500 TB in 2 weeks with 1 Gbps bandwidth" — at 1 Gbps you can transfer roughly 75 TB per week, so 500 TB takes about 7 weeks online vs about 2 weeks via parallel Snowball Edge devices. Always do the math; do not assume online or offline by feel.
Common Exam Traps For Migration Services
The DEA-C01 exam plants a consistent set of traps. Memorize all seven.
Trap 1 — DMS Converts Schema
The most cited trap. DMS migrates data only; it does not convert schema between heterogeneous engines. Use SCT for schema conversion, then DMS for data migration.
Trap 2 — DataSync For Database Migration
A scenario describes migrating an Oracle database to RDS. Wrong answer: DataSync. DataSync handles file systems and object stores, not relational databases. Use DMS.
Trap 3 — AppFlow For Arbitrary HTTP API
A scenario describes ingesting from a custom internal REST API. Wrong answer: AppFlow. AppFlow requires a pre-built SaaS connector — it does not support arbitrary HTTP. Use Lambda or Glue with custom code.
Trap 4 — Snowball For Small Transfers
A scenario describes migrating 5 TB of file data. Wrong answer: Snowball Edge. At reasonable bandwidth (100 Mbps+), 5 TB transfers online in days, faster than a Snowball round trip. Use DataSync.
Trap 5 — DMS Without CDC
A scenario describes migrating a production database that is still being written to during migration. Wrong answer: DMS full-load only. The CDC mode is needed to capture writes during the full load and after, enabling a near-zero-downtime cutover.
Trap 6 — DataSync Without Bandwidth Limit
A scenario mentions concerns about saturating the on-premises network during business hours. Wrong answer: DataSync default config. Configure bandwidth throttling on the task and schedule it for off-hours.
Trap 7 — Transfer Family vs DataSync Confusion
A scenario describes external partners pushing daily files via SFTP. Wrong answer: DataSync (it is for AWS-managed pull-based transfers). Right answer: Transfer Family, which exposes an SFTP endpoint where partners push files into S3.
A scenario asking how to migrate Oracle to PostgreSQL using DMS alone is a leakage trap — DMS does not translate Oracle PL/SQL stored procedures, custom data types, or proprietary syntax to PostgreSQL equivalents. Without Schema Conversion Tool first, the target PostgreSQL has no stored procedures, no functions, no triggers — just empty tables waiting for DMS to fill them. Even the table DDL may have type mismatches (Oracle NUMBER to PostgreSQL NUMERIC, Oracle VARCHAR2 to PostgreSQL VARCHAR) that DMS cannot resolve cleanly. The correct sequence is: run SCT to analyze the Oracle schema, review the conversion report for unsupported items, apply the converted schema to PostgreSQL, then configure DMS to migrate data into the now-ready target. The DEA-C01 exam plants this as a multi-step answer where SCT precedes DMS — picking DMS-only for heterogeneous migration is wrong.
Common Exam Traps Round 2 — More Selection Pitfalls
A few more traps worth memorizing.
Trap 8 — DMS To S3 Always Means Parquet
DMS to S3 supports CSV and Parquet output; the default is CSV. Specify Parquet explicitly for analytics-ready output.
Trap 9 — AppFlow Only Pulls
AppFlow is bidirectional for some connectors (Salesforce as both source and target) but most flows are SaaS-to-AWS pulls. Pushing AWS data back to a SaaS for write-back is supported only for connectors that allow it.
Trap 10 — Snowball Compute Optimized For Data Migration
Snowball Edge Compute Optimized has GPUs and is designed for edge processing. For pure data migration (the typical DEA-C01 use case), Storage Optimized is correct. Compute Optimized is for "process at the edge before shipping" workloads.
Migration service matrix: AppFlow for SaaS, DataSync for files, DMS for databases (with SCT for heterogeneous schema conversion), Snowball Edge for petabyte offline, Transfer Family for SFTP/FTPS partner endpoints. This is the one sentence to memorize for every DEA-C01 migration question. If the scenario word is "Salesforce" or "ServiceNow" or "SAP," answer AppFlow. If "NFS" or "SMB" or "file system" or "HDFS," answer DataSync. If "Oracle" or "SQL Server" or "MySQL" or "PostgreSQL" or "database migration," answer DMS — and add SCT if heterogeneous. If "petabyte" or "limited bandwidth" or "ship physical devices," answer Snowball Edge. If "external partner" or "SFTP" or "FTPS," answer Transfer Family. Stop trying to make one service do everything; AWS designed each for a specific source category.
Key Numbers And Must-Memorize Migration Facts
AppFlow
- 50+ pre-built SaaS connectors (Salesforce, ServiceNow, SAP, Slack, Marketo, Snowflake, etc.)
- Triggers: on-demand, scheduled, event-based
- Targets: S3, Redshift, Snowflake, EventBridge, supported SaaS for write-back
- PrivateLink supported for select connectors
DataSync
- Sources: NFS, SMB, HDFS, S3-compatible, Azure Blob, Google Cloud Storage
- Targets: S3 (any class), EFS, FSx (all variants)
- Verification modes: POINT_IN_TIME_CONSISTENT (default), ONLY_FILES_TRANSFERRED, NONE
- Agent deployed as VM (VMware, KVM, Hyper-V, AWS Outposts)
DMS
- Sources: Oracle, SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, Db2, SAP ASE, more
- Targets: same as sources plus Redshift, S3, DynamoDB, OpenSearch, Kinesis, Kafka, Neptune, DocumentDB
- Modes: full load, CDC, full load + CDC
- Multi-AZ replication instance for HA
Schema Conversion Tool (SCT)
- Analyzes source schema, generates target schema and stored procedures
- Flags items requiring manual review
- Runs before DMS, not concurrently
- Desktop or container deployment
Snowball Edge
- Storage Optimized: 210 TB usable, basic compute
- Compute Optimized: smaller storage, stronger compute, GPU options
- Round trip: about 1 week
- Encryption: AES-256, key in customer KMS
Transfer Family
- Protocols: SFTP, FTPS, FTP, AS2
- Identity: service-managed, Directory Service, custom Lambda/API Gateway provider
- Targets: S3, EFS
Break-Even Calculation
- Online (DataSync) vs offline (Snowball): about 10 TB per 100 Mbps sustained
- Below break-even use online; above use offline
DEA-C01 exam priority — AppFlow, DataSync, DMS, and Snowball for Data Migration. This topic carries weight on the DEA-C01 exam. Master the trade-offs, decision boundaries, and the cost/performance triggers each AWS service exposes — the exam will test scenarios that hinge on knowing which service is the wrong answer, not just which is right.
FAQ — AWS Data Migration Top Questions
Q1 — When should I use AppFlow vs Lambda for SaaS data ingestion?
Use AppFlow when the source is on the AppFlow connector list (Salesforce, ServiceNow, SAP, Slack, Marketo, Snowflake, and 50+ others) and you want a no-code, scheduled, fully managed pull. AppFlow handles authentication, throttling, retries, field mapping, filters, and PrivateLink integration where supported. Use Lambda or Glue with custom code when the source is a custom REST API, an internal HTTP service, a SaaS not on the AppFlow connector list, or when you need transformation logic beyond what AppFlow's mapping supports. The exam plants this as "ingest data from Salesforce on a daily schedule" — AppFlow scheduled flow. "Ingest from our internal pricing API every 5 minutes" — Lambda on EventBridge Scheduler trigger, not AppFlow. The decision is connector availability: if AppFlow has a pre-built connector, use it; otherwise, build with Lambda or Glue.
Q2 — What is the difference between DataSync and DMS?
DataSync moves files and objects — NFS, SMB, HDFS, S3-compatible, Azure Blob, Google Cloud Storage to S3, EFS, or FSx. It is for unstructured data, not for live relational databases. DMS migrates relational and certain NoSQL databases — Oracle, SQL Server, MySQL, PostgreSQL, MongoDB to AWS-native or AWS-hosted equivalents, plus targets like S3 (as Parquet), Redshift, Kinesis, and Kafka. It supports full-load and CDC for near-zero-downtime migrations. They never overlap: a database is never migrated with DataSync, and a file system is never migrated with DMS. The exam plants confusion scenarios where the source is described ambiguously (a "data store") to test whether you read carefully — file system → DataSync, database → DMS.
Q3 — How do I migrate a heterogeneous database (Oracle to PostgreSQL) on AWS?
Two-step process. Step 1: Schema Conversion Tool (SCT) analyzes the Oracle source, generates the equivalent PostgreSQL schema including tables, indexes, foreign keys, and stored procedures, flags items that require manual review (some Oracle features have no direct PostgreSQL equivalent), and produces an assessment report quantifying conversion completeness. Apply the converted schema to the target PostgreSQL database. Manually rewrite items SCT flagged. Step 2: DMS migrates the data into the now-ready PostgreSQL target. Configure a DMS task with the Oracle source endpoint, the PostgreSQL target endpoint, and migration type "full load + CDC" so the migration captures any writes during the full load and continues replicating until cutover. After validation, redirect application traffic to PostgreSQL and stop the DMS task. Skipping SCT and trying to use DMS alone for heterogeneous migration is the most common DEA-C01 trap — DMS migrates data, not schema.
Q4 — When should I use Snowball Edge vs DataSync?
Calculate the online transfer time first: data volume divided by sustained bandwidth equals time. Compare to a Snowball Edge round trip of about 1 week. Use DataSync when online transfer time is reasonable (days, not months) — typically below 10 TB per 100 Mbps of bandwidth, or proportionally more at higher bandwidth. Use Snowball Edge when online transfer time is unacceptable — petabyte-scale, severely limited bandwidth, data center evacuation timelines. Use Snowmobile for exabyte-scale (rare; mostly enterprise data center decommissions). Multiple Snowball Edge devices in parallel scale capacity linearly — ten devices ship 2 PB in the same wall-clock time as one device shipping 200 TB. The exam plants this as a calculation question: do the math, compare to the round-trip time, pick the cheaper or faster option as the question requires. Do not assume by feel; always compute.
Q5 — How does DMS CDC handle source database writes during migration?
DMS CDC reads the source database's transaction log (Oracle redo log, SQL Server transaction log, PostgreSQL WAL, MySQL binlog, MongoDB oplog) in real time and applies the same operations to the target. The source database must be configured for log retention long enough to cover the migration window — for example, retaining MySQL binlogs for 24-48 hours so DMS can catch up if it pauses. The combined "full load + CDC" mode runs the full load first while CDC is buffering ongoing changes; after the full load completes, CDC applies the buffered changes and then continues in real time. Latency from source to target is typically seconds when CDC is caught up. The cutover process: pause application writes to the source for a brief moment (seconds), wait for CDC to drain remaining buffered changes, redirect application traffic to the target, and stop the DMS task. This achieves near-zero downtime migration. The exam plants "migrate while production is live" as the canonical full-load-plus-CDC scenario.
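One way to script that cutover gate is to poll DMS's CloudWatch latency metrics before redirecting traffic. The sketch below reads the `CDCLatencyTarget` metric (DMS publishes task metrics with instance and task dimensions, but the identifiers here are placeholders, and the 5-second threshold is an arbitrary example, not an AWS recommendation).

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

def cdc_target_latency_seconds(task_id: str, instance_id: str) -> float:
    """Latest CDCLatencyTarget datapoint for a DMS task (seconds behind source)."""
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/DMS",
        MetricName="CDCLatencyTarget",
        Dimensions=[
            {"Name": "ReplicationTaskIdentifier", "Value": task_id},
            {"Name": "ReplicationInstanceIdentifier", "Value": instance_id},
        ],
        StartTime=now - timedelta(minutes=15),
        EndTime=now,
        Period=60,
        Statistics=["Average"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Average"] if points else float("inf")

# Cutover gate: pause application writes only once replication has caught up.
if cdc_target_latency_seconds("oracle-to-postgres-cutover", "dms-prod-1") < 5:
    print("CDC drained - safe to redirect application traffic")
```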
Q6 — How do I handle external partners pushing files to my AWS data lake?
Use AWS Transfer Family to expose a managed SFTP, FTPS, FTP, or AS2 endpoint that authenticates the partner and lands their files directly in S3 (or EFS). Configure identity through service-managed users for small partner counts, AWS Directory Service for enterprise integrations, or a custom Lambda or API Gateway identity provider for complex authentication flows. Each partner identity maps to a specific S3 bucket and prefix, so partner A cannot see partner B's files. Once files land in S3, downstream processing follows the standard event-driven pattern: S3 event notification triggers a Lambda or Glue job that validates, transforms, and registers the data. The benefit over running an EC2 SFTP server is significant — Transfer Family is fully managed, scales automatically, integrates with CloudWatch for audit, and bills per protocol-hour plus data transferred. The exam plants this as "external partners use SFTP" with Transfer Family as the answer, not EC2 with OpenSSH or DataSync (DataSync is pull-based; Transfer Family is push-based for partners).
Q7 — Can DMS replicate to a data lake on S3?
Yes — DMS to S3 is one of the canonical patterns for building a relational-database-fed data lake. Configure the DMS task with the relational source endpoint and an S3 target endpoint, choose Parquet as the target format (default may be CSV; specify Parquet for analytics-ready output), and configure cdc_path so CDC events are written as ongoing files in S3 partitioned by date. Use migration mode "full load + CDC" so the initial state of the source is dumped to S3 as Parquet and ongoing changes are appended as new Parquet files. Combine with Glue Data Catalog (run a crawler or use partition projection) and Athena to query the resulting data lake. This pattern is heavily tested on DEA-C01 because it is a real production architecture: production OLTP database stays in Oracle or PostgreSQL, analytics workloads query the S3-replicated copy via Athena or Redshift Spectrum, and there is no impact on the OLTP performance from analytics queries. The exam plants this as "replicate production database to a data lake without affecting OLTP performance" — DMS to S3 as Parquet is the answer.
Further Reading — Official AWS Documentation For Migration Services
The authoritative AWS sources are: the AWS AppFlow user guide covering connectors, flows, mapping, and PrivateLink integration; the AWS DataSync user guide covering agents, tasks, locations, and verification; the AWS Database Migration Service user guide covering replication instances, endpoints, tasks, and CDC; the AWS Schema Conversion Tool user guide covering analysis, conversion, and assessment reporting; the AWS Snowball Edge developer guide covering device variants, ordering, and the workflow; and the AWS Transfer Family user guide covering protocol support, identity providers, and S3/EFS targets.
The AWS Migration whitepaper provides the architectural framework for picking between online and offline transfer, calculating bandwidth budgets, and sequencing schema conversion before data migration. The AWS Database Blog has multiple deep-dive posts on DMS performance tuning, heterogeneous migration patterns, and S3-as-target architectures. The AWS Big Data Blog covers DataSync to S3 patterns, AppFlow for analytics ingestion, and Snowball Edge case studies. Finally, the AWS Solutions library includes pre-built reference architectures for common migration scenarios — Oracle-to-Aurora-PostgreSQL, on-premises Hadoop to S3, and SaaS to data lake pipelines — that codify the DEA-C01 patterns into deployable templates.