examlab.net · The most efficient path to the most valuable certifications.

Pipeline Orchestration — Step Functions, MWAA, and EventBridge

4,000 words · ≈ 20 min read

DEA-C01 Tasks 1.3/3.1 pipeline orchestration: Step Functions Standard vs Express workflows, MWAA Apache Airflow DAGs with continuous cost, EventBridge Scheduler cron-like triggers, Map state fan-out, error handling Retry/Catch, and the MWAA-vs-Step-Functions cost trap.


Pipeline orchestration is the backbone of every production data engineering platform, and on the DEA-C01 exam choosing the right orchestrator — Step Functions, MWAA, or EventBridge Scheduler — is one of the most cited Domain 1 and Domain 3 traps in community write-ups. The trap is rarely about whether to orchestrate (the alternative is hand-rolled cron jobs that break at scale) — it is about cost shape and use-case fit. A team that picks MWAA for a simple linear five-step pipeline pays $300+ per month for a continuously-running Airflow environment when Step Functions would cost dollars; a team that picks Step Functions for a complex 50-DAG analytics platform with cross-team Airflow expertise leaves operational simplicity on the table.

This guide walks orchestration on AWS through the Data Engineer / MLOps lens — when to use Step Functions Standard versus Express workflows, how MWAA's continuous-cost model differs from Step Functions' per-transition billing, how EventBridge Scheduler simplifies cron-like triggers, the Map state pattern for parallel processing, error handling with Retry and Catch, integration patterns connecting orchestrators to Glue/EMR/Redshift, and the canonical exam traps planted around cost comparison, MWAA versus Step Functions selection, and the orchestration-versus-event-routing distinction.

Why Orchestration Matters — The Hand-Rolled Cron Anti-Pattern

Before talking about specific orchestrators, understand the problem they solve.

What Manual Cron Cannot Do

A cron job that runs glue start-job-run at 02:00 daily looks fine until the Glue job fails — cron has no retry logic, no downstream notification, no awareness that the next step (a Redshift COPY) should not run because the upstream failed. Manual cron also cannot express conditional branching ("if record count is below threshold, send alert; otherwise proceed"), parallel fan-out, or cross-service coordination.

What Orchestrators Provide

State management (track which step ran, which failed), retry policies (per-step exponential backoff with jitter), error handling (catch specific error codes, route to compensation logic), conditional branching, parallel execution, human-in-the-loop steps, observability (workflow visualizations, per-step logs), and version control (workflow definitions stored as code).

Three AWS Options

Step Functions for managed state machines, MWAA for managed Apache Airflow, and EventBridge for event routing and scheduling. The DEA-C01 exam tests which is right for which workload.

AWS Step Functions — State Machines For Pipelines

Step Functions is the AWS-native orchestrator for serverless workflows.

State Machine Model

A Step Functions workflow is a state machine defined in Amazon States Language (ASL), a JSON dialect that lists states (steps) and transitions. Each state has a type — Task (do work), Choice (branch), Wait (pause), Parallel (run states concurrently), Map (iterate over array), Pass (transform data), Fail (terminate with error), Succeed (terminate successfully).
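The model above can be sketched as a minimal ASL definition, built here as a Python dict and serialized to the JSON document Step Functions actually accepts. The Lambda function name is a placeholder, not a real resource:

```python
import json

# Minimal-sketch state machine in Amazon States Language (ASL), built as
# a Python dict. "my-etl-trigger" is a hypothetical Lambda function name.
definition = {
    "Comment": "Two-state sketch: do work, then succeed",
    "StartAt": "DoWork",
    "States": {
        "DoWork": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
                "FunctionName": "my-etl-trigger",  # placeholder
                "Payload.$": "$",                  # pass state input through
            },
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}

# Serialize to the JSON that the CreateStateMachine API expects.
asl_json = json.dumps(definition, indent=2)
```

Every other state type (Choice, Wait, Parallel, Map, Pass, Fail) slots into the same States map with its own Type.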

Standard Vs Express Workflows

Step Functions has two pricing and runtime models:

Standard workflows: exactly-once workflow execution semantics, durable state for up to one year, full execution history visible in the console, billed at $25 per million state transitions. Right for: long-running pipelines (minutes to days), workflows that need observable history, multi-step ETL with audit requirements.

Express workflows: at-least-once execution semantics, max 5-minute duration per execution, limited history (CloudWatch Logs only), per-execution + per-GB-second billing similar to Lambda, far cheaper at high transaction volume. Right for: high-volume short-duration workflows (IoT event processing, API request orchestration), micro-batches under 5 minutes.

The DEA-C01 trap: candidates default to Standard for everything; the right answer for high-volume short-duration processing is Express.
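A back-of-the-envelope cost sketch of the trap, using the article's rates: $25 per million Standard transitions and roughly $1 per million Express requests. Express duration (GB-second) charges are ignored for simplicity, and the five-transition workflow size is an assumption:

```python
# Standard vs Express for 10M short executions/month. Rates are the
# article's figures; check current AWS pricing before relying on them.
executions_per_month = 10_000_000
transitions_per_execution = 5  # assumed: small five-state workflow

standard_usd = executions_per_month * transitions_per_execution / 1e6 * 25.0
express_usd = executions_per_month / 1e6 * 1.0   # request charge only

print(f"Standard: ${standard_usd:,.0f}/month")   # Standard: $1,250/month
print(f"Express:  ${express_usd:,.0f}/month")    # Express:  $10/month
```

The two orders of magnitude between the figures is the whole exam trap.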

Direct Service Integrations

Step Functions has 200+ direct service integrations — call Glue, EMR, Athena, Redshift Data API, Lambda, SNS, SQS, DynamoDB, EventBridge, and many more without writing Lambda glue code. The integration is declared in the ASL state definition; AWS handles the API call, response parsing, and error propagation.

Map State For Fan-Out

The Map state runs the same set of steps in parallel across each item in an input array. Use case: process 100 S3 files in parallel by passing the array of S3 keys as input and letting Map invoke a Glue job per key. The Map state has a MaxConcurrency parameter (default 0 = unlimited, set explicitly for production).
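A hedged sketch of such a Map state in ASL shape as a Python dict; the Glue job name, the $.keys input path, and the argument name are illustrative, not from the text:

```python
# Map state fanning a Glue job out over an array of S3 keys.
map_state = {
    "Type": "Map",
    "ItemsPath": "$.keys",                 # input: {"keys": [<s3 key>, ...]}
    "MaxConcurrency": 10,                  # set explicitly in production
    "ItemProcessor": {
        "ProcessorConfig": {"Mode": "INLINE"},
        "StartAt": "EtlOneKey",
        "States": {
            "EtlOneKey": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {
                    "JobName": "curate-one-prefix",    # placeholder job
                    "Arguments": {"--s3_key.$": "$"},  # current array item
                },
                "End": True,
            }
        },
    },
    "End": True,
}
```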

Distributed Map For Massive Parallelism

Distributed Map is the high-scale variant supporting up to 10,000 concurrent child executions and reading input from S3 (CSV, JSON, manifest file). Use case: process millions of records in parallel without hitting the standard Map's concurrency limits.

Error Handling — Retry And Catch

Each Task state can declare Retry rules (which error codes retry, with what backoff and max attempts) and Catch rules (which error codes route to a fallback state). Retry handles transient failures; Catch handles permanent failures by routing to compensation, alerting, or graceful degradation. The DEA-C01 exam plants this with scenarios about resilient pipelines that recover from API throttling or Glue job failures.
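A sketch of a Task state carrying both blocks, in ASL shape as a Python dict; the Glue job name and the downstream state names (NotifyOpsTeam, LoadRedshift) are illustrative:

```python
# Task state with Retry for transient failures (exponential backoff)
# and Catch routing everything else to a fallback state.
task_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::glue:startJobRun.sync",
    "Parameters": {"JobName": "nightly-etl"},   # placeholder job
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed", "States.Timeout"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,                 # waits of 2s, 4s, 8s
        }
    ],
    "Catch": [
        {
            "ErrorEquals": ["States.ALL"],
            "ResultPath": "$.error",            # keep error details on the input
            "Next": "NotifyOpsTeam",
        }
    ],
    "Next": "LoadRedshift",                     # placeholder downstream state
}
```

Retry rules are tried in order against the thrown error name before any Catch rule fires.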

Activity And Callback Patterns

Step Functions can pause and wait for an external system to complete a task. Two patterns: Activity workers poll for tasks and report results; Callback with task token lets any AWS service or external API send a token back when work is done. Use case: human approval steps in data pipelines, integrating with non-AWS systems, long-running EMR Steps API calls.

Step Functions Standard workflows are for long-running pipelines billed per transition; Express workflows are for high-volume short-duration workflows billed per execution time and memory — choose Standard for ETL pipelines with multi-minute or multi-hour durations, Express for high-throughput micro-batches under 5 minutes. Standard costs $25 per million state transitions, keeps full execution history visible for 90 days, and supports up to 1-year duration. Express is roughly 100x cheaper at high TPS, but executions die at 5 minutes and history goes to CloudWatch Logs only. The DEA-C01 exam plants this as a cost-comparison scenario: "the team runs 10 million short workflows per month for IoT event processing" => Express; "the team runs a daily 4-hour multi-step ETL pipeline" => Standard. Picking Express for the 4-hour pipeline is wrong because it exceeds the 5-minute limit; picking Standard for 10M short executions is wrong because the per-transition cost adds up to thousands of dollars when Express would be tens of dollars.

Amazon MWAA — Managed Apache Airflow

MWAA is the AWS managed service for Apache Airflow, the open-source orchestrator popular in data engineering teams.

What MWAA Provides

A managed Airflow environment with the scheduler, webserver, and workers running on AWS-managed infrastructure. You upload DAG files (Python code defining workflows) to an S3 bucket, MWAA picks them up and schedules them. The environment handles Airflow upgrades, security patches, and worker auto-scaling.

Why Teams Choose Airflow

Airflow has a large ecosystem of operators (database connections, S3 sensors, Spark submission, custom Python tasks), a mature community, and broad portability — DAGs can run on managed Airflow, self-hosted Airflow, Cloud Composer (GCP), or Astronomer (third-party). Teams with existing Airflow expertise or DAGs choose MWAA for migration.

MWAA Pricing — The Continuous-Cost Model

MWAA bills per-environment-hour for the scheduler/webserver baseline plus per-worker-hour for the worker pool. A small environment (mw1.small, 2 workers) starts around $0.49/hour or $360/month — the cost runs whether or not any DAG executes. Larger environments scale up to mw1.xlarge with auto-scaling workers up to a configured maximum.

Step Functions vs MWAA — Cost Curve Comparison

Step Functions Standard: roughly $0 baseline, $25 per million transitions. A pipeline with 1000 transitions per day costs ~$0.75/month. MWAA: $360+ baseline regardless of usage. The crossover is around 14 million transitions per month — below that, Step Functions wins on cost; above that, MWAA wins because per-transition cost flat-lines.
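The crossover figure follows directly from the two rates; a quick sanity check:

```python
# At what monthly transition volume does Step Functions Standard
# ($25 per million transitions) cost as much as the MWAA baseline
# ($360/month, the article's mw1.small figure)?
MWAA_BASELINE_USD = 360.0
SFN_USD_PER_MILLION = 25.0

crossover_transitions = MWAA_BASELINE_USD / SFN_USD_PER_MILLION * 1_000_000
print(f"{crossover_transitions:,.0f} transitions/month")  # 14,400,000 transitions/month
```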

When MWAA Is Right

When the team has existing Airflow expertise or DAGs to port from elsewhere, when the workload involves dozens of complex DAGs with many tasks each, when the team needs Airflow-specific features (XCom data passing, sensors with complex polling logic, the broad ecosystem of community operators), or when portability across cloud providers matters strategically.

When MWAA Is Wrong

For simple linear workflows with a handful of steps, for ad hoc orchestration where the continuous baseline cost is wasteful, for teams without Airflow experience starting fresh on AWS — Step Functions is far simpler.

MWAA carries a continuous environment cost of $360+ per month even when no DAGs execute, while Step Functions has zero baseline cost — the choice depends on workflow complexity, team Airflow expertise, and total transition volume, not on which is "more powerful." The DEA-C01 exam plants this as the canonical orchestration cost trap: a scenario describes a small data team needing to run five daily ETL pipelines and asks for the most cost-effective orchestrator. Wrong answer: MWAA (cited because Airflow is "industry standard"). Right answer: Step Functions Standard, which costs cents per month at five-pipeline volume. The reverse trap: a team running 50+ DAGs with complex Python logic, sensors, and XCom data passing is described, and Step Functions is suggested for "simplicity" — wrong, MWAA is the right answer because the workload exceeds Step Functions' practical complexity envelope. Read the scenario for "small team, simple linear workflows" (Step Functions) versus "large team, existing Airflow DAGs, complex Python orchestration" (MWAA).

Amazon EventBridge — Rules, Pipes, And Scheduler

EventBridge is the event-routing service that complements orchestrators rather than competing with them.

EventBridge Rules

EventBridge Rules match events arriving on an event bus and route them to targets — Lambda, Step Functions, Glue, SNS, SQS, Kinesis. Use case: react to S3 object creation, DynamoDB Streams events, custom application events. Rules are pattern-matched ({"source": ["aws.s3"], "detail-type": ["Object Created"]}) and route to multiple targets.

EventBridge Pipes

Pipes are point-to-point event integrations that connect a source (DynamoDB Stream, Kinesis Stream, MSK, SQS) to a target (Step Functions, Lambda, Firehose, SNS) with optional filtering and enrichment. Pipes simplify common patterns that previously required Lambda glue code.

EventBridge Scheduler

EventBridge Scheduler is the cron-replacement service for invoking AWS APIs on a schedule. Define a schedule (cron expression or rate expression), choose a target (Step Functions execution, Lambda, Glue job, EMR Step), and Scheduler invokes the target reliably. Replaces the older CloudWatch Events Rules with cron schedules and adds: per-schedule IAM roles, time zone support, one-time and recurring schedules, dead-letter queues for failed invocations.
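As a sketch, these are the arguments such a schedule would take via boto3's scheduler client, for the "daily at 02:00 Asia/Tokyo" case. Every name and ARN below is a placeholder; the call itself would be boto3.client("scheduler").create_schedule(**schedule_kwargs):

```python
# Keyword arguments for EventBridge Scheduler's CreateSchedule API.
schedule_kwargs = {
    "Name": "nightly-etl-trigger",                 # hypothetical schedule name
    "ScheduleExpression": "cron(0 2 * * ? *)",     # 02:00 every day
    "ScheduleExpressionTimezone": "Asia/Tokyo",    # Scheduler-only feature
    "FlexibleTimeWindow": {"Mode": "OFF"},         # fire exactly on schedule
    "Target": {
        "Arn": "arn:aws:states:ap-northeast-1:123456789012:stateMachine:nightly-etl",
        "RoleArn": "arn:aws:iam::123456789012:role/scheduler-invoke",  # per-schedule role
    },
}
```

Note the six-field EventBridge cron format (minutes, hours, day-of-month, month, day-of-week, year) — the "?" in the day-of-week slot is required when day-of-month is "*".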

Scheduler vs Rules — When To Use Each

Use Rules to react to events arriving on the event bus (event-driven). Use Scheduler to invoke targets on a time-based schedule (time-driven). Both can target Step Functions, but Scheduler is the right answer for "trigger this pipeline daily at 02:00."

EventBridge As An Orchestration Glue

The canonical pattern: EventBridge Scheduler triggers a Step Functions workflow nightly; the Step Functions workflow runs a Glue ETL job, waits for it, runs a Redshift COPY, validates, and notifies. Each tool does its job — Scheduler initiates, Step Functions orchestrates, Glue and Redshift execute.

Step Functions Integration With Data Services

The DEA-C01 exam tests how Step Functions wires up Glue, EMR, Redshift, Athena, and Lambda for ETL pipelines.

Glue ETL Job

Direct integration: a Step Functions Task state with the glue:startJobRun.sync resource starts a Glue job and waits for completion before transitioning. Returns job run details to the next state. The .sync suffix is the synchronous (wait-for-completion) variant.

EMR Cluster And Steps

Direct integration for EMR — elasticmapreduce:createCluster.sync, elasticmapreduce:addStep.sync, elasticmapreduce:terminateCluster.sync (the ASL resource names use the service's full name, elasticmapreduce). A Step Functions workflow can spin up an EMR cluster, submit Spark steps, wait for completion, and terminate the cluster — replacing dozens of lines of Python orchestration code.
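A hedged sketch of that chain as ASL states in a Python dict: create a transient cluster, run one Spark step, terminate. Sizing, roles, the release label, and the script location are all illustrative placeholders:

```python
# Transient-EMR pattern via Step Functions optimized integrations
# (ASL service name: elasticmapreduce).
emr_states = {
    "CreateCluster": {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
        "Parameters": {
            "Name": "transient-spark",             # placeholder cluster name
            "ReleaseLabel": "emr-6.15.0",          # assumed EMR release
            "ServiceRole": "EMR_DefaultRole",
            "JobFlowRole": "EMR_EC2_DefaultRole",
            "Instances": {
                "InstanceCount": 3,
                "MasterInstanceType": "m5.xlarge",
                "SlaveInstanceType": "m5.xlarge",
                "KeepJobFlowAliveWhenNoSteps": True,
            },
        },
        "ResultPath": "$.cluster",                 # keep ClusterId for later states
        "Next": "RunSparkStep",
    },
    "RunSparkStep": {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
        "Parameters": {
            "ClusterId.$": "$.cluster.ClusterId",
            "Step": {
                "Name": "etl",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "s3://example-bucket/etl.py"],  # placeholder
                },
            },
        },
        "ResultPath": None,                        # discard step result, keep $.cluster
        "Next": "TerminateCluster",
    },
    "TerminateCluster": {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
        "Parameters": {"ClusterId.$": "$.cluster.ClusterId"},
        "End": True,
    },
}
```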

Redshift Data API

Integration via the AWS SDK resource aws-sdk:redshiftdata:executeStatement. The Redshift Data API is asynchronous (no JDBC connection), making it the right pattern for orchestrators that don't want to maintain database connections. Note that AWS SDK integrations do not support the .sync suffix, so completion is checked by polling describeStatement in a Wait/Choice loop.
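Because AWS SDK integrations generally lack a .sync variant, the Redshift Data API call is driven with a submit/wait/poll loop, sketched here as ASL states in a Python dict. Cluster and database names and the downstream states (NextStep, NotifyOpsTeam) are illustrative placeholders:

```python
# Submit/wait/poll loop for the Redshift Data API in Step Functions.
redshift_states = {
    "RunSql": {
        "Type": "Task",
        "Resource": "arn:aws:states:::aws-sdk:redshiftdata:executeStatement",
        "Parameters": {
            "ClusterIdentifier": "analytics",      # placeholder cluster
            "Database": "prod",
            "DbUser": "etl",
            "Sql": "CALL merge_to_fact()",
        },
        "Next": "WaitForSql",                      # response carries the statement Id
    },
    "WaitForSql": {"Type": "Wait", "Seconds": 30, "Next": "CheckStatus"},
    "CheckStatus": {
        "Type": "Task",
        "Resource": "arn:aws:states:::aws-sdk:redshiftdata:describeStatement",
        "Parameters": {"Id.$": "$.Id"},
        "Next": "IsSqlDone",                       # describe response includes Id and Status
    },
    "IsSqlDone": {
        "Type": "Choice",
        "Choices": [
            {"Variable": "$.Status", "StringEquals": "FINISHED", "Next": "NextStep"},
            {"Variable": "$.Status", "StringEquals": "FAILED", "Next": "NotifyOpsTeam"},
        ],
        "Default": "WaitForSql",                   # still running: wait and poll again
    },
}
```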

Athena Query

Direct integration via athena:startQueryExecution.sync waits for the Athena query to complete and returns the results location.

Lambda

Direct integration via lambda:invoke runs a Lambda function and passes the response to the next state. The simplest integration; appropriate for custom logic not covered by direct integrations.

Use Step Functions direct service integrations (the .sync suffix) instead of writing Lambda functions to start Glue/EMR/Redshift/Athena jobs — direct integrations cost less, run faster, and have no Lambda timeout limit. Direct integrations let Step Functions call AWS APIs natively without a Lambda intermediary; the .sync variant blocks the workflow until the called service finishes. For a Glue ETL job that runs 30 minutes, the direct .sync integration handles the wait without consuming Lambda compute time (which would hit the 15-minute Lambda timeout anyway). The DEA-C01 exam plants this as the right answer for "orchestrate a multi-step ETL pipeline efficiently" — never pick "Lambda function that polls Glue job status" or "Lambda function that submits Spark steps" when direct service integrations are an option. Lambda is the right tool for custom transformation logic, not for orchestrating other AWS services.

Glue Workflows vs Step Functions — The Mini-Orchestrator

Glue Workflows is a lightweight orchestrator built into AWS Glue itself.

What Glue Workflows Does

Coordinates Glue jobs, crawlers, and triggers within a single Glue workflow. Visual editor, conditional branching based on job state, no external orchestrator needed. Use case: simple multi-step Glue-only pipelines (crawl → ETL → crawl).

When Glue Workflows Wins

For pipelines that are 100 percent Glue (no EMR, no Redshift COPY, no Lambda), Glue Workflows is simpler than Step Functions and incurs no additional cost beyond the Glue jobs.

When Step Functions Wins

For pipelines that span multiple AWS services (Glue + Redshift + Lambda + EMR), Step Functions is the right answer — its direct integrations cover all of them and the orchestration logic is centralized.

DEA-C01 Decision

The exam plants this with scenario detail: "Glue-only pipeline with two crawlers and three jobs" => Glue Workflows; "ETL spanning Glue, EMR, Redshift COPY, and Lambda validation" => Step Functions.

Common Exam Traps For Orchestration

Memorize all five.

Trap 1 — MWAA For Simple Linear Pipelines

A scenario describes five daily ETL steps and asks for the most cost-effective orchestrator. Wrong answer: MWAA (because Airflow is "industry standard"). Right answer: Step Functions Standard — zero baseline cost, simple state machine.

Trap 2 — Step Functions Standard For 10M Short Executions

A scenario describes 10 million short-duration workflows per month for IoT event processing. Wrong answer: Step Functions Standard. Right answer: Step Functions Express — 100x cheaper at high TPS for sub-5-minute executions.

Trap 3 — Lambda Polling For Glue Job Status

A scenario describes orchestrating a 30-minute Glue job. Wrong answer: Lambda function that polls getJobRun until complete (Lambda has 15-minute timeout). Right answer: Step Functions direct integration with glue:startJobRun.sync.

Trap 4 — EventBridge Rules For Time-Based Schedules

A scenario describes "trigger this pipeline daily at 02:00 in Asia/Tokyo." Wrong answer: EventBridge Rules with cron expression. Right answer: EventBridge Scheduler — supports time zones, dead-letter queues, per-schedule IAM roles.

Trap 5 — MWAA Without Airflow Expertise

A scenario describes a small data team with no Airflow experience starting fresh on AWS. Wrong answer: MWAA (because "DAGs are powerful"). Right answer: Step Functions — the team learns one AWS-native tool instead of Airflow plus AWS.

Step Functions Workflow Examples

Example 1 — Daily ETL Pipeline (Standard)

EventBridge Scheduler (cron: 0 2 * * *) →
  Step Functions Standard:
    Task: Glue crawler (raw zone)
    Choice: did the crawler succeed?
      Yes → Task: Glue ETL (raw → curated)
      No  → Task: SNS notify on-call, Fail
    Task: Redshift COPY from curated S3 to staging table
    Task: Redshift stored procedure (merge to fact table)
    Task: Athena CTAS to refresh dashboard zone
    Success

Example 2 — IoT Event Processing (Express)

IoT Rule → EventBridge → Step Functions Express:
    Task: validate event schema (Lambda)
    Choice: valid?
      Yes → Task: enrich with reference data (DynamoDB GetItem)
              Task: write enriched event to Firehose
              Success
      No  → Task: route to dead-letter SQS
              Fail

Example 3 — Map State Fan-Out

Step Functions Standard:
    Task: list S3 prefixes (Lambda returns array)
    Map (MaxConcurrency 10):
      ItemProcessor:
        Task: Glue ETL on this prefix (.sync)
        Task: write completion marker to DynamoDB
    Task: aggregate results
    Success

Plain-Language Explanation: Pipeline Orchestration

Three concrete analogies make the orchestrator choice intuitive.

Analogy 1 — The Restaurant Kitchen With Different Coordination Models

Step Functions is the head chef who calls out orders one at a time and watches each station finish before calling the next: "fire the salad, when ready fire the entree, when ready plate and send." Simple, lean, the kitchen has zero overhead when no orders are in. Each call is one transition, the bill is per-call. Express variant is the same head chef running a high-volume diner where orders are simple and fast — burgers, fries, sodas — and the bill switches to per-order rather than per-step because the steps are too cheap to count. MWAA is hiring an entire executive sous-chef team with their own management hierarchy (Airflow scheduler, webserver, workers) — they sit in the kitchen all day whether or not there are orders, but they handle a complex multi-course tasting menu with 50 simultaneous tickets, kitchen-wide coordination, and multi-restaurant logistics that the head chef alone cannot. EventBridge Scheduler is the daily prep alarm: 06:00 it rings and tells the kitchen "start morning prep" without any logic of its own. EventBridge Rules is the doorbell that rings whenever a delivery truck arrives, routing the alert to the appropriate station. The DEA-C01 trap is hiring the executive sous-chef team for a five-table family diner (MWAA for simple linear pipelines) or asking the head chef to coordinate a 200-cover catering event single-handed (Step Functions for genuinely complex multi-team orchestration with Airflow expertise on staff).

Analogy 2 — The Library Acquisitions Workflow

Step Functions is the library's task list app: "step one, receive new books from donor; step two, catalog each one; step three, label and shelve; step four, update online catalog." Each step is logged with a timestamp, retries are handled by the app, and parallel tasks (cataloging multiple books simultaneously) are spawned via Map. Standard is the durable task list with multi-day history; Express is the lightweight version for high-volume one-off tasks like "scan barcode and add to inventory." MWAA is the library's full library management system (LMS) — a continuously running enterprise platform with circulation, acquisitions, cataloging, ILL, and reporting modules, requiring a librarian trained on the LMS. The LMS handles the entire library's complex multi-departmental workflows but costs continuously whether or not new books arrive. EventBridge Scheduler is the daily 09:00 reminder to check the donation drop box; EventBridge Rules is the alert when a book is returned overdue. The right choice depends on library size and complexity: a five-person community library uses the task list app (Step Functions); a university research library with 50 librarians uses the LMS (MWAA).

Analogy 3 — The Postal Sorting Facility With Manual Routing Versus Industrial Automation

Step Functions is the manual postal sorting clerk who follows a written procedure: "open bag, scan first envelope, route to appropriate bin, repeat until bag empty." The procedure is concise, the clerk is paid per envelope sorted, and the facility incurs no cost when no mail arrives. Standard is the day-shift clerk who handles the full overnight bag with multi-hour audit log; Express is the high-speed clerk for the post-event mail rush handling thousands of envelopes per minute under simpler routing rules. MWAA is the industrial postal automation hall — conveyor belts, OCR readers, robotic arms, supervisor stations — which handles a hundred thousand packages per hour with complex routing across sister facilities, but the equipment runs continuously costing $360+ per month even on quiet weekends. EventBridge Scheduler is the timer that opens the receiving dock at 06:00; EventBridge Rules is the notification that fires when a package arrives marked "fragile, redirect to special handling." The DEA-C01 trap is buying the industrial automation hall for a village post office handling 200 packages a day (MWAA for simple workflows), or asking the manual clerk to handle the regional sorting hub at peak Christmas season (Step Functions for genuine 50-DAG enterprise workloads).

Key Numbers And Must-Memorize Facts

Step Functions Standard

  • $25 per million state transitions
  • Up to 1-year execution duration
  • Full execution history for 90 days
  • Exactly-once execution semantics
  • Direct service integrations with .sync support

Step Functions Express

  • ~$1 per million executions + per-GB-second
  • Max 5-minute duration per execution
  • History to CloudWatch Logs only
  • At-least-once execution semantics
  • Best for high-TPS short workflows

MWAA

  • Continuous environment cost: $360+/month minimum (mw1.small)
  • Larger sizes: mw1.medium ~$700/month, mw1.large ~$1300/month
  • Workers auto-scale within configured min/max
  • Apache Airflow 2.x supported versions
  • DAGs uploaded to S3 bucket

EventBridge

  • Rules: events from AWS services are free; custom events $1 per million published
  • Pipes: per-event cost in the same range as Rules
  • Scheduler: free for first 14M invocations/month, $1 per million after
  • Event archive/replay: retention configurable via EventBridge Archive (including indefinite)

Step Functions Map State

  • Standard Map: up to 40 concurrent inline executions
  • Distributed Map: up to 10K concurrent child executions
  • Distributed Map can read input from S3

Memorize Step Functions error handling: Retry handles transient failures with exponential backoff and max attempts; Catch routes specific error codes to fallback states; both can be declared per-Task state. A typical pattern: Retry: [{ErrorEquals: ["States.TaskFailed"], IntervalSeconds: 2, MaxAttempts: 3, BackoffRate: 2.0}] retries up to three times with 2s, 4s, 8s waits. Catch: [{ErrorEquals: ["States.ALL"], Next: "NotifyOpsTeam"}] routes any unhandled error to a notification state. The DEA-C01 exam plants resilient-pipeline scenarios — the right answer always involves Retry on transient errors (throttling, network blips) and Catch on permanent errors (invalid input, schema mismatch). Without Retry/Catch, a single transient failure cascades into a failed pipeline and a 03:00 page; with them, the pipeline self-heals and the on-call sleeps. Memorize the JSON shape; it may appear in code-recognition exam questions.
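The waits produced by the quoted Retry block can be checked with a one-liner — each wait is IntervalSeconds multiplied by BackoffRate raised to the attempt number:

```python
# Exponential-backoff schedule implied by a Step Functions Retry rule.
def retry_waits(interval_seconds, backoff_rate, max_attempts):
    """Seconds Step Functions waits before each retry attempt."""
    return [interval_seconds * backoff_rate ** i for i in range(max_attempts)]

print(retry_waits(2, 2.0, 3))  # [2.0, 4.0, 8.0]
```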

DEA-C01 exam priority — Step Functions, MWAA, and EventBridge Orchestration. This topic carries weight on the DEA-C01 exam. Master the trade-offs, decision boundaries, and the cost/performance triggers each AWS service exposes — the exam will test scenarios that hinge on knowing which service is the wrong answer, not just which is right.


FAQ — Step Functions, MWAA, And EventBridge Top Questions

Q1 — When should I choose Step Functions versus MWAA for a new data pipeline?

Choose Step Functions when the pipeline is moderately complex (under 20 steps), the team has no existing Airflow expertise, the workload is bursty rather than continuous, and per-transition cost is preferable to a continuous baseline. Choose MWAA when the team has existing Airflow DAGs to port, when the workload involves dozens of complex DAGs with Python logic and Airflow-specific features (XCom, sensors, the broad operator ecosystem), and when the per-month baseline cost is justified by the workload volume. The cost crossover is roughly 14M transitions per month — below that, Step Functions wins on cost; above that, MWAA's flat rate becomes economical. The DEA-C01 exam plants this with explicit cost questions: "small team, simple linear workflows, cost-sensitive" => Step Functions; "data engineering platform with 50+ DAGs, existing Airflow code, Python-heavy logic" => MWAA. Avoid MWAA for scenarios where simplicity and low baseline cost matter; avoid Step Functions for genuinely complex Airflow workloads.

Q2 — What is the difference between Step Functions Standard and Express workflows?

Standard workflows provide exactly-once workflow execution with durable state for up to one year, full execution history visible in the console for 90 days, and per-transition billing at $25 per million transitions. Express workflows are at-least-once executions with max 5-minute duration, history written to CloudWatch Logs only, and per-execution billing similar to Lambda. Use Standard for ETL pipelines, multi-step business processes, and any workflow needing audit history or running longer than 5 minutes. Use Express for IoT event processing, API request orchestration, and high-TPS micro-batches under 5 minutes — Express is roughly 100x cheaper than Standard at high volume. The DEA-C01 exam plants this as a cost calculation: "10 million short executions per month" => Express. "Daily 4-hour multi-step ETL with audit log" => Standard. Picking Standard for the 10M case wastes money; picking Express for the 4-hour case is impossible because Express dies at 5 minutes.

Q3 — How do Step Functions direct service integrations replace Lambda glue code?

Step Functions has 200+ direct integrations with AWS services (Glue, EMR, Athena, Redshift Data API, DynamoDB, S3, SNS, SQS, EventBridge, etc.) that let you call the service's API natively from a Task state without writing a Lambda function. The .sync suffix variant blocks the workflow until the called service completes — perfect for orchestrating long-running Glue or EMR jobs that exceed Lambda's 15-minute timeout. Direct integrations cost less (no Lambda invocation), run faster (no Lambda cold start), and reduce operational surface area (no Lambda code to maintain). The DEA-C01 exam plants Lambda-as-glue as a wrong-answer pattern; the right answer is direct integration with .sync for any AWS service Step Functions natively integrates with. Lambda is the right tool for custom transformation logic, not for orchestrating other AWS services.

Q4 — When should I use EventBridge Scheduler versus EventBridge Rules with cron expressions?

EventBridge Scheduler is the modern dedicated scheduling service with features that EventBridge Rules with cron lack: time zone support (cron in any time zone, not just UTC), per-schedule IAM roles (each schedule can assume a different role), one-time schedules in addition to recurring, dead-letter queues for failed invocations, and higher scale (millions of schedules per region). Use Scheduler for any new time-based scheduling needs. Use EventBridge Rules with event patterns for event-driven routing — reacting to S3 object creation, DynamoDB Streams events, custom application events. The two are not interchangeable: Scheduler is time-driven, Rules is event-driven. The DEA-C01 exam plants Scheduler as the right answer for "trigger pipeline daily at 02:00 Asia/Tokyo" or "one-time invocation at a specific timestamp." Rules is the right answer for "react to file landing in S3."

Q5 — How do I handle errors and retries in a Step Functions workflow?

Each Task state can declare Retry and Catch blocks. Retry specifies which error codes trigger a retry, with what interval and exponential backoff: Retry: [{ErrorEquals: ["States.TaskFailed"], IntervalSeconds: 2, MaxAttempts: 3, BackoffRate: 2.0}] retries up to three times with 2s, 4s, 8s waits. Catch specifies which error codes route to a fallback state: Catch: [{ErrorEquals: ["States.ALL"], Next: "NotifyOpsTeam", ResultPath: "$.error"}] routes any uncaught error to a notification state with the error details preserved. Combine them — Retry first for transient failures, Catch for permanent failures that exhaust retries. Use specific error codes (Glue.AWSGlueException, States.Timeout) rather than States.ALL for fine-grained handling. The DEA-C01 exam plants this with resilience scenarios — the right answer always shows Retry on transient errors and Catch on permanent errors.

Q6 — How does the Step Functions Map state work for parallel processing?

The Map state runs the same set of states in parallel for each item in an input array. The standard Map runs up to 40 concurrent inline executions per workflow. The Distributed Map (newer variant) runs up to 10,000 concurrent child executions, can read input from S3 (CSV or JSON files), and is the right answer for massive parallelism. Pattern: pass an array of S3 keys as input to a Map state, the Map iterates over each key in parallel running a Glue ETL job per key, and aggregates results into an output array. The MaxConcurrency parameter caps parallelism (default 0 = unlimited; set explicitly in production to avoid overwhelming downstream services). The DEA-C01 exam plants Map as the right answer for "process 1000 files in parallel without writing a custom orchestration loop."

Q7 — Can I use Glue Workflows instead of Step Functions for an ETL pipeline?

Glue Workflows is a lightweight orchestrator built into AWS Glue that coordinates Glue jobs, crawlers, and triggers. Use Glue Workflows for pipelines that are 100 percent Glue — multiple crawlers, multiple ETL jobs, conditional branching based on job state. It is simpler than Step Functions and incurs no additional cost beyond the Glue jobs themselves. Use Step Functions for pipelines that span multiple services (Glue + Redshift + Lambda + EMR + Athena) — Step Functions' 200+ direct integrations cover all of them, and the orchestration logic is centralized in one workflow. The decision: 100% Glue => Glue Workflows; multi-service => Step Functions. The DEA-C01 exam plants this with scenario detail. For most production data engineering teams, Step Functions is the broader-applicability choice; Glue Workflows is a niche tool for Glue-only pipelines.

Further Reading — Official AWS Documentation

The authoritative AWS sources are the AWS Step Functions Developer Guide (state types, error handling, direct integrations, Standard vs Express), the Amazon MWAA User Guide (environment configuration, DAG management, worker scaling), the Amazon EventBridge User Guide (Rules, Pipes, Scheduler), and the AWS Big Data Blog series on orchestration patterns at companies like FINRA, Capital One, and Amazon Music. The AWS Samples GitHub repository contains end-to-end sample workflows combining Step Functions with Glue, EMR, Redshift, and Athena. The Skill Builder DEA-C01 Exam Prep Standard Course has dedicated modules for Domain 1 Task 1.3 covering pipeline orchestration. For deeper Airflow content, the open-source Apache Airflow documentation and the AWS MWAA blog series cover migration patterns from self-hosted Airflow to MWAA. The AWS Well-Architected Data Analytics Lens covers orchestration as part of the analytics phase with explicit cost-versus-complexity guidance.
