SageMaker Pipelines is the orchestration engine MLA-C01 expects every ML Engineer to know cold — it is the AWS-native answer to "how do I automate the train → evaluate → register → deploy loop with reproducibility, lineage, and conditional branching." The exam tests it heavily because it is also the surface where ordering and matching question types appear: stems present a list of pipeline steps and ask candidates to order them correctly (Processing → Training → Evaluation → Condition → Register → Deploy), or to match a step type to its purpose (TrainingStep → fits estimator, ConditionStep → branches on metric, RegisterModel → adds to registry). Ordering and matching require ALL items correct for credit — partial credit is not awarded. Memorizing the canonical step sequence is mandatory.
This guide covers the full pipeline step taxonomy, parameters and caching mechanics, dependency graph design, integration with Data Wrangler and Feature Store, the EventBridge trigger pattern for retraining loops, the Step Functions vs Pipelines vs MWAA decision matrix, and lineage tracking. It is written for the MLOps perspective — what an ML Engineer actually configures, the ordering question patterns the exam loves, and the precise distinctions between Pipelines, Step Functions, and Airflow that the MLA-C01 tests with nuance.
What Is SageMaker Pipelines and Why It Matters For MLOps
SageMaker Pipelines is a managed orchestration service for ML workflows, expressed as a directed acyclic graph (DAG) of typed steps that run on SageMaker infrastructure. A pipeline definition is a Python SDK declaration (or JSON) of steps, dependencies, parameters, and condition logic; a pipeline execution is one run with concrete parameter values. Pipelines provide built-in step caching, lineage tracking, parameterisation, and conditional branching — features that distinguish ML workflows from generic ETL.
Why Pipelines Beats Hand-Wired Lambdas And Step Functions
Generic orchestrators (Step Functions, Lambda chains) require manual wiring of every SageMaker API call: CreateProcessingJob, CreateTrainingJob, CreateTransformJob, CreateModelPackage, and so on. Each call has its own input contract, output schema, and failure handling. SageMaker Pipelines wraps these calls in typed step abstractions (ProcessingStep, TrainingStep, TransformStep, RegisterModel) with automatic input/output piping, automatic retries, and built-in lineage. For ML workflows that are 80 percent SageMaker calls, Pipelines is the right abstraction; for workflows where SageMaker is one component among many non-AWS services, Step Functions or MWAA is more flexible.
The Canonical Retraining Loop
Every MLA-C01 retraining-pipeline question reduces to this canonical sequence: trigger fires → ProcessingStep prepares data → TrainingStep fits the model → ProcessingStep evaluates on test data → ConditionStep branches on the evaluation metric → RegisterModel adds to the registry (or FailStep if metric is below threshold) → approval event triggers deployment. Memorize this; ordering questions slot directly into it.
Plain-Language Explanation: SageMaker Pipelines
The pipeline abstraction feels mechanical until you map it to physical workflows people actually run.
Analogy 1 — The Restaurant Kitchen Workflow
Imagine a restaurant's daily prep workflow. Every morning the kitchen executes a fixed sequence: receive ingredients (data ingestion), wash and chop (preprocessing), cook the dishes (training), taste-test against a recipe standard (evaluation), then either plate for service (register the dish, approve it) or send it back to the chef if the taste test fails (FailStep). SageMaker Pipelines is the printed workflow on the kitchen wall — every step is named, the inputs and outputs of each step are defined, and the kitchen can re-run the workflow tomorrow with different ingredients (parameterised) but the same logic. ConditionStep is the taste-test gate: if the dish hits the standard, it goes out for service (RegisterModel); if it does not, it gets thrown back. Pipeline parameters are the configurable knobs — today's salt amount, today's cooking time, today's portion size. Step caching is "we already chopped these onions an hour ago, do not re-chop unless the recipe changed." Pipeline triggers are when the workflow starts: scheduled (every morning at 6 AM), event-driven (the new ingredient delivery just arrived), or manual (the chef hits the button). Lineage tracking is the food-safety logbook recording exactly which ingredient batch went into which dish on which day. The MLA-C01 exam asks you to order the kitchen steps in the correct sequence — get one wrong and you have raw onions on the customer's plate.
Analogy 2 — The Factory Production Line
Picture a factory making cars. The production line has fixed stations: stamp the body panels (data preparation), weld the chassis (model training), paint and inspect (evaluation), pass quality control (ConditionStep — pass to shipping or fail to rework), final assembly (RegisterModel), and ship (deploy). SageMaker Pipelines is the factory's master schedule defining which station feeds which, what the inputs and outputs of each station are, and which station runs in parallel with which others. Pipeline parameters are the model and trim level being built today — same line, different configurations. Step caching is the "we already stamped these panels in batch 47, reuse them" optimisation. Pipeline triggers are the assembly-line start signal — clock time, parts-availability event, or supervisor button. Lineage tracking is the build sheet recording every station's output for every car (VIN to part-number provenance). FailStep is the QC reject path — if the car fails inspection, it is pulled off the line and the failure is recorded in the lineage trail. The MLA-C01 ordering question maps to "give the factory schedule in the correct sequence" — stamp → weld → paint → inspect → assemble → ship; rearrange and the line crashes.
Analogy 3 — The Movie Production Pipeline
Picture a film studio's production schedule. Steps run in fixed sequence: write the script (data preparation), film the scenes (training), edit and review (evaluation), test-screening with a focus group (ConditionStep — release if scores high, reshoot if low), distribute to theatres (RegisterModel and approve), advertise (deploy to endpoints). SageMaker Pipelines is the studio's production plan. ProcessingStep is any prep or post-prep stage where you transform raw input into refined output (script → screenplay, dailies → edited scene). TrainingStep is the filming itself — the expensive creative production. TuningStep is the rehearsal phase trying multiple cuts to find the best edit (hyperparameter tuning). TransformStep is the offline batch render of every scene to final format. Pipeline parameters are the budget, runtime, and target rating today — same production process, different parameters per film. Step caching is "we already shot the establishing shots in pre-production, reuse the footage." EventBridge triggers are "test-screening finished and the focus group score is in — start the next pipeline run with the script revisions." Step Functions, by comparison, is a more general project-management tool — useful when the workflow includes non-studio steps like vendor coordination, logistics, and travel; SageMaker Pipelines is the right answer when the workflow is mostly studio production. The MLA-C01 exam asks "which orchestrator do you choose" by signalling whether the workflow is mostly SageMaker (Pipelines) or mostly multi-service (Step Functions) or mostly hybrid/legacy with existing Airflow expertise (MWAA).
Pipeline Step Types — The Complete Taxonomy
Every SageMaker pipeline step has a typed abstraction. Memorize the names and purposes — ordering and matching questions slot directly into these.
ProcessingStep
Wraps a SageMaker Processing job. Used for any data transformation: preprocessing raw data, feature engineering, post-training evaluation against a held-out test set, model fairness analysis with Clarify, post-deployment evaluation reports. Inputs are S3 paths; outputs are S3 paths. Common containers: Scikit-Learn, Spark, custom Docker image.
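As a concrete illustration, a minimal ProcessingStep sketch using the Scikit-Learn container might look like the following; the bucket paths, script name, and framework version are illustrative assumptions, not values from this guide:

```python
import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.steps import ProcessingStep

role = sagemaker.get_execution_role()  # assumes a SageMaker execution context

sklearn_processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

preprocess_step = ProcessingStep(
    name="preprocess",
    processor=sklearn_processor,
    inputs=[ProcessingInput(source="s3://my-bucket/raw/",            # hypothetical input
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(output_name="train",
                              source="/opt/ml/processing/train")],
    code="preprocess.py",                                            # hypothetical script
)
```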
TrainingStep
Wraps a SageMaker Training job. Takes an Estimator object (built-in algorithm or custom container or framework), hyperparameters, instance type/count, and S3 input channels. Output is a model artifact in S3.
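A minimal TrainingStep sketch with the built-in XGBoost container, piping in the preprocessing output from the sketch above; the region, bucket, and hyperparameters are assumptions:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

role = sagemaker.get_execution_role()  # assumes a SageMaker execution context

xgb = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", "us-east-1", version="1.7-1"),
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    output_path="s3://my-bucket/models/",                 # hypothetical bucket
    hyperparameters={"objective": "binary:logistic", "num_round": "100"},
)

train_step = TrainingStep(
    name="train",
    estimator=xgb,
    inputs={
        # The preprocessing step's "train" output becomes the training channel
        "train": TrainingInput(
            s3_data=preprocess_step.properties.ProcessingOutputConfig
                    .Outputs["train"].S3Output.S3Uri,
            content_type="text/csv",
        )
    },
)
```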
TuningStep
Wraps a SageMaker Automatic Model Tuning (AMT) hyperparameter optimisation job. Takes a HyperparameterTuner object with a search strategy (Bayesian, Random, Hyperband), parameter ranges, max jobs, and parallelism. Output is the best training job's model artifact.
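A hedged TuningStep sketch reusing the estimator above; the objective metric name and search range are illustrative assumptions:

```python
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner
from sagemaker.workflow.steps import TuningStep

tuner = HyperparameterTuner(
    estimator=xgb,                       # estimator from the TrainingStep sketch
    objective_metric_name="validation:auc",
    hyperparameter_ranges={"eta": ContinuousParameter(0.01, 0.3)},
    strategy="Bayesian",
    max_jobs=20,
    max_parallel_jobs=4,
)

tuning_step = TuningStep(
    name="tune",
    tuner=tuner,
    inputs={
        "train": TrainingInput(
            s3_data=preprocess_step.properties.ProcessingOutputConfig
                    .Outputs["train"].S3Output.S3Uri,
            content_type="text/csv",
        )
    },
)
```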
TransformStep
Wraps a Batch Transform job. Used for batch inference within a pipeline — apply a trained model to a large dataset for offline scoring. Inputs are S3 input prefix and a model name; output is S3 output prefix.
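A minimal TransformStep sketch; the model name and S3 prefixes are hypothetical (the model name usually comes from a CreateModelStep's output):

```python
from sagemaker.inputs import TransformInput
from sagemaker.transformer import Transformer
from sagemaker.workflow.steps import TransformStep

transformer = Transformer(
    model_name="my-registered-model",             # hypothetical model resource
    instance_type="ml.m5.xlarge",
    instance_count=1,
    output_path="s3://my-bucket/batch-scores/",   # hypothetical output prefix
)

transform_step = TransformStep(
    name="batch-score",
    transformer=transformer,
    inputs=TransformInput(data="s3://my-bucket/batch-input/", content_type="text/csv"),
)
```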
CreateModelStep / RegisterModel
CreateModelStep creates a SageMaker Model resource referencing a training output artifact: the deployable object that endpoints and Batch Transform jobs use. RegisterModel adds a model package to a Model Package Group in SageMaker Model Registry with an approval status. RegisterModel is the typical end-of-pipeline step that gates the deployment via the model registry.
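A RegisterModel sketch reusing the estimator and training step from the earlier sketches; the model package group name and instance lists are assumptions:

```python
from sagemaker.workflow.step_collections import RegisterModel

register_model_step = RegisterModel(
    name="register",
    estimator=xgb,                                 # estimator from the TrainingStep sketch
    model_data=train_step.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name="churn-models",       # hypothetical group
    approval_status="PendingManualApproval",
)
```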
ConditionStep
Branches the pipeline based on a condition expression evaluated against step properties. Most common pattern: condition on a metric from an evaluation ProcessingStep — if AUC > 0.85, RegisterModel; else FailStep. Multiple if-conditions and else-conditions can chain into nested branches.
FailStep
Explicitly fails the pipeline with a configured error message. Used in the else-branch of a ConditionStep when the model does not pass quality gates. Useful for emitting clear failure reasons to CloudWatch and EventBridge consumers.
LambdaStep
Invokes a Lambda function as a pipeline step. Used for custom logic that does not fit a SageMaker step type — sending notifications, updating external systems, calling third-party APIs, or implementing custom validation logic.
CallbackStep
Pauses the pipeline, publishes a token to an SQS queue, and waits for the external system to return that token (via SendPipelineExecutionStepSuccess or SendPipelineExecutionStepFailure) before proceeding. Used for human-in-the-loop approval gates outside of the model registry approval flow.
EMRStep
Runs an EMR job step within the pipeline. Used when Spark transformations on EMR are part of the data preparation flow.
NotebookJobStep
Runs a SageMaker Notebook Job (a scheduled execution of a Jupyter notebook) as a pipeline step. Used for ad-hoc analysis or report generation as part of the pipeline.
QualityCheckStep / ClarifyCheckStep
Specialised steps for SageMaker Model Monitor baseline creation and SageMaker Clarify bias and explainability analysis. Used at training time to capture the baselines that Model Monitor will use post-deployment.
A SageMaker Pipeline is a DAG of typed steps where each step wraps a specific SageMaker API call (ProcessingStep, TrainingStep, TuningStep, TransformStep, RegisterModel, CreateModel) plus control-flow steps (ConditionStep, FailStep, CallbackStep, LambdaStep). Steps connect via input/output references — one step's output S3 path becomes another step's input — and the pipeline service automatically tracks lineage between them. Pipelines are defined declaratively in Python SDK and executed with parameter values; each execution produces a lineage record showing which step versions ran with which inputs and produced which outputs. This is the MLA-C01 expected mental model — pipelines as typed DAGs, not as imperative scripts.
Pipeline Parameters — Run-Time Configurability
Pipelines support typed parameters that are bound at execution time, not at definition time.
Parameter Types
- ParameterString — string values (S3 paths, model names, environment names)
- ParameterInteger — integer values (instance count, epoch count, max jobs)
- ParameterFloat — float values (learning rate, threshold values)
- ParameterBoolean — boolean flags (enable_caching, debug_mode)
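A minimal sketch of declaring these parameter types; the parameter names and defaults are illustrative:

```python
from sagemaker.workflow.parameters import (
    ParameterBoolean,
    ParameterFloat,
    ParameterInteger,
    ParameterString,
)

# Hypothetical names; values are bound per execution, defaults cover unattended runs
instance_type = ParameterString(name="InstanceType", default_value="ml.m5.xlarge")
instance_count = ParameterInteger(name="InstanceCount", default_value=1)
auc_threshold = ParameterFloat(name="AucThreshold", default_value=0.85)
enable_caching = ParameterBoolean(name="EnableCaching", default_value=True)
```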
Why Parameters Matter
A single pipeline definition serves many environments and many runs. Same DAG, different parameters. Today's run uses instance_type=ml.m5.xlarge for dev; tomorrow's production run uses instance_type=ml.m5.4xlarge. One pipeline definition, many executions, parameterised at run time.
Default Values
Each parameter can have a default. Executions without explicit parameter values use defaults. Useful for production runs that always use the same configuration.
Step Caching — Avoiding Recomputation
Pipeline executions can cache step results: if a step's inputs and parameters are unchanged from a previous successful execution, SageMaker reuses the previous output instead of re-running.
Cache Configuration
CacheConfig(enable_caching=True, expire_after="P30D") enables caching for thirty days. The cache key is a hash of step inputs, parameters, and step definition; if any change, the step re-runs.
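A short sketch of attaching the cache configuration to a step, reusing the ProcessingStep from the taxonomy section; the expiry value is the one quoted above:

```python
from sagemaker.workflow.steps import CacheConfig, ProcessingStep

cache_config = CacheConfig(enable_caching=True, expire_after="P30D")  # ISO 8601 duration

# cache_config is passed per step; shown here on the earlier preprocessing sketch
preprocess_step = ProcessingStep(
    name="preprocess",
    processor=sklearn_processor,   # processor from the ProcessingStep sketch
    code="preprocess.py",
    cache_config=cache_config,
)
```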
When Caching Helps
- Long pipelines where early steps (data preparation) rarely change while later steps (training, tuning) iterate frequently
- Iterative debugging where rerunning the full pipeline would re-process unchanged data
- Cost optimisation — skip expensive Processing or Training steps when inputs are unchanged
Cache Invalidation Trap
Cache key uses input hashes. If a Processing step's input is "all files under prefix X" and the prefix gains a new file, the hash changes and the step re-runs. But if the input is one specific S3 object that is overwritten in place with the same key (different content), AWS may not invalidate the cache because the S3 object reference is the same. Best practice: version data inputs with explicit version paths to make cache behaviour predictable.
Enable step caching by default for development pipelines and disable it for production runs of audit-sensitive workflows. Caching saves significant time and cost during iteration — when only the training step changes but data preparation has not, skipping preparation saves hours and dollars. Production retraining runs, by contrast, often want to re-execute every step regardless of cache to capture an immutable lineage trail showing every step actually ran. The configuration: development pipelines enable caching at the pipeline level with expire_after="P7D"; production retraining pipelines disable caching at the pipeline or step level. Mixing these is a common ML Engineer mistake — caching enabled in production produces lineage records pointing at older runs, confusing audit reviews.
ConditionStep — The Quality Gate
ConditionStep is where a pipeline decides whether the model is good enough to register.
Condition Expressions
Conditions evaluate against step properties. Most common: a property on an evaluation ProcessingStep's output — JsonGet("evaluate", "metrics.auc.value") > 0.85. SageMaker SDK provides ConditionGreaterThan, ConditionGreaterThanOrEqualTo, ConditionLessThan, ConditionEquals, ConditionIn, plus the ConditionNot and ConditionOr combinators; multiple conditions listed on a single ConditionStep are implicitly ANDed.
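A hedged sketch of the extraction wiring that the branching pattern below consumes; the step name "evaluate", the report file name, and the JSON path are illustrative assumptions:

```python
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.properties import PropertyFile

# PropertyFile declared on the evaluation ProcessingStep (pass it via property_files=[...]);
# output_name must match a ProcessingOutput on that step
evaluation_report = PropertyFile(
    name="EvaluationReport",
    output_name="evaluation",
    path="evaluation.json",
)

# Extract metrics.auc.value from the evaluation step's report
auc_value = JsonGet(
    step_name="evaluate",              # name of the evaluation ProcessingStep
    property_file=evaluation_report,
    json_path="metrics.auc.value",
)

# Explicit failure path for the else-branch, with a diagnostic message
fail_step = FailStep(name="fail", error_message="AUC below the 0.85 threshold")
```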
Branching Pattern
```python
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo

condition_step = ConditionStep(
    name="check-auc",
    conditions=[ConditionGreaterThanOrEqualTo(left=auc_value, right=0.85)],
    if_steps=[register_model_step],   # runs only when AUC >= 0.85
    else_steps=[fail_step],           # explicit failure with a diagnostic message
)
```
If the condition evaluates true, the if_steps execute; otherwise, the else_steps execute. The two branches are mutually exclusive — only one runs per execution.
Why ConditionStep Matters For MLOps
Without a ConditionStep, every training run gets registered regardless of quality, polluting the model registry with bad models. With ConditionStep, only models passing the quality gate enter the registry, and bad runs are explicitly recorded as failures via FailStep with diagnostic messages.
Always include a ConditionStep gating the RegisterModel step on a quality threshold (AUC, F1, RMSE, or business metric) — never register models unconditionally. The pattern: TrainingStep produces a model → ProcessingStep runs the model on a held-out test set and writes evaluation.json → ConditionStep reads the metric from evaluation.json and gates → if metric passes threshold, RegisterModel; else FailStep with a descriptive error message. This pattern keeps the model registry clean (only good models enter), provides explicit failure records for bad runs, and integrates with EventBridge for automated alerting. Skipping the ConditionStep is the most common Pipeline anti-pattern flagged by SageMaker reference architectures and is the ML Engineer mistake the MLA-C01 exam tests with stems like "ensure only models meeting quality threshold are registered."
Pipeline Triggers — How Retraining Starts
A pipeline definition is inert until something triggers an execution.
EventBridge Schedule
Cron-style schedule fires periodically — every 6 hours, every Monday at 2 AM, the first of every month. Used for scheduled retraining cadence aligned with data refresh frequency.
EventBridge Event Pattern
Triggers on AWS service events: a new file landed in S3 (Object Created event), Model Monitor fired a violation (SageMaker Model Quality Violation event), a CodeCommit push (Reference Created event in a branch), or a custom application event published to EventBridge. Used for event-driven retraining when data changes or quality drifts.
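A boto3 sketch of the S3-event wiring, assuming EventBridge notifications are enabled on the bucket; the rule, bucket, and Lambda names are hypothetical:

```python
import json

import boto3

events = boto3.client("events")

# Rule matching new objects in a (hypothetical) training-data bucket
events.put_rule(
    Name="retrain-on-new-data",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["my-training-bucket"]}},
    }),
    State="ENABLED",
)

# Target: a hypothetical Lambda that calls StartPipelineExecution
events.put_targets(
    Rule="retrain-on-new-data",
    Targets=[{
        "Id": "start-retraining",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:start-retraining",
    }],
)
```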
Manual Invocation
StartPipelineExecution API call from a Lambda, a Jupyter notebook, or a CLI command. Used for ad-hoc reruns and for human-triggered iterations.
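A minimal boto3 sketch of manual invocation; the pipeline and parameter names are hypothetical:

```python
import boto3

sm = boto3.client("sagemaker")

response = sm.start_pipeline_execution(
    PipelineName="retrain-pipeline",
    PipelineParameters=[
        {"Name": "InstanceType", "Value": "ml.m5.4xlarge"},
        {"Name": "InputDataS3Uri", "Value": "s3://my-bucket/data/latest/"},
    ],
)
print(response["PipelineExecutionArn"])  # track the execution from here
```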
CodePipeline Trigger
CodePipeline calls SageMaker Pipelines as part of a larger CI/CD flow. Used when the pipeline is one stage in a broader continuous-delivery flow that also handles infrastructure deployment, smoke testing, and downstream system integration.
The Closed-Loop Retraining Pattern
The canonical pattern combines triggers: scheduled retraining every week as the baseline cadence; event-driven retraining when Model Monitor detects a drift violation between schedules; manual invocation for ad-hoc experiments. The combination ensures the model is regularly retrained, retrained immediately on degradation, and easy to retrain on demand.
Step Functions vs SageMaker Pipelines vs MWAA — The Decision Matrix
When does each orchestrator win?
SageMaker Pipelines
- Workflow is mostly SageMaker calls (Processing, Training, Transform, Model Registry)
- Want native step types, automatic lineage, built-in caching
- Want native integration with Model Registry, Feature Store, Data Wrangler, Clarify
- Team is SageMaker-fluent and Python-fluent
- Use case: ML retraining loop, feature pipeline, evaluation pipeline
AWS Step Functions
- Workflow involves many non-SageMaker steps (Lambda, ECS, Batch, third-party APIs, complex retry logic, error compensation)
- Need fine-grained state-machine control with parallel branches, dynamic step generation, and complex error handling
- Already standardised on Step Functions for non-ML workflows
- Use case: end-to-end orchestration spanning data engineering, ML training, downstream business systems
Amazon MWAA (Managed Workflows for Apache Airflow)
- Team has existing Airflow expertise and DAGs to migrate
- Workflow is multi-system, including non-AWS services and custom operators
- Need rich operator ecosystem (Airflow has hundreds of operators for various services)
- Use case: data engineering hub orchestrating both ML and non-ML workflows
Picking The Right One On The Exam
- "Mostly SageMaker calls, want native lineage" → SageMaker Pipelines
- "Complex multi-service orchestration, fine-grained state machine" → Step Functions
- "Existing Airflow expertise" or "managed Airflow service" → MWAA
Step Functions and SageMaker Pipelines are NOT interchangeable — they solve different problems and the MLA-C01 exam tests the distinction with nuance. Pipelines is purpose-built for ML workflows: typed steps wrapping SageMaker APIs, automatic lineage to Model Registry, native integration with Feature Store and Data Wrangler. Step Functions is general-purpose: state machines for any AWS service, including SageMaker via the SageMaker integration, but without ML-specific abstractions. Use Pipelines when the workflow is 80 percent SageMaker; use Step Functions when SageMaker is one of many components and you need fine-grained state machine control. Stems with "mostly SageMaker training and inference" → Pipelines; stems with "orchestrate Lambda, Batch, ECS, and SageMaker together with custom retry logic" → Step Functions.
Integration With Data Wrangler And Feature Store
SageMaker Pipelines integrates natively with the data preparation and feature management services.
Data Wrangler Export To Pipeline
A Data Wrangler flow can export to a SageMaker Pipeline as a ProcessingStep with the Data Wrangler Spark container. The flow is reproducible at scale — the interactive transformations defined in Data Wrangler run as a managed Processing job during pipeline execution.
Feature Store Ingestion In Pipeline
A ProcessingStep can read from S3, transform features, and call PutRecord on the Feature Store online store or write to the offline store. Subsequent training steps query Feature Store via the offline store's S3 + Athena interface, ensuring training-serving consistency.
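A sketch of the PutRecord call as it might appear inside the ProcessingStep's script; the feature group and feature names are hypothetical:

```python
import boto3

featurestore = boto3.client("sagemaker-featurestore-runtime")

# Runs inside the ProcessingStep script after the transformation logic
featurestore.put_record(
    FeatureGroupName="customer-features",
    Record=[
        {"FeatureName": "customer_id", "ValueAsString": "42"},
        {"FeatureName": "avg_spend_30d", "ValueAsString": "187.50"},
        {"FeatureName": "event_time", "ValueAsString": "2024-06-01T00:00:00Z"},
    ],
)
```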
Feature Store Reads For Training
A TrainingStep's input channel can be the Athena query result over the Feature Store offline store, materialised as a CSV or Parquet file in S3. This pattern produces a training dataset that exactly matches what is served at inference time, eliminating training-serving skew.
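A sketch of materialising the offline store with the SDK's Athena helper; the feature group name and results bucket are hypothetical:

```python
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
fg = FeatureGroup(name="customer-features", sagemaker_session=session)

# Query the offline store's Glue table and pull the result into a DataFrame
query = fg.athena_query()
query.run(
    query_string=f'SELECT * FROM "{query.table_name}"',
    output_location="s3://my-bucket/athena-results/",
)
query.wait()
training_df = query.as_dataframe()
```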
Lineage Tracking — Reproducibility By Default
SageMaker ML Lineage Tracking automatically records every step's inputs, outputs, parameters, and code version.
What Gets Tracked
Pipeline executions create lineage records linking: source data S3 paths → preprocessing job outputs → training job inputs and outputs → model artifacts → evaluation reports → registered model packages → deployed endpoints. Every link is queryable.
Why It Matters For Audit
Regulators and audit reviewers ask "show me which training data produced the model currently serving customer X's predictions on date Y." Without lineage, this is days of forensic detective work. With lineage, a single API call returns the answer.
Querying Lineage
ListAssociations API or the SageMaker Studio Lineage UI traverses the lineage graph. Common queries: "all training runs that used dataset version X," "all endpoints currently deployed from model package group Y," "the data source for the model serving endpoint Z right now."
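A minimal boto3 sketch of a one-hop lineage traversal; the artifact ARN is a placeholder:

```python
import boto3

sm = boto3.client("sagemaker")

# Walk the lineage graph one hop downstream from a (hypothetical) dataset artifact
resp = sm.list_associations(
    SourceArn="arn:aws:sagemaker:us-east-1:123456789012:artifact/abc123def456",
)
for assoc in resp["AssociationSummaries"]:
    print(assoc["AssociationType"], assoc["DestinationArn"])
```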
SageMaker ML Lineage Tracking automatically records every pipeline step's inputs, outputs, and code version, producing a queryable graph from data sources through training to deployed endpoints. This happens automatically for SageMaker Pipelines executions — no extra configuration required. The lineage answers regulator and audit questions: "which training data produced this prediction?" "Which models were ever deployed from this dataset?" "Show me the complete history of this model package group." On the MLA-C01 exam, lineage tracking is the answer to "how do I produce an audit trail showing data-to-model provenance" or "how do I trace a production model back to its source training data." Manual provenance tracking with custom tags or metadata is not the AWS-recommended pattern — Lineage Tracking is.
The End-To-End Pipeline — Ordering And Matching Reference
The canonical retraining pipeline structure tested as ordering questions (an assembly sketch in code follows the sequence below):
Ordered Step Sequence
- Trigger fires (EventBridge schedule, Model Monitor violation, S3 event)
- ProcessingStep — Data Preparation (Glue / Spark / Data Wrangler container)
- ProcessingStep — Feature Engineering (or Data Wrangler flow)
- (Optional) Feature Store ingestion via PutRecord or offline store write
- TrainingStep (or TuningStep for HPO)
- ProcessingStep — Evaluation (model on held-out test set, produces metrics JSON)
- ConditionStep (gates on metric threshold)
- If pass: RegisterModel (adds to Model Package Group with PendingManualApproval)
- If fail: FailStep (with descriptive error message)
- (Outside pipeline) ML Engineer reviews and approves the model package
- (Outside pipeline) EventBridge on Model Package State Change triggers UpdateEndpoint with deployment guardrails
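Tying the sequence together, a hedged assembly sketch reusing step objects from the earlier sketches; nested if/else steps live inside the ConditionStep, not in the top-level steps list:

```python
from sagemaker.workflow.pipeline import Pipeline

# eval_step is the evaluation ProcessingStep named "evaluate" (see the ConditionStep
# section); register_model_step and fail_step are nested inside condition_step
pipeline = Pipeline(
    name="retrain-pipeline",
    parameters=[instance_type, instance_count, auc_threshold],
    steps=[preprocess_step, train_step, eval_step, condition_step],
)

pipeline.upsert(role_arn=role)   # create or update the pipeline definition
execution = pipeline.start()     # one execution with default parameter values
```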
Matching Reference
| Step Type | Wraps | Typical Use |
|---|---|---|
| ProcessingStep | Processing job | Data prep, evaluation, post-processing |
| TrainingStep | Training job | Fit a model |
| TuningStep | AMT job | Hyperparameter optimisation |
| TransformStep | Batch Transform | Offline batch inference |
| CreateModelStep | Model resource | Create Model object referencing artifact |
| RegisterModel | Model Package | Add to Model Registry with status |
| ConditionStep | Branching | Gate on metric |
| FailStep | Failure | Explicit failure with message |
| LambdaStep | Lambda invoke | Custom logic outside SageMaker |
| CallbackStep | SQS wait | Human-in-the-loop approval |
Common MLA-C01 Pipeline Traps
Trap 1 — RegisterModel Auto-Deploys
Wrong. RegisterModel only adds the model package to the registry with a status. Deployment requires a separate UpdateEndpoint call (often from CodePipeline or Lambda triggered by approval).
Trap 2 — ConditionStep Has Else-Else-Else Branches
Wrong. ConditionStep is binary: if-steps and else-steps. Multi-way branching requires nested ConditionSteps.
Trap 3 — Caching Reuses Output From Any Previous Run
Partially wrong. Cache key is a hash of step inputs, parameters, and step definition. If any of these change, the cache misses and the step re-runs.
Trap 4 — Step Functions Is Always A Better Choice
Wrong. For ML workflows that are mostly SageMaker, Pipelines wins on native integration, lineage, and registry. Step Functions wins for multi-service orchestration outside SageMaker.
Trap 5 — Pipelines Must Run In SageMaker Studio
Wrong. Pipelines runs as a managed service. Studio is a UI for visualisation. The CLI, the SDK, and CodePipeline can all start and monitor pipeline executions.
Trap 6 — Lineage Tracking Requires Manual Configuration
Wrong. Lineage is automatic for SageMaker Pipeline executions. Manual lineage configuration is needed only for non-Pipeline SageMaker workflows.
Trap 7 — Pipeline Parameters Can Change Step Logic
Partial. Parameters change input values but not the DAG structure. Conditional branching changes structure via ConditionStep, not via parameter values.
Trap 8 — MWAA Is The AWS Default For ML
Wrong. SageMaker Pipelines is the AWS-recommended ML orchestrator. MWAA is the answer when teams have existing Airflow expertise or hybrid workflows.
FAQ — SageMaker Pipelines and ML Workflow Orchestration
Q1 — In what order should the steps of a typical SageMaker training and deployment pipeline run?
The canonical order: ProcessingStep for data preparation → ProcessingStep for feature engineering (or Data Wrangler flow) → TrainingStep (or TuningStep for HPO) → ProcessingStep for evaluation against held-out test data → ConditionStep gating on the evaluation metric → if pass, RegisterModel adding the model package to the registry with PendingManualApproval status; if fail, FailStep with a descriptive error message. Deployment to an endpoint is typically NOT part of the pipeline — instead, an external EventBridge rule listens for the Model Package State Change event when an ML Engineer approves the package, and triggers a CodePipeline or Lambda that calls UpdateEndpoint with deployment guardrails. The MLA-C01 exam asks ordering questions on this exact sequence; getting any step out of order (e.g. RegisterModel before ConditionStep) is wrong.
Q2 — When should I use a ConditionStep instead of just running every model through registration?
Always include a ConditionStep gating RegisterModel on a quality threshold. Without the gate, every training run produces a registered model package regardless of quality, polluting the model registry with bad models that approval reviewers must filter out manually. With the gate, only models passing the threshold (e.g. AUC ≥ 0.85, F1 ≥ 0.75, business metric meeting target) enter the registry, and bad runs explicitly fail via FailStep with a diagnostic message captured in the lineage record. The ConditionStep also provides an automated quality bar that prevents regressions — a retraining pipeline that produces a worse model than the current production version should fail-fast at the ConditionStep rather than producing a confused approver who must compare metrics manually.
Q3 — How does SageMaker Pipelines step caching work and when should I disable it?
Step caching reuses the output of a previous successful execution if the step's inputs, parameters, and step definition are unchanged. The cache key is a hash of these elements; if anything changes, the cache misses and the step re-runs. Configure with CacheConfig(enable_caching=True, expire_after="P30D") at the step level or pipeline level. Enable caching for development and iteration where rerunning unchanged data preparation wastes hours and dollars. Disable caching for production retraining runs that need to capture a complete lineage trail showing every step actually ran for audit purposes — caching produces lineage records pointing at older executions, confusing audit reviews. The most common bug: caching enabled in production lets a retraining pipeline silently reuse stale data preparation outputs from weeks earlier, missing recent data quality fixes.
Q4 — Step Functions or SageMaker Pipelines — when do I pick which for ML orchestration?
Pick SageMaker Pipelines when the workflow is mostly SageMaker calls (Processing, Training, Transform, Model Registry, Feature Store) — Pipelines provides typed step abstractions, automatic lineage to the Model Registry, native integration with Data Wrangler and Feature Store, and built-in step caching. Pick Step Functions when the workflow involves substantial non-SageMaker components — Lambda, ECS, Batch, third-party API calls, complex retry compensation, dynamic parallel branches, or cross-region orchestration — that benefit from Step Functions' general-purpose state-machine model. The decision is not "which is more powerful" but "which abstraction matches the workflow shape." Stems on the MLA-C01 exam phrase the answer: "mostly SageMaker training and registry" → Pipelines; "orchestrate SageMaker plus Lambda plus Batch plus DynamoDB Streams with retry compensation" → Step Functions.
Q5 — How do I trigger a SageMaker pipeline to retrain a model when production data drifts?
Configure SageMaker Model Monitor (data quality monitor or model quality monitor) on the production endpoint with a baseline and a monitoring schedule. When Model Monitor detects a violation, it emits a CloudWatch metric and an EventBridge event (SageMaker Model Quality Violation or a custom event from a Lambda processing the violation report). Configure an EventBridge rule matching this event pattern, with the target being a Lambda that calls StartPipelineExecution on the retraining pipeline with appropriate parameters (e.g. recent data window, current production model package ARN for comparison). The retraining pipeline then runs through its full DAG: data preparation → training → evaluation → ConditionStep → RegisterModel. Approval and deployment happen downstream as separate steps. The pattern: Monitor detects drift → EventBridge → Lambda → StartPipelineExecution → ConditionStep → RegisterModel → human approval → CodePipeline → UpdateEndpoint with guardrails. This closed-loop retraining is the canonical end-to-end MLOps flow tested on MLA-C01.
Q6 — Can I parameterise a SageMaker pipeline to run in dev, staging, and production with the same definition?
Yes — pipeline parameters (ParameterString, ParameterInteger, ParameterFloat, ParameterBoolean) are bound at execution time, allowing one definition to serve many environments. Define parameters for environment-specific values: instance_type (smaller instances in dev, larger in prod), instance_count (single instance in dev, multi-node in prod), s3_input_prefix (different data for each environment), model_approval_status (PendingManualApproval in prod, Approved in dev for fast iteration), and enable_caching (True in dev, False in prod). When triggering execution from CodePipeline, pass environment-specific parameter values for each stage. The same pipeline definition runs in dev with cheap instances and aggressive caching, in staging with production-like configuration, and in production with full instance counts and audit-friendly caching disabled. This pattern is the AWS-recommended way to maintain a single source of truth for pipeline logic across environments.
Q7 — How does SageMaker Pipelines integrate with the Model Registry and downstream deployment?
The integration is event-driven. The pipeline's RegisterModel step adds a model package to a Model Package Group with a configured initial status (typically PendingManualApproval for production, Approved for development). The status change emits a SageMaker Model Package State Change event to EventBridge. Downstream automation listens for this event: a CodePipeline pipeline or a Lambda function triggered by the event can call UpdateEndpoint with a DeploymentConfig containing blue/green policy, canary traffic shift, and AutoRollbackConfiguration. The deployment proceeds with auto-rollback on CloudWatch alarms, and on success the new model package becomes the deployed production version. The flow is: Pipeline (train, evaluate, gate, register) → Registry (PendingManualApproval) → human approval → State Change event → EventBridge → CodePipeline/Lambda → UpdateEndpoint with guardrails → success or rollback. The MLA-C01 exam tests every piece of this chain; the most-cited mistake is forgetting that RegisterModel does NOT auto-deploy — deployment is a separate downstream step driven by the approval event.