ML CI/CD with CodePipeline and automated retraining is the operational heartbeat of any production machine learning system on AWS. ML CI/CD is the discipline of treating model code, training data, hyperparameters, and inference infrastructure as software artifacts that flow through automated pipelines, from a Git commit or a data-drift alarm all the way to a production endpoint, with the same rigor that backend services receive. On the MLA-C01 exam, ML CI/CD anchors Task 3.3 of Domain 3 (Deployment and Orchestration of ML Workflows, 22 percent weight) and is one of the most heavily tested topic clusters because the certification is positioned as an engineering exam, not a data-science exam. Community signals from K21 Academy, Pluralsight, and Tutorials Dojo all point the same way: candidates who under-prepare on ML CI/CD pipeline configuration fail more often than candidates who under-prepare on algorithm theory.
This guide covers the ML CI/CD landscape end to end: how AWS CodePipeline stages map to ML lifecycles, how CodeBuild executes unit tests and packages SageMaker training scripts, how EventBridge triggers automated retraining from data-quality alarms or scheduled cron, how model approval gates work, how Infrastructure-as-Code with CloudFormation or CDK provisions the entire pipeline, and how SageMaker Pipelines and CodePipeline play complementary roles rather than alternatives. It closes with the canonical retraining loop — drift detected to model deployed — and the troubleshooting decision trees the exam loves to test through ordering and matching question types.
What Is ML CI/CD and Why MLA-C01 Tests It So Heavily
ML CI/CD is the application of continuous integration and continuous delivery practices to machine learning systems. Traditional software CI/CD treats source code as the single artifact to test and deploy. ML CI/CD must additionally manage continuous training (CT) — the automated retraining of models when data distributions shift, when new labeled data arrives, when business requirements change, or when scheduled refresh windows fire. This third pillar (CI + CD + CT) is what makes ML CI/CD distinct from generic DevOps. On MLA-C01, ML CI/CD questions test whether you can map a stem like "the team wants to retrain the fraud detection model whenever the data quality monitor alarm fires and deploy the new version after analyst approval" into the right combination of EventBridge rules, Lambda triggers, SageMaker Pipelines executions, Model Registry approval workflows, and CodePipeline deployment stages.
Why Engineers From DevOps Backgrounds Find MLA-C01 Easier Than Data Scientists Do
Community signals consistently confirm that candidates with backend or DevOps backgrounds outperform pure data scientists on MLA-C01 because the exam tests CI/CD pipeline mechanics, IaC patterns, and operational troubleshooting more deeply than algorithm theory. The data scientist who can derive backpropagation by hand but has never written a buildspec.yml will struggle with stems that paste a CodePipeline JSON definition and ask what stage transitions trigger retraining. Treat ML CI/CD as a software-engineering exam topic — it rewards engineers who think in terms of stages, transitions, artifacts, and triggers.
The Three Pillars of ML CI/CD on AWS
Continuous integration covers automated unit testing of training scripts, schema validation of training data, container linting, and integration testing against a staging endpoint. Continuous delivery covers automated packaging of trained models into Model Registry, staged deployment through dev/staging/production accounts, and automated rollback on health-check failure. Continuous training covers retraining triggers (scheduled, drift-driven, or new-data-arrival), pipeline orchestration via SageMaker Pipelines, and conditional registration based on evaluation metrics. The MLA-C01 exam tests all three pillars but weights continuous training and continuous delivery more heavily than pure CI because CI patterns transfer directly from generic DevOps while CT patterns are ML-specific.
Plain-Language Explanation: ML CI/CD with CodePipeline
ML CI/CD is the kind of topic where six AWS services orchestrate one workflow. Three concrete analogies make the moving parts stick.
Analogy 1 — The Restaurant Kitchen With a Recipe Update Pipeline
Imagine a restaurant chain that updates its signature dish recipe every quarter based on customer feedback (input data). The head chef (data scientist) writes the new recipe (training script), the prep kitchen (CodeBuild) tests it on a small batch (unit tests), the test kitchen (SageMaker training job) cooks the full dish at scale, the food critic panel (model evaluation) scores it on flavor and presentation (precision and recall), the executive chef (manual approval gate in CodePipeline) tastes it before approving distribution, and the central commissary (Model Registry) stamps the approved recipe with a version number before shipping it to every restaurant location (production endpoints). When the customer-feedback hotline (Model Monitor) reports that current recipes are scoring poorly (data drift detected), the kitchen automation (EventBridge rule) automatically triggers a fresh recipe development cycle without waiting for the quarterly review. The retraining loop is the kitchen's quality control circling back from customer feedback to recipe update to commissary to restaurant to next round of feedback. Every step is logged on a recipe card (pipeline execution history) so any food safety auditor can reconstruct exactly which recipe was served on which day.
Analogy 2 — The Car Manufacturing Assembly Line With Continuous Quality Updates
Picture a car factory that produces vehicles continuously. The chassis design (model code) is checked into version control (CodeCommit or GitHub) by engineers. When a design change is committed, the QA station (CodeBuild) runs unit tests on the design — does it pass crash simulation, does it fit the standard chassis specifications. The prototype build line (SageMaker training job) assembles a full prototype using the new design plus the latest production data (training data from S3). The crash test lab (model evaluation) runs the prototype through standardized safety tests and produces metrics (accuracy, latency, fairness scores). If metrics pass the threshold, the vehicle is registered with the DOT (Model Registry approval), receives a VIN (model version), and gets shipped to dealerships (deployed to endpoints). Customer telemetry from existing vehicles on the road (Model Monitor) reports performance degradation — say tires are wearing faster than predicted in winter conditions (data drift). The factory operations center (EventBridge rule on a Model Monitor alarm) automatically schedules a new production run with the latest road data without human intervention. The factory floor manager (CodePipeline) coordinates which assembly stations run in what order, while the prototype-design and prototype-build subroutine (SageMaker Pipelines) handles the ML-specific orchestration inside the larger factory pipeline. The two systems are complementary, not alternatives.
Analogy 3 — The Newspaper Print Edition Daily Workflow
Picture a daily newspaper. The editorial team (data scientists) writes new articles (model code changes) and commits them to the manuscript repository (Git). The copy desk (CodeBuild) runs spell-check, fact-check, and style-guide validation (unit tests, schema validation). The layout team (SageMaker training job) assembles the full edition using the latest articles plus the current ad inventory (training data). The editor-in-chief (manual approval gate) reviews proofs before approving the press run. The pressroom (CodeDeploy or SageMaker UpdateEndpoint) prints the edition and distributes to subscribers (production traffic). The newsstand sales tracker (Model Monitor on the live endpoint) reports which sections are performing poorly (data drift in subscriber preferences). When sales drop on the sports section by more than 15 percent (CloudWatch alarm threshold), the newsroom dispatcher (EventBridge rule) automatically triggers an editorial review and an updated edition without waiting for the next morning's planning meeting. The version archive (Model Registry) keeps every approved edition with metadata so the librarian (audit team) can prove which edition was distributed on which day. The retraining loop is the newsroom's mechanism for continuously adapting to reader feedback at production speed.
AWS CodePipeline Stages for ML — The Five-Stage Pattern
CodePipeline orchestrates ML CI/CD using a stage-and-action model. Memorize the canonical five-stage pattern.
Stage 1 — Source
The Source stage pulls model code, training scripts, pipeline definitions, and infrastructure-as-code templates from CodeCommit, GitHub, GitHub Enterprise, Bitbucket, Amazon S3, or Amazon ECR. For ML workflows, the source typically includes the training script (train.py), the inference handler (inference.py), the SageMaker Pipeline definition (pipeline.py), the buildspec for CodeBuild (buildspec.yml), and CloudFormation or CDK templates for endpoint infrastructure. Source-stage triggers can be webhook-based (commit to main branch fires the pipeline) or polling-based (CodePipeline checks every five minutes).
Stage 2 — Build and Test
The Build stage runs CodeBuild projects that execute unit tests against the training script, validate the training data schema with Great Expectations or AWS Glue Data Quality, lint the Docker container, package the inference container into ECR, and emit build artifacts (the training job definition JSON, the SageMaker Pipeline definition JSON, CloudFormation templates) for downstream stages. The buildspec.yml defines the test commands and the artifact paths. CodeBuild runs in an ephemeral container, so every build is reproducible and isolated.
Stage 3 — Train
The Train stage invokes SageMaker Pipelines either through a Lambda action or through the SageMaker Pipelines CodePipeline action provider, starting a pipeline execution that runs Processing → Training → Tuning → Evaluation → ConditionStep → RegisterModel. The CodePipeline stage waits for the SageMaker Pipeline execution to complete and reads the Model Registry registration as the success signal. Training-stage failures (poor evaluation metric, training job failure, data validation failure) halt the pipeline at this stage.
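To make the Lambda-action flavor concrete, here is a minimal sketch, assuming a pre-existing retraining pipeline named FraudRetrainPipeline (the pipeline name, parameters, and bucket are illustrative). The Lambda starts the execution on its first invocation, then uses CodePipeline's continuation token so subsequent invocations poll the execution until it finishes:

```python
import json
import boto3

sm = boto3.client("sagemaker")
cp = boto3.client("codepipeline")

def handler(event, context):
    """CodePipeline Lambda action: start a SageMaker Pipeline, then poll it."""
    job_id = event["CodePipeline.job"]["id"]
    token = event["CodePipeline.job"]["data"].get("continuationToken")
    try:
        if token is None:
            # First invocation: kick off the retraining pipeline execution.
            resp = sm.start_pipeline_execution(
                PipelineName="FraudRetrainPipeline",  # illustrative name
                PipelineParameters=[
                    {"Name": "TrainingDataUri", "Value": "s3://my-bucket/train/"},
                ],
            )
            # Hand the execution ARN back to CodePipeline so the next
            # invocation of this same action can check on it.
            cp.put_job_success_result(
                jobId=job_id,
                continuationToken=json.dumps({"arn": resp["PipelineExecutionArn"]}),
            )
        else:
            arn = json.loads(token)["arn"]
            status = sm.describe_pipeline_execution(
                PipelineExecutionArn=arn
            )["PipelineExecutionStatus"]
            if status == "Succeeded":
                cp.put_job_success_result(jobId=job_id)
            elif status in ("Failed", "Stopped"):
                cp.put_job_failure_result(
                    jobId=job_id,
                    failureDetails={"type": "JobFailed",
                                    "message": f"Pipeline {status}"},
                )
            else:
                # Still executing: return the token to keep polling.
                cp.put_job_success_result(jobId=job_id, continuationToken=token)
    except Exception as exc:
        cp.put_job_failure_result(
            jobId=job_id,
            failureDetails={"type": "JobFailed", "message": str(exc)},
        )
```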
Stage 4 — Approval
The Approval stage uses CodePipeline's manual approval action — a human (typically the ML engineering lead or a compliance officer) reviews the model card, evaluation report, and lineage data before clicking Approve in the AWS console or via SNS notification link. Manual approval is mandatory for production deployments in regulated industries (finance, healthcare, insurance) and recommended even in unregulated contexts as a final guard against runaway retraining. For lower-risk environments, automated approval via Lambda function evaluating metric thresholds replaces the manual step.
Stage 5 — Deploy
The Deploy stage updates the production SageMaker endpoint via CloudFormation stack update, CDK deployment, or direct SageMaker UpdateEndpoint API call. The deployment strategy can be all-at-once (replace the endpoint config), blue/green (parallel stack swap), canary (small percentage traffic shift), or linear (incremental traffic shifting). SageMaker Deployment Guardrails configured at the endpoint level provide automated rollback on CloudWatch alarm breach during deployment.
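A sketch of the direct-API flavor with canary traffic shifting and alarm-gated rollback, assuming the endpoint, the new endpoint config, and the CloudWatch alarm already exist (all names are illustrative):

```python
import boto3

sm = boto3.client("sagemaker")

# Shift 10% of capacity to the new endpoint config first; if the guardrail
# alarm fires during the bake window, SageMaker rolls back automatically.
sm.update_endpoint(
    EndpointName="fraud-detector-prod",              # illustrative
    EndpointConfigName="fraud-detector-config-v42",  # illustrative
    DeploymentConfig={
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": 10},
                "WaitIntervalInSeconds": 600,  # bake time before the full shift
            },
            "TerminationWaitInSeconds": 300,
        },
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": "fraud-detector-5xx-errors"}]
        },
    },
)
```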
CodePipeline stages are different from SageMaker Pipeline steps and the two systems are complementary, not alternatives. CodePipeline orchestrates the outer software-engineering workflow — source control to build to deploy. SageMaker Pipelines orchestrates the inner ML workflow — preprocess to train to evaluate to register. A typical production architecture has CodePipeline calling a SageMaker Pipeline execution from inside its Train stage, then waiting for the SageMaker Pipeline to complete before transitioning to the Approval stage. The MLA-C01 exam will plant a stem testing this distinction — answers proposing CodePipeline as a replacement for SageMaker Pipelines or vice versa are wrong; the correct pattern uses both.
CodeBuild for ML — The Test and Package Stage
CodeBuild is the executor of the Build stage. It is where unit tests run, training data is validated, and artifacts are produced.
Buildspec.yml Anatomy for ML
A typical ML buildspec.yml has four phases: install (set up the Python environment, install dependencies via pip), pre_build (authenticate to ECR, fetch reference data for tests), build (run pytest against the training script, run Great Expectations validation against the training data, lint Dockerfiles, build and push the training container to ECR), and post_build (emit pipeline definition artifacts to S3, write build metadata to Parameter Store).
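A minimal sketch of such a buildspec.yml, assuming a repository with a tests/ directory and a pipeline.py that can print its own definition (the registry URI, repository name, and --export flag are illustrative):

```yaml
version: 0.2

env:
  variables:
    ECR_REGISTRY: 123456789012.dkr.ecr.us-east-1.amazonaws.com  # illustrative
    REPO: fraud-training

phases:
  install:
    runtime-versions:
      python: "3.11"
    commands:
      - pip install -r requirements.txt   # pinned dependencies
  pre_build:
    commands:
      # Authenticate Docker to ECR.
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $ECR_REGISTRY
  build:
    commands:
      - pytest tests/ --maxfail=1         # unit tests for training code
      - docker build -t $ECR_REGISTRY/$REPO:$CODEBUILD_RESOLVED_SOURCE_VERSION .
      - docker push $ECR_REGISTRY/$REPO:$CODEBUILD_RESOLVED_SOURCE_VERSION
  post_build:
    commands:
      # Emit the SageMaker Pipeline definition for downstream stages.
      - python pipeline.py --export > pipeline-definition.json

artifacts:
  files:
    - pipeline-definition.json
    - cloudformation/*.yml
```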
Unit Testing ML Code
Unit tests for ML scripts cover the data preprocessing functions (does the tokenizer produce expected outputs, do the feature transforms preserve dtypes), the model architecture instantiation (does the model build with the configured hyperparameters), and the evaluation metric calculations (does precision_at_k return the expected value for a known input). What you do not unit-test in CI: training convergence (too slow, requires GPU, runs in the SageMaker training stage instead) or production-scale data quality (runs in SageMaker Processing jobs).
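A sketch of the fast, CPU-only tests that belong in this stage, assuming hypothetical helpers build_features and precision_at_k in the training package:

```python
import numpy as np
import pandas as pd
import pytest

# Hypothetical imports from the training package under test.
from training.features import build_features
from training.metrics import precision_at_k


def test_build_features_preserves_dtypes():
    raw = pd.DataFrame({"amount": [10.0, 250.5], "merchant": ["a", "b"]})
    out = build_features(raw)
    # Feature transforms must not silently cast float columns.
    assert out["amount"].dtype == np.float64


def test_precision_at_k_known_input():
    # Top-2 scored items contain one relevant item -> precision@2 == 0.5.
    y_true = [1, 0, 1, 0]
    y_score = [0.9, 0.8, 0.1, 0.2]
    assert precision_at_k(y_true, y_score, k=2) == pytest.approx(0.5)
```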
Integration Testing With Synthetic Data
Integration tests in CodeBuild use a small synthetic dataset (on the order of 1,000 rows) to run an end-to-end SageMaker training job on a small instance type (ml.m5.large), invoke the resulting model on a held-out test row, and verify the prediction shape and confidence ranges. This catches container packaging bugs, S3 path errors, and SageMaker SDK API misuse before consuming GPU hours on full training.
Container Build and Push to ECR
For BYOC (Bring Your Own Container) workflows, the Build stage compiles a Dockerfile, runs container linting, scans for vulnerabilities (with Amazon Inspector or Trivy), tags with the commit SHA, and pushes to ECR. The downstream Train stage references the image URI by tag. For BYOS (Bring Your Own Script) workflows, the Build stage just packages the script and uses a SageMaker-managed framework container.
Caching and Build Performance
CodeBuild supports local caching (Docker layer cache) and S3 caching (pip wheels, npm modules) to reduce build times. For ML pipelines that rebuild containers on every commit, layer caching cuts build time from 8 minutes to under 90 seconds for small code changes.
Always pin your CodeBuild image and your Python dependencies to specific versions in requirements.txt. "Latest" tags drift silently: a build that worked yesterday fails today because PyTorch released a minor version. For ML pipelines that may regenerate models months later for audit reproducibility, pinning is mandatory; auditors will ask "can you regenerate the model that was deployed on 2024-03-15" and the answer must be yes. Pin the CodeBuild image to an explicit tag such as aws/codebuild/standard:7.0, pin all Python dependencies, and store the pinned requirements.txt in the model package metadata in Model Registry so the dependency state at training time is permanently captured.
EventBridge Triggers for Automated Retraining
EventBridge is the trigger nexus for continuous training. Three patterns dominate the exam.
Pattern 1 — Scheduled Retraining (Cron)
The simplest retraining trigger is a schedule. An EventBridge Scheduler schedule (or a classic EventBridge cron rule) fires on a cron expression (every Sunday at 02:00 UTC, the first of every month) and targets a Lambda function or directly invokes a SageMaker Pipeline execution via the StartPipelineExecution API. Use this pattern when retraining cadence is dictated by business cycles (weekly model refresh, monthly recalibration) rather than by data quality. The trade-off: scheduled retraining wastes compute when data has not drifted and is too slow when drift accelerates suddenly.
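A minimal sketch using EventBridge Scheduler's universal target, which calls StartPipelineExecution directly with no Lambda in between (pipeline name and role ARN are illustrative):

```python
import json
import boto3

scheduler = boto3.client("scheduler")

scheduler.create_schedule(
    Name="weekly-retrain",
    # Every Sunday at 02:00 UTC.
    ScheduleExpression="cron(0 2 ? * SUN *)",
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        # Universal target: Scheduler calls the SageMaker API itself.
        "Arn": "arn:aws:scheduler:::aws-sdk:sagemaker:startPipelineExecution",
        "RoleArn": "arn:aws:iam::123456789012:role/SchedulerInvokeSageMaker",  # illustrative
        "Input": json.dumps({"PipelineName": "FraudRetrainPipeline"}),
    },
)
```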
Pattern 2 — Drift-Driven Retraining (Model Monitor Alarm)
The exam-favored pattern. SageMaker Model Monitor publishes CloudWatch metrics for data quality violations, model quality violations, bias drift, and feature attribution drift. A CloudWatch alarm on the violation metric crossing a threshold publishes to an SNS topic or directly to EventBridge as a state-change event. An EventBridge rule with the alarm-state pattern targets a Lambda function that starts a SageMaker Pipeline execution with parameters indicating the violation type. This closes the loop: production model drifts → Monitor detects → alarm fires → pipeline retrains → registry registers → CodePipeline deploys.
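A sketch of the rule that wires the alarm to the retraining Lambda (alarm name and function ARN are illustrative; the Lambda also needs a resource-based policy permitting EventBridge to invoke it):

```python
import json
import boto3

events = boto3.client("events")

# Match only the drift alarm entering ALARM state.
pattern = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {
        "alarmName": ["fraud-feature-drift"],  # illustrative
        "state": {"value": ["ALARM"]},
    },
}

events.put_rule(Name="retrain-on-drift", EventPattern=json.dumps(pattern))
events.put_targets(
    Rule="retrain-on-drift",
    Targets=[{
        "Id": "start-retrain-lambda",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:start-retraining",  # illustrative
    }],
)
```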
Pattern 3 — New-Data-Arrival Retraining (S3 Event)
When fresh labeled data arrives in S3 (from a labeling pipeline, from a daily batch ETL, from upstream data partner), an S3 PutObject event triggers a Lambda that decides whether the new batch warrants retraining (typically: minimum row count threshold plus minimum elapsed time since last training). If yes, the Lambda starts a SageMaker Pipeline execution. This pattern fits use cases where retraining frequency is data-driven rather than time-driven.
EventBridge Rule Patterns for SageMaker Events
EventBridge can also consume events emitted by SageMaker — Pipeline state changes (Started, Succeeded, Failed), Training Job state changes, Model Registry status changes (PendingManualApproval, Approved, Rejected). A common pattern: EventBridge rule on Model Package State Change with ModelApprovalStatus = Approved fires a Lambda that updates the production endpoint. This decouples approval from deployment — approving in Model Registry automatically propagates to production without anyone needing to click in CodePipeline.
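A sketch of that approval-to-deployment wiring; EventBridge can start a CodePipeline execution directly as a rule target (group, pipeline, and role names are illustrative):

```python
import json
import boto3

events = boto3.client("events")

# Fire only when a package in this group flips to Approved.
pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {
        "ModelPackageGroupName": ["fraud-detector"],  # illustrative
        "ModelApprovalStatus": ["Approved"],
    },
}

events.put_rule(Name="deploy-on-approval", EventPattern=json.dumps(pattern))
events.put_targets(
    Rule="deploy-on-approval",
    Targets=[{
        "Id": "start-deploy-pipeline",
        # CodePipeline is a native EventBridge target; the role grants
        # EventBridge permission to call StartPipelineExecution.
        "Arn": "arn:aws:codepipeline:us-east-1:123456789012:fraud-deploy-pipeline",
        "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeStartPipeline",  # illustrative
    }],
)
```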
SageMaker emits structured events to EventBridge for pipeline state changes, training job state changes, model registration, and approval status changes — and these events are the glue for closing the retraining loop without writing custom polling code. The naive implementation polls the SageMaker Describe APIs every minute waiting for state changes, which is wasteful and rate-limited. The correct implementation creates EventBridge rules that match specific SageMaker events and target Lambda functions or Step Functions executions for follow-up actions. For the MLA-C01 exam, any stem asking "how do we trigger X when SageMaker Y completes" expects an EventBridge rule answer; polling-based answers are wrong.
Model Approval Gates — Manual and Automated
Approval gates are the safety guard between trained model and production deployment.
Manual Approval in CodePipeline
The manual approval action pauses the pipeline until a human approves or rejects via the AWS console, CLI, or SDK. Configure the action with an SNS topic that emails the approver a link to the pipeline state and the relevant build artifacts (model card, evaluation report, lineage). If no one acts within seven days, the approval action fails rather than auto-approving, which is by design: a rogue auto-approval after a period of analyst absence is exactly what the gate exists to prevent.
Automated Approval via Lambda
For lower-risk environments, replace manual approval with a Lambda function that reads the Model Registry's evaluation metrics, compares against thresholds (precision > 0.85, latency p99 < 200ms, fairness ΔDPL < 0.1), and either calls UpdateModelPackage to set ModelApprovalStatus to Approved or rejects with an explanation. The Lambda can also implement promotion logic — auto-approve for staging, require manual for production.
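A sketch of such an approver Lambda, assuming the pipeline attached a model quality report at registration and that the report follows the common evaluation.json layout (thresholds, JSON paths, and event fields are illustrative):

```python
import json
import boto3

sm = boto3.client("sagemaker")
s3 = boto3.client("s3")

def handler(event, context):
    """Auto-approve a newly registered model package if metrics clear thresholds."""
    # ARN taken from the Model Package State Change event detail.
    arn = event["detail"]["ModelPackageArn"]
    pkg = sm.describe_model_package(ModelPackageName=arn)

    # Evaluation report written by the pipeline's evaluation step.
    report_uri = pkg["ModelMetrics"]["ModelQuality"]["Statistics"]["S3Uri"]
    bucket, key = report_uri.replace("s3://", "").split("/", 1)
    report = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())

    # Illustrative threshold check against the report layout.
    precision = report["binary_classification_metrics"]["precision"]["value"]
    approved = precision > 0.85

    sm.update_model_package(
        ModelPackageArn=arn,
        ModelApprovalStatus="Approved" if approved else "Rejected",
        ApprovalDescription=f"Auto-approval: precision={precision:.3f}",
    )
```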
Multi-Tier Approval Workflows
Regulated industries often require multi-tier approval: ML engineer signs off on technical metrics, data scientist signs off on fairness, compliance officer signs off on regulatory criteria. CodePipeline supports sequential manual approval actions in separate stages, each with its own SNS topic and approval condition. Pipeline state moves forward only when every approval is completed.
Approval Gate Placement Anti-Patterns
A common exam-trap configuration places the approval gate before training instead of after evaluation. This is wrong — approving a model before knowing its evaluation metrics defeats the purpose of the gate. The correct placement is between evaluation and deployment, with the approval payload showing evaluation results.
Infrastructure-as-Code for ML CI/CD
ML CI/CD pipelines themselves should be defined as code, not clicked in the console.
CloudFormation for SageMaker Resources
CloudFormation supports SageMaker resources: AWS::SageMaker::Model, AWS::SageMaker::Endpoint, AWS::SageMaker::EndpointConfig, AWS::SageMaker::Pipeline, AWS::SageMaker::ModelPackage, AWS::SageMaker::ModelPackageGroup. A CloudFormation stack defines the entire pipeline — the SageMaker Pipeline definition, the Model Registry group, the EventBridge rules, the Lambda functions, the IAM roles, the CodePipeline pipeline itself, and the CodeBuild projects. Updating the stack updates the entire pipeline atomically.
CDK for Higher-Level Constructs
AWS CDK provides higher-level constructs that compile to CloudFormation but with type-safe Python or TypeScript code. The aws-cdk.aws-sagemaker module wraps SageMaker resources; the aws-cdk.aws-codepipeline module wraps CodePipeline. CDK is preferred over raw CloudFormation when the pipeline structure is dynamic (different stages for different environments) or when shared constructs across teams reduce boilerplate.
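A minimal CDK v2 Python sketch of this shape (construct IDs and resource names are illustrative; the SageMaker constructs here are L1 Cfn* wrappers):

```python
from aws_cdk import Stack
from aws_cdk import aws_codebuild as codebuild
from aws_cdk import aws_codecommit as codecommit
from aws_cdk import aws_codepipeline as codepipeline
from aws_cdk import aws_codepipeline_actions as actions
from aws_cdk import aws_sagemaker as sagemaker
from constructs import Construct


class MlPipelineStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # Model Registry group, defined as code alongside the pipeline.
        sagemaker.CfnModelPackageGroup(
            self, "Registry", model_package_group_name="fraud-detector"
        )

        repo = codecommit.Repository(self, "Repo", repository_name="fraud-model")
        source_out = codepipeline.Artifact()

        # Uses the buildspec.yml committed in the repository.
        build = codebuild.PipelineProject(self, "Build")

        codepipeline.Pipeline(self, "Pipeline", stages=[
            codepipeline.StageProps(stage_name="Source", actions=[
                actions.CodeCommitSourceAction(
                    action_name="Source", repository=repo, output=source_out
                ),
            ]),
            codepipeline.StageProps(stage_name="Build", actions=[
                actions.CodeBuildAction(
                    action_name="TestAndPackage", project=build, input=source_out
                ),
            ]),
            codepipeline.StageProps(stage_name="Approve", actions=[
                actions.ManualApprovalAction(action_name="Gate"),
            ]),
        ])
```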
SageMaker Projects MLOps Templates
SageMaker Projects provide pre-built MLOps templates that scaffold a complete CodePipeline + SageMaker Pipelines + Model Registry + endpoint deployment in minutes. Templates include "MLOps template for model building, training, and deployment" (single account), "MLOps template for model deployment" (multi-account), and custom templates pulled from a Service Catalog portfolio. These templates are the fastest path to a baseline ML CI/CD setup and are explicitly mentioned in the MLA-C01 exam guide.
Terraform for ML Infrastructure
Terraform, via the AWS provider, also supports SageMaker resources. Many enterprises standardize on Terraform across cloud providers; for those teams, Terraform modules wrapping SageMaker Pipelines and CodePipeline are the IaC layer. It is functionally equivalent to CloudFormation for ML CI/CD purposes, and the exam does not favor one over the other.
SageMaker Pipelines vs CodePipeline — Complementary Roles
This distinction is one of the most tested conceptual points on the MLA-C01 exam.
What SageMaker Pipelines Does Best
SageMaker Pipelines is purpose-built for ML workflows. It supports step types that CodePipeline cannot — ProcessingStep (run a SageMaker Processing job), TrainingStep (run a training job), TuningStep (run hyperparameter tuning), TransformStep (run batch transform), CreateModelStep, RegisterModelStep, ConditionStep (branch on metric thresholds), CallbackStep (wait for external system), LambdaStep, ClarifyCheckStep, QualityCheckStep, EMRStep, FailStep. Pipeline executions automatically capture lineage in SageMaker ML Lineage Tracking. Step caching avoids recomputing unchanged steps on re-execution.
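A sketch of the ConditionStep branching that CodePipeline cannot express, assuming an evaluation step named EvaluateModel that writes evaluation.json and a register_step defined earlier in the same pipeline file (names, paths, and the 0.85 floor are illustrative):

```python
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThan
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.properties import PropertyFile

# Declared on the evaluation ProcessingStep (via its property_files argument)
# so the step's output JSON becomes queryable at condition-evaluation time.
evaluation_report = PropertyFile(
    name="EvaluationReport", output_name="evaluation", path="evaluation.json"
)

auc = JsonGet(
    step_name="EvaluateModel",  # illustrative evaluation step name
    property_file=evaluation_report,
    json_path="binary_classification_metrics.auc.value",
)

condition_step = ConditionStep(
    name="CheckAUC",
    conditions=[ConditionGreaterThan(left=auc, right=0.85)],
    if_steps=[register_step],  # RegisterModel step defined earlier in this file
    else_steps=[FailStep(name="AUCTooLow", error_message="AUC below floor")],
)
```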
What CodePipeline Does Best
CodePipeline is purpose-built for software CI/CD. It supports source integrations (CodeCommit, GitHub, Bitbucket, S3, ECR), CodeBuild integration for arbitrary shell-based testing, CodeDeploy for application deployment, CloudFormation deployment for IaC, manual approval actions, third-party integrations (Jenkins, TeamCity, Datadog), and cross-region action support.
The Layered Architecture Pattern
In production: CodePipeline is the outer layer that wraps the entire CI/CD workflow including source pulls, code testing, infrastructure provisioning, manual approvals, and endpoint deployment. SageMaker Pipelines is the inner layer that executes within CodePipeline's Train stage, handling the ML-specific orchestration of preprocess-train-evaluate-register. Each layer is responsible for what it does best. This is the canonical answer pattern on the exam.
Do not propose CodePipeline alone as the orchestrator for an ML workflow that includes training, evaluation, and conditional registration. CodePipeline cannot natively express "if AUC > 0.85 then register else fail" — that ConditionStep belongs inside SageMaker Pipelines. Conversely, do not propose SageMaker Pipelines alone as the orchestrator for an ML workflow that includes pulling source from GitHub, running code-quality unit tests in CodeBuild, and deploying via CloudFormation — those steps belong in CodePipeline. The exam will plant stems with one-tool answers; the correct answer is almost always the layered architecture using both. SageMaker Projects MLOps templates implement this layered pattern out of the box.
The Canonical Retraining Loop — End to End
Memorize this loop. It is the single most-tested mechanism on MLA-C01 Domain 3.
Step 1 — Production Endpoint Drift Detection
A SageMaker real-time or asynchronous endpoint serves traffic with data capture enabled (writing request and response to S3). SageMaker Model Monitor runs scheduled monitoring jobs that compare live request distributions against a baseline computed from the training data. When statistical drift exceeds the constraint threshold (KS test p-value, KL divergence, missing value rate), Model Monitor emits a CloudWatch metric for feature_baseline_drift violations.
Step 2 — CloudWatch Alarm Triggers EventBridge
A CloudWatch alarm watches the violation metric and transitions to ALARM state when the threshold is breached. The alarm publishes to an SNS topic (for SOC notification) and emits a CloudWatch event consumed by EventBridge.
Step 3 — EventBridge Rule Triggers Lambda
An EventBridge rule matches the alarm state-change event and invokes a Lambda function. The Lambda queries the SageMaker Model Registry for the currently approved model package, fetches the latest training data S3 path from Parameter Store, and invokes the SageMaker Pipeline execution with parameters (TrainingDataUri, BaselineModelPackageArn).
Step 4 — SageMaker Pipeline Executes
The pipeline runs Processing (refresh features from latest data), Training (fit new model on extended dataset), Evaluation (compute metrics on hold-out set), ConditionStep (if AUC > baseline AUC + 0.02, proceed; else FailStep), RegisterModel (register with PendingManualApproval status), and emits a Model Package State Change event.
Step 5 — Approval and Deployment
The Model Package State Change event triggers either a manual approval workflow (analyst reviews, approves in console) or an auto-approval Lambda. On approval status change to Approved, an EventBridge rule triggers CodePipeline (via StartPipelineExecution) which deploys the new model to the production endpoint via blue/green deployment with CloudWatch alarms gating the rollout.
Step 6 — Post-Deployment Monitoring
The new endpoint inherits the Model Monitor schedule, with the baseline regenerated from the new training data. The loop is closed and ready to fire again when the next drift event occurs.
The canonical AWS retraining loop is: Endpoint → Model Monitor violation → CloudWatch alarm → EventBridge rule → Lambda → SageMaker Pipeline execution → Model Registry registration → approval (manual or automated) → CodePipeline deployment → updated endpoint with new baseline. Memorize this sequence end to end. The MLA-C01 exam tests it through ordering questions ("place the steps in correct order") and matching questions ("match each AWS service to its role in the retraining loop"). Skipping any step or substituting an unrelated service (e.g., putting SageMaker Pipelines as the trigger source instead of as the executor) is the wrong answer pattern. The MLOps whitepaper diagrams this loop on page 15; auditors recognize it as the AWS-blessed reference architecture.
GitOps for ML — Version-Controlled Pipeline Definitions
GitOps treats every artifact in the ML pipeline — code, configs, pipeline definitions, infrastructure templates, feature store schemas — as version-controlled in Git, with deployment driven by Git state. For ML CI/CD, GitOps means the SageMaker Pipeline definition is a Python file in Git, the CodePipeline definition is a CloudFormation template in Git, the Feature Store schemas are JSON files in Git, and changes to any of these flow through pull-request review before being applied. The benefit: every model in production has a Git commit traceable as its source-of-truth provenance, and rolling back means reverting Git, not clicking in consoles.
Feature Store Schemas as Code
Feature Store feature group definitions stored in Git ensure that schema changes (adding a feature, changing a feature type) flow through code review and through CI tests against historical data before being applied. Schema drift between training and serving — the classic training-serving skew — is prevented when both training pipelines and inference pipelines reference the same Git-versioned feature group definition.
Pipeline Definition Versioning
Every commit to the pipeline definition file produces a new pipeline version. SageMaker Pipelines supports pipeline versioning natively — UpsertPipeline increments a version counter, and pipeline executions reference the version they ran against. This provides reproducibility: re-running a six-month-old execution uses the pipeline definition as it was, not as it is now.
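In practice the Git-tracked definition file ends by upserting itself; a minimal sketch, assuming steps and params were built earlier in the same file (names are illustrative):

```python
from sagemaker.workflow.pipeline import Pipeline

# `steps` and `params` are the step and parameter objects defined earlier
# in this Git-tracked definition file.
pipeline = Pipeline(name="FraudRetrainPipeline", parameters=params, steps=steps)

# Create the pipeline if absent, otherwise publish a new version of it.
pipeline.upsert(role_arn="arn:aws:iam::123456789012:role/SageMakerPipelineRole")
print(pipeline.describe()["PipelineArn"])
```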
Testing ML Models in CI/CD — Smoke, Shadow, Canary
Pre-production testing extends standard CI/CD testing patterns with ML-specific concerns.
Smoke Tests
A smoke test invokes the new endpoint with a small set of known-input/known-output pairs and verifies predictions match within tolerance. Run in the Build stage of CodePipeline before deployment. Catches container packaging bugs, missing model artifacts, schema mismatches.
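A sketch of a smoke test against a freshly deployed staging endpoint (endpoint name, payload schema, expected score, and tolerance are illustrative):

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Known input with an expected score recorded from the previous good model.
KNOWN_INPUT = {"amount": 250.5, "merchant_category": "electronics"}
EXPECTED_SCORE = 0.91
TOLERANCE = 0.05

resp = runtime.invoke_endpoint(
    EndpointName="fraud-detector-staging",  # illustrative
    ContentType="application/json",
    Body=json.dumps(KNOWN_INPUT),
)
score = json.loads(resp["Body"].read())["score"]

# Fail the pipeline stage loudly if the prediction drifts past tolerance.
assert abs(score - EXPECTED_SCORE) <= TOLERANCE, f"smoke test failed: {score}"
```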
Shadow Deployment
A shadow deployment routes duplicate live traffic to the new model alongside the production model but discards the new model's predictions. The new model's predictions are logged and compared offline against the production model's predictions and against ground truth. This catches regression bugs invisible to unit tests and to smoke tests because they only manifest on real production traffic.
Canary Deployment
A canary deployment shifts a small percentage (typically 5 percent) of live traffic to the new model and monitors error rate, latency, and prediction-distribution metrics. SageMaker Deployment Guardrails configured with canary traffic shifting and CloudWatch alarms automatically roll back the deployment if any alarm fires during the canary window. After the canary window completes successfully, traffic shifts to 100 percent.
A/B Testing With Production Variants
A/B testing hosts multiple model versions simultaneously on one endpoint, with traffic weights split between the production variants. Use it when statistical significance on business metrics (conversion rate, revenue per session) should drive the choice between models.
Common Exam Traps for ML CI/CD on MLA-C01
Trap 1 — CodePipeline Replaces SageMaker Pipelines
Wrong. They are complementary. CodePipeline handles outer software-engineering CI/CD; SageMaker Pipelines handles inner ML workflow orchestration. Use both.
Trap 2 — Approval Gate Goes Before Training
Wrong. Approval evaluates trained model metrics. Place after evaluation, before deployment.
Trap 3 — Polling SageMaker APIs for State Changes
Wrong. SageMaker emits EventBridge events. Use rule-based triggers, not polling.
Trap 4 — Scheduled Retraining Is Always Better Than Drift-Driven
Wrong. Scheduled retraining wastes compute when data is stable and is too slow when drift accelerates. Drift-driven via Model Monitor is the recommended pattern; scheduled is a fallback when Monitor is not configured.
Trap 5 — Manual Approval Is Always Required
Wrong. Manual is required in regulated industries and for production. Lower environments (dev, staging) can use automated metric-threshold approval via Lambda.
Trap 6 — CodeBuild Runs the Full Training Job
Wrong. CodeBuild runs unit tests, container builds, integration tests with synthetic data. Full training (which requires GPU and hours) runs in SageMaker training jobs invoked from the SageMaker Pipeline inside the CodePipeline Train stage.
Trap 7 — One CodePipeline for All Environments
Wrong. The recommended pattern is one CodePipeline per environment (dev, staging, prod) sharing common CodeBuild projects and templates, with promotion between pipelines triggered by approval gates. Single-pipeline-multi-environment introduces dangerous cross-environment side-effects.
Trap 8 — Model Registry Status Auto-Deploys
Wrong. ModelApprovalStatus = Approved does not auto-deploy. Deployment requires either an EventBridge rule on the approval event triggering CodePipeline, or a manual stage transition.
FAQ — ML CI/CD Top Questions for MLA-C01
Q1 — How do CodePipeline and SageMaker Pipelines differ in role and when do I use each?
CodePipeline is the outer software-engineering CI/CD orchestrator with stages like Source, Build, Test, Approve, Deploy and integrations with CodeBuild, CodeDeploy, CloudFormation, and third-party tools. SageMaker Pipelines is the inner ML workflow orchestrator with step types like Processing, Training, Tuning, Transform, RegisterModel, and ConditionStep that CodePipeline cannot express natively. Use CodePipeline as the outer wrapper for source pull, code testing, infrastructure provisioning, manual approval, and deployment. Use SageMaker Pipelines inside CodePipeline's Train stage for ML-specific orchestration. The MLA-C01 exam consistently tests this distinction; one-tool answers are wrong, and the layered architecture using both is the right answer.
Q2 — How do I trigger automated retraining when data drift is detected in production?
The canonical pattern: SageMaker Model Monitor runs scheduled jobs that detect drift and emit CloudWatch metrics. A CloudWatch alarm transitions to ALARM state when violations exceed threshold. An EventBridge rule matches the alarm state-change event and invokes a Lambda function. The Lambda calls SageMaker StartPipelineExecution on the pre-defined retraining pipeline with parameters indicating the drift type. The pipeline retrains, evaluates, and registers a new model package, which then either triggers manual approval or proceeds to automated deployment via CodePipeline. This is a seven-service chain (Model Monitor → CloudWatch → EventBridge → Lambda → SageMaker Pipelines → Model Registry → CodePipeline) and it is the most-tested workflow on Domain 3.
Q3 — Where should the manual approval gate go in a production ML CI/CD pipeline?
After model evaluation and before production deployment. Specifically, the gate should fire after the SageMaker Pipeline has registered a new model package in Model Registry with PendingManualApproval status, and the gate should display the evaluation metrics, model card, and lineage data to the approver. Approval moves the model package to Approved status, which triggers (via EventBridge rule) the CodePipeline deployment stage. Placing the gate before training defeats its purpose because the approver has no metrics to evaluate. Placing the gate after deployment creates risk because the model is already serving traffic by the time the approval is reviewed.
Q4 — How do I share approved models across dev, staging, and production accounts?
Use cross-account Model Registry sharing via Resource Access Manager (RAM). Create the model package group in a central registry account (typically the Security or Audit account, or a dedicated MLOps account). Share the model package group with consumer accounts (staging and production) via RAM. CodePipeline in each consumer account references the cross-account model package ARN when invoking SageMaker UpdateEndpoint. Alternatively, use S3 cross-account replication to copy model artifacts to consumer accounts and recreate the model package locally — but this loses the centralized lineage and is less recommended. Cross-account Model Registry plus RAM sharing is the AWS-recommended pattern.
Q5 — What testing should run in CodeBuild and what should run in SageMaker Pipelines?
CodeBuild runs fast, infrastructure-cheap tests: unit tests of Python functions, schema validation of training data with Great Expectations, Dockerfile linting, container vulnerability scanning, integration tests against synthetic data on small instances, smoke tests against staging endpoints. CodeBuild does NOT run full training (too slow, requires GPU) or full data quality checks (too memory-intensive). SageMaker Pipelines runs the heavy ML compute: SageMaker Processing jobs for full data quality checks, Training jobs for actual model fitting (potentially on GPU clusters), Tuning jobs for hyperparameter optimization, Clarify processing for fairness metrics, Model Monitor baseline computation. The division is "fast and cheap goes in CodeBuild; expensive ML compute goes in SageMaker Pipelines."
Q6 — How do I prevent runaway retraining (e.g., a flaky drift signal triggers training every 10 minutes)?
Three guardrails. First, the Lambda triggered by EventBridge implements minimum-interval debouncing — read the timestamp of the last training run from DynamoDB or Parameter Store, and refuse to start a new run if less than N hours have elapsed. Second, the SageMaker Pipeline ConditionStep evaluates whether the new model is meaningfully better than the current production model (e.g., AUC must improve by at least 0.02) and fails the pipeline if not, preventing endless registration of marginally-different models. Third, configure CloudWatch alarm hysteresis (Datapoints to Alarm = 3 of 5) so a single noisy data point does not fire the alarm. With these three guardrails, the retraining loop is robust against flaky drift signals.
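A sketch of the first guardrail, minimum-interval debouncing against Parameter Store (parameter name, interval, and pipeline name are illustrative):

```python
import time
import boto3

ssm = boto3.client("ssm")
sm = boto3.client("sagemaker")

PARAM = "/mlops/fraud/last-retrain-epoch"  # illustrative
MIN_INTERVAL_SECONDS = 6 * 3600            # at most one retrain per 6 hours

def handler(event, context):
    now = int(time.time())
    try:
        last = int(ssm.get_parameter(Name=PARAM)["Parameter"]["Value"])
    except ssm.exceptions.ParameterNotFound:
        last = 0  # first ever run

    if now - last < MIN_INTERVAL_SECONDS:
        # Flaky drift signal fired again too soon: swallow it.
        return {"started": False, "reason": "debounced"}

    sm.start_pipeline_execution(PipelineName="FraudRetrainPipeline")
    ssm.put_parameter(Name=PARAM, Value=str(now), Type="String", Overwrite=True)
    return {"started": True}
```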
Q7 — How do I make the entire ML CI/CD pipeline reproducible six months later for an audit?
Pin everything and store everything in Model Registry metadata. Pin the CodeBuild image tag, pin all Python dependencies in requirements.txt, pin the training container image tag in ECR, pin the source commit SHA in Git, pin the training data S3 version (S3 Versioning enabled on the training data bucket). Store all of these pins as custom metadata properties on the model package in Model Registry — TrainingDataS3VersionId, ContainerImageDigest, CodeCommitId, BuildSpecVersion, RequirementsHash. To regenerate: pull the model package, retrieve the metadata, check out the commit, restore the data version, rebuild the container with the digest, and rerun the SageMaker Pipeline with the recorded parameters. This is the audit-grade reproducibility standard the MLA-C01 exam expects you to know.
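A sketch of stamping those pins onto the model package at registration time (every key and value shown is illustrative):

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_model_package(
    ModelPackageGroupName="fraud-detector",
    ModelApprovalStatus="PendingManualApproval",
    InferenceSpecification={
        "Containers": [{
            # Pin by immutable digest, not by mutable tag.
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/fraud-training@sha256:abc123...",
            "ModelDataUrl": "s3://my-bucket/models/model.tar.gz",
        }],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    },
    # Everything needed to regenerate this exact model months later.
    CustomerMetadataProperties={
        "TrainingDataS3VersionId": "3HL4kqtJvjVBH40Nrjfkd",  # illustrative
        "CodeCommitId": "9f2c1e7",
        "RequirementsHash": "sha256:deadbeef",
        "BuildSpecVersion": "buildspec.yml@9f2c1e7",
    },
)
```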
Further Reading — Official AWS Documentation
For depth beyond MLA-C01 scope, the authoritative AWS sources are: AWS CodePipeline User Guide (especially the manual approval and CloudFormation deployment sections), AWS CodeBuild User Guide (buildspec reference, Docker custom image), Amazon SageMaker Developer Guide (Pipelines section, Projects section, Model Registry section, Automating with EventBridge section), the MLOps Continuous Delivery for Machine Learning on AWS whitepaper, and the AWS Well-Architected Machine Learning Lens (deployment automation pillar).
The AWS Solutions Library has reference implementations for ML CI/CD that map directly to the canonical retraining loop. AWS re:Invent and AWS Summits have multiple recorded sessions on MLOps with SageMaker Pipelines and CodePipeline. The AWS Machine Learning Blog has step-by-step tutorials for SageMaker Projects MLOps templates. The AWS Knowledge Center indexes troubleshooting articles for "CodePipeline stage failed" and "SageMaker Pipeline execution failed" symptom-keyed problems matching the exam stem patterns.