
SageMaker Model Registry and Versioning

4,000 words · ≈ 20 min read

Master SageMaker Model Registry and Versioning for MLA-C01 Domain 3 Tasks 3.1/3.2/3.3: model package groups, model package versioning, ModelApprovalStatus workflow, automated approval via ConditionStep, cross-account model sharing with RAM, ML Lineage Tracking, registry-driven CodePipeline deployment, model cards, rollback patterns, and the registry-centered retraining loop end to end.


SageMaker Model Registry and Versioning is the governance backbone of every production machine learning system. SageMaker Model Registry is the centralized catalog where every trained model artifact, every evaluation metric, every hyperparameter set, and every approval decision is recorded with versioning, lineage, and audit support. On the MLA-C01 exam, Model Registry threads through three task statements — 3.1 (deployment infrastructure selection), 3.2 (infrastructure scripting), and 3.3 (CI/CD orchestration) — which is unusual breadth for a single topic and signals how heavily the certification weights model governance. Community signals from Pluralsight specifically call out the complete retraining loop from trigger to registry to deploy as a case-study pattern that appears repeatedly in live exam stems.

This guide covers Model Registry end to end: model package groups as the organizational unit, model package versions as the registered artifacts, the ModelApprovalStatus workflow and its automated and manual variants, cross-account sharing patterns via Resource Access Manager and via S3 replication, ML Lineage Tracking integration, model cards for documentation, registry-driven CodePipeline deployment via EventBridge events, and the rollback and historical-version patterns. It closes with the canonical registry-centered retraining loop the exam tests through ordering and matching question types and the troubleshooting decision trees for common registry failures.

What Is SageMaker Model Registry and Why MLA-C01 Tests It Across Three Tasks

SageMaker Model Registry is a fully managed service that catalogs trained model artifacts as versioned, governed, immutable records. A registered model carries metadata including the training job ARN, the source training data S3 URI, the inference container image URI, the evaluation metrics, the lineage graph, and the approval status. Without Model Registry, ML teams resort to ad-hoc S3 paths and spreadsheets to track which model is in production — losing the ability to roll back, audit, or reproduce models. With Model Registry, every production model has a stable identifier (the model package ARN), a clear approval lineage, and a guaranteed-recoverable historical record.

The Three Task Statements Where Model Registry Appears

Task 3.1 (Select deployment infrastructure) tests Model Registry as the source for endpoint deployment — the model package ARN is the input to CreateModel and CreateEndpointConfig calls. Task 3.2 (Create and script infrastructure) tests Model Registry creation via CloudFormation, CDK, and Terraform plus the IAM permissions for cross-account sharing. Task 3.3 (CI/CD pipelines) tests Model Registry as the integration point between SageMaker Pipelines (which registers models) and CodePipeline (which consumes approval events to trigger deployment). Topics that span three task statements deserve more study time than topics anchored to one.

Why Versioning Is a Governance Requirement, Not a Convenience

Regulated industries (finance, healthcare, insurance) require that every model serving production inference be traceable to a specific training run with reproducible inputs. When a regulator asks "what model was making credit decisions on March 15," the answer must be a specific model package ARN with metadata showing the training data version, the algorithm, the hyperparameters, the evaluation report, and who approved deployment. Model Registry provides this audit trail by design — model package ARNs are immutable identifiers, approval status changes are logged, and lineage is automatically captured. Treating Model Registry as optional is a compliance failure waiting to happen.

Plain-Language Explanation: SageMaker Model Registry

Model Registry is the kind of topic where governance, versioning, and deployment all converge on one service. Three concrete analogies make the structure stick.

Analogy 1 — The Library With Cataloging, Approval, and Circulation

Imagine a research university library. Every book published by faculty (every trained model) goes through cataloging by the head librarian (Model Registry registration) before it can be lent to readers (deployed to production endpoints). The cataloging system organizes books by series (model package groups — "fraud detection family," "recommendation family," "image classification family"); each new edition of a book within a series gets a sequential edition number (model package version 1, 2, 3...). The library has a review committee (approval workflow) that reads each new edition before stamping it as approved for circulation; rejected editions stay on a private shelf for revision. The card catalog (registry metadata) records every detail about each book — author (training job), publisher (training container image), publication date, chapter list (hyperparameters), reviewer notes (evaluation metrics), and approval signature. When a reader requests "the latest fraud detection book" (production endpoint update), the circulation desk (CodePipeline) looks up the most recent approved edition and ships a copy. When a defect is discovered in the current edition (production drift), the librarian can pull the previous approved edition off the shelf instantly (rollback) without recreating it from scratch. The library does not throw away old editions — they remain in the archive with their cataloging intact, queryable for any historical research.

Analogy 2 — The Pharmacy With FDA Drug Approval and Versioning

Picture a pharmacy company manufacturing medications. Every new formulation (every trained model) goes through clinical trials (training and evaluation), then submits an application to the FDA (Model Registry registration with PendingManualApproval status). The FDA review board (compliance officer or automated metric gate) reviews trial results (evaluation metrics, model card, fairness analysis) and either approves the formulation for distribution (status → Approved) or rejects it with comments (status → Rejected). Approved formulations get a National Drug Code (model package ARN) that uniquely identifies the version forever. The pharmacy distributes by ARN, never by "the current best one" — because pharmacies in different regions might be filling prescriptions with different approved versions during a phased rollout. When pharmacovigilance (Model Monitor) detects an adverse event signal in a particular formulation in production, the company can recall that specific NDC (rollback to previous model package ARN) instantly while the new formulation is being developed. Cross-pharmacy distribution (cross-account deployment) requires a distribution agreement (RAM share or cross-account IAM trust) so the consumer pharmacy can fulfill prescriptions using the central catalog. Every formulation manufactured stays in the historical record even after newer versions supersede it — the FDA requires it, and your auditor will too.

Analogy 3 — The Music Streaming Service With Album Versions and Editorial Approval

Picture a music streaming service. Artists upload albums (data scientists train models). The music ingestion team (CI/CD pipeline) catalogs each upload into the master catalog (Model Registry) under the artist's discography page (model package group). Each upload gets a track listing, audio file URLs, lyric metadata, ISRC codes, and editorial review status (registry metadata fields). The editorial team (approval gate) reviews each new album for content policy compliance, audio quality, and metadata correctness before approving it for the global catalog (status → Approved). The streaming app (production endpoint) serves the latest approved version to listeners; when an artist releases a remastered edition, the new version becomes the default, but the original is preserved in the archive. If a copyright infringement is discovered (production model defect), the editorial team can immediately delist that specific version (rollback) and serve the previous approved one. Distribution to regional partners (cross-account deployment) operates through licensing agreements (RAM shares) so partner platforms reference the central catalog rather than maintaining their own duplicate cataloging system. The version history page in the artist's discography (deployment history) shows every version that has been live and when, which auditors and copyright lawyers love.

Model Package Groups — The Organizational Unit

Every model package belongs to exactly one model package group. The group is the family or lineage; the package is a specific version within that family.

What Goes In a Group

A model package group represents a logical model family — "fraud_detection_credit_card," "recommendation_homepage," "image_classification_product." Every retrained version of that family — same business problem, same input schema, similar architecture but possibly different hyperparameters or training data — registers as a new model package within the group. Groups should NOT mix unrelated models; do not put fraud detection and recommendation in the same group even if both happen to be trained by the same team.

Group Naming Conventions

Best practice: name groups with a stable business-domain identifier, not with a date or a model architecture. "fraud_detection_v1" is a bad group name because it embeds a version that becomes misleading once the model is iterated. "fraud_detection_credit_card" is a good group name because it persists across architectures (you can move from XGBoost to a neural network and stay in the same group).

Group Tags for Cost Allocation and Access Control

Tags on the model package group propagate to all model packages within it (with caveats around timing). Use tags for cost allocation (Project, CostCenter, Owner), access control (Environment=Production restricts who can update approval status), and discovery (BusinessDomain, ModelType). The MLA-C01 exam tests tag-based access control via IAM conditions referencing aws:ResourceTag/Environment.

CreateModelPackageGroup IAM Requirements

The IAM principal creating a model package group needs sagemaker:CreateModelPackageGroup. The principal updating its description or policy needs sagemaker:UpdateModelPackageGroup and sagemaker:PutModelPackageGroupPolicy. Forgetting the latter is a common cross-account sharing failure — the group exists, but the resource policy granting other accounts the right to register packages into it is missing.
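
A minimal boto3 sketch of creating a tagged group, assuming the caller holds the permissions above — the region, group name, and tag values are illustrative placeholders, not values prescribed by this guide:

```python
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

# Name the group for the business domain, not the architecture or a date.
# Tag values are illustrative; adjust to your cost-allocation and access-control scheme.
sm.create_model_package_group(
    ModelPackageGroupName="fraud_detection_credit_card",
    ModelPackageGroupDescription="Credit-card fraud detection model family",
    Tags=[
        {"Key": "Project", "Value": "fraud-detection"},
        {"Key": "CostCenter", "Value": "ml-platform"},
        {"Key": "Environment", "Value": "Production"},  # usable in aws:ResourceTag IAM conditions
    ],
)
```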

A model package group is the durable organizational unit; a model package is a specific versioned artifact within a group. The exam will test this distinction in matching questions ("match each registry concept to its description"). The group survives across model retrainings — version 1 of a model is registered into the group at training time, version 2 supersedes it, version 3 supersedes that, but the group identifier remains stable for years. CodePipeline references the group ARN when looking up the latest approved version; CloudFormation references the group ARN when subscribing to model registration events. Treating model packages as the durable identifier (instead of the group) leads to brittle CI/CD pipelines that break every retraining cycle.

Model Package Versioning — The Versioned Artifact

A model package is what gets deployed. Memorize what it contains.

What Metadata a Model Package Captures

A registered model package contains the inference container image URI in ECR, the model artifact S3 URI (the trained weights), the inference specification (instance types, environment variables, response content type), the source training job ARN, the evaluation metrics (precision, recall, AUC, custom metrics), the data quality baseline, the approval status, the approval description, custom metadata key-value pairs, and (optionally) a model card reference. This is enough metadata to recreate the model from scratch, evaluate its quality, and deploy it.

Sequential Version Numbers Within a Group

Model package versions auto-increment sequentially within a group. Version numbers cannot be skipped or reused. If version 5 is rejected, the next registration is still version 6, not a re-registration of version 5. This is by design — model package version numbers are part of the audit trail and must not be malleable.

The model_package_arn Is Immutable

The model package ARN (e.g., arn:aws:sagemaker:us-east-1:111111111111:model-package/fraud_detection_credit_card/7) is the durable identifier used by CodePipeline, by EventBridge rules, by CloudFormation references, and by audit logs. Even after the model is rejected, deleted, or superseded, the ARN reference is recoverable from the registry's history.

CreateModelPackage From a Training Job

The typical pipeline pattern: a SageMaker training job completes, the SageMaker Pipeline RegisterModel step calls CreateModelPackage with the training job ARN, the evaluation S3 location, and the desired approval status (PendingManualApproval by default). The registry computes lineage automatically from the training job ARN. Manual registrations outside a pipeline lose this automatic lineage and must populate metadata explicitly.
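
A hedged boto3 sketch of what the registration call looks like when made directly (outside a pipeline) — the container image URI, artifact path, metrics location, and instance types are illustrative placeholders that would normally come from the completed training and evaluation steps:

```python
import boto3

sm = boto3.client("sagemaker")

resp = sm.create_model_package(
    ModelPackageGroupName="fraud_detection_credit_card",
    ModelPackageDescription="XGBoost retraining run 2024-03-15",
    ModelApprovalStatus="PendingManualApproval",  # default gate before deployment
    InferenceSpecification={
        "Containers": [{
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/xgboost-inference:1.7-1",
            "ModelDataUrl": "s3://my-ml-artifacts/fraud/training-2024-03-15/model.tar.gz",
        }],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["application/json"],
        "SupportedRealtimeInferenceInstanceTypes": ["ml.m5.large"],
    },
    ModelMetrics={
        "ModelQuality": {
            "Statistics": {
                "ContentType": "application/json",
                "S3Uri": "s3://my-ml-artifacts/fraud/evaluation/evaluation.json",
            }
        }
    },
)
print(resp["ModelPackageArn"])  # e.g. .../model-package/fraud_detection_credit_card/7
```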

Custom Metadata for Reproducibility

Beyond the standard metadata, custom properties (the CustomerMetadataProperties map) are critical for audit-grade reproducibility — TrainingDataS3VersionId (the S3 version ID of the training data at training time), ContainerImageDigest (the sha256 digest of the inference container), GitCommitSha (the source code commit), RequirementsHash (a hash of requirements.txt). Six months later, these custom properties are what make it possible to regenerate the model.
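
A short sketch of passing these properties at registration time — the specific keys follow the convention above (SageMaker does not interpret them) and the values are illustrative placeholders:

```python
# Reproducibility properties to include in the create_model_package call above.
customer_metadata = {
    "TrainingDataS3VersionId": "3HL4kqtJlcpXroDTDmJrmSpXd3dIbrHY",   # illustrative S3 version ID
    "ContainerImageDigest": "sha256:0f2a11aa",                        # illustrative digest
    "GitCommitSha": "9fceb02d0ae598e95dc970b74767f19372d61af8",       # illustrative commit
    "RequirementsHash": "c3ab8ff13720e8ad9047dd39466b3c89",           # illustrative hash
}

# sm.create_model_package(..., CustomerMetadataProperties=customer_metadata)
```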

ModelApprovalStatus — The Approval Workflow

Approval is the gatekeeper between trained model and production deployment.

The Three Approval States

Model packages have one of three ModelApprovalStatus values: PendingManualApproval (default at registration, awaiting review), Approved (cleared for production deployment), Rejected (failed review, will not deploy, but stays in registry as historical record). Status changes are logged with the actor identity and an optional approval description.

How To Update Approval Status

Three mechanisms: (1) the SageMaker console approval button — fastest for human reviewers but does not scale; (2) the UpdateModelPackage API call — used by Lambda functions implementing automated approval logic; (3) a SageMaker Pipelines RegisterModel step with model_approval_status="Approved" — used when the pipeline already gates approval via a ConditionStep evaluating metrics.
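
A minimal sketch of mechanism (2), the UpdateModelPackage call an automated-approval Lambda would make — the package ARN and description are illustrative:

```python
import boto3

sm = boto3.client("sagemaker")

# Flip one specific version to Approved with an auditable description.
sm.update_model_package(
    ModelPackageArn="arn:aws:sagemaker:us-east-1:111111111111:model-package/fraud_detection_credit_card/7",
    ModelApprovalStatus="Approved",
    ApprovalDescription="AUC 0.94 >= 0.92 threshold; approved by automated gate",
)
```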

Manual Approval Workflow

Manual approval pattern: SageMaker Pipeline registers with PendingManualApproval, EventBridge rule on the registration event triggers an SNS notification to the ML lead with a link to the model card and evaluation report, the lead reviews and clicks Approve in console, status changes to Approved, EventBridge rule on the approval event triggers CodePipeline deployment. This loop typically takes hours to days depending on review SLAs.

Automated Approval via ConditionStep

For lower-risk environments, embed the approval inside the SageMaker Pipeline using a ConditionStep that compares evaluation metrics to thresholds. If metrics pass, the RegisterModel step sets status to Approved directly; if metrics fail, the FailStep halts the pipeline and registers nothing. This compresses the loop to minutes and removes humans from the approval critical path.
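
A sketch of the gating logic with the SageMaker Pipelines SDK, assuming an evaluation step named "EvaluateModel" writes an evaluation.json property file and that a register step (register_step) is defined elsewhere with model_approval_status="Approved"; the step names, JSON path, and 0.92 threshold are illustrative:

```python
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.properties import PropertyFile

# Property file produced by the evaluation processing step (illustrative name/path).
evaluation_report = PropertyFile(
    name="EvaluationReport", output_name="evaluation", path="evaluation.json"
)

# Compare the reported AUC against the approval threshold.
auc_check = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step_name="EvaluateModel",
        property_file=evaluation_report,
        json_path="binary_classification_metrics.auc.value",
    ),
    right=0.92,
)

fail_step = FailStep(name="FailLowAuc", error_message="AUC below approval threshold")

# register_step is assumed defined earlier with model_approval_status="Approved";
# only the gating logic is shown here.
condition_step = ConditionStep(
    name="CheckAucThreshold",
    conditions=[auc_check],
    if_steps=[register_step],  # metrics pass -> register as Approved
    else_steps=[fail_step],    # metrics fail -> halt the pipeline, register nothing
)
```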

Multi-Tier Approval

Regulated workflows may require multiple approvals — ML engineer signs off on technical metrics, fairness officer signs off on bias analysis, compliance officer signs off on regulatory criteria. Implement via custom metadata properties (MLEngineerApproved=true, FairnessApproved=true, ComplianceApproved=true) plus a Lambda that flips ModelApprovalStatus to Approved only when all three properties are set. Each approver updates one property; the Lambda is the conjunction.
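
A sketch of the conjunction Lambda, assuming the invoking event carries the model package ARN and the three flag names follow the convention above (they are custom properties, not fields SageMaker interprets):

```python
import boto3

sm = boto3.client("sagemaker")

REQUIRED_FLAGS = ("MLEngineerApproved", "FairnessApproved", "ComplianceApproved")

def lambda_handler(event, context):
    """Approve the package only once every approval-tier flag is set to "true"."""
    package_arn = event["ModelPackageArn"]
    pkg = sm.describe_model_package(ModelPackageName=package_arn)
    props = pkg.get("CustomerMetadataProperties", {})

    if all(props.get(flag) == "true" for flag in REQUIRED_FLAGS):
        sm.update_model_package(
            ModelPackageArn=package_arn,
            ModelApprovalStatus="Approved",
            ApprovalDescription="All approval tiers signed off",
        )
        return {"approved": True}

    return {"approved": False,
            "missing": [f for f in REQUIRED_FLAGS if props.get(f) != "true"]}
```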

Setting ModelApprovalStatus to Approved does NOT auto-deploy the model to a production endpoint. Approval is a status flag in the registry; deployment is a separate API call (UpdateEndpoint or CloudFormation stack update). The MLA-C01 exam tests this distinction frequently — answers proposing that approval auto-deploys are wrong. The correct deployment trigger is an EventBridge rule on the Model Package State Change event with ModelApprovalStatus = Approved, targeting a Lambda function or CodePipeline that executes the deployment. This separation is by design — approval and deployment have different security blast radii and should be authorized independently.

Cross-Account Model Registry Sharing

Production ML often spans multiple accounts — dev, staging, production. Model Registry supports cross-account sharing via several mechanisms.

Pattern 1 — RAM Share of Model Package Group

The recommended pattern. AWS Resource Access Manager (RAM) shares the model package group with consumer accounts. Consumer accounts read model package metadata via the shared group ARN, list versions, and (if granted) update approval status. This pattern centralizes the registry in one account (typically a dedicated MLOps or Audit account) while letting consumer accounts operate against it.

Pattern 2 — Resource-Based Policy on Model Package Group

A model package group can have a resource-based policy granting specific principals (other accounts, IAM roles) actions like sagemaker:DescribeModelPackage, sagemaker:UpdateModelPackage, or sagemaker:CreateModel. Use when RAM sharing does not fit (e.g., third-party partner accounts outside the AWS Organization).
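
A sketch of attaching such a policy with PutModelPackageGroupPolicy — the account IDs, group name, and exact action list are illustrative and should match your actual sharing requirements:

```python
import json
import boto3

sm = boto3.client("sagemaker")

# Grant a partner account (illustrative account ID) read and deploy access to the group.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "CrossAccountRegistryRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::222222222222:root"},
        "Action": [
            "sagemaker:DescribeModelPackage",
            "sagemaker:ListModelPackages",
            "sagemaker:CreateModel",
        ],
        "Resource": [
            "arn:aws:sagemaker:us-east-1:111111111111:model-package-group/fraud_detection_credit_card",
            "arn:aws:sagemaker:us-east-1:111111111111:model-package/fraud_detection_credit_card/*",
        ],
    }],
}

sm.put_model_package_group_policy(
    ModelPackageGroupName="fraud_detection_credit_card",
    ResourcePolicy=json.dumps(policy),
)
```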

Pattern 3 — S3 Cross-Account Replication of Model Artifacts

For air-gapped or strictly isolated consumer accounts, replicate the model artifact S3 objects to the consumer account's bucket and re-register a new model package locally referencing the replicated artifact. This approach loses centralized lineage but provides full account isolation; it is used in highly regulated multi-tenant scenarios.

Cross-Account Deployment Mechanics

When CodePipeline in a production account deploys a model package owned by an MLOps account: CodePipeline assumes a cross-account role in the production account, the role calls CreateModel with the cross-account model package ARN, SageMaker validates that the production account has been granted access (via RAM or resource policy), the model is created in the production account referencing the central registry's metadata, and the endpoint is deployed normally. The artifact S3 location must also be readable by the production account — either via cross-account bucket policy or by replicating the artifact.
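
A sketch of the consumer-account side of this flow, assuming the registry lives in account 111111111111 and the production account is 222222222222 (both illustrative), and that the S3 and KMS access described below have already been granted:

```python
import boto3

# Runs in the production (consumer) account; the package ARN points at the
# central MLOps account's registry. All names and ARNs are illustrative.
sm = boto3.client("sagemaker", region_name="us-east-1")

model_name = "fraud-detection-v7"

# Instantiate a deployable SageMaker Model from the cross-account model package.
sm.create_model(
    ModelName=model_name,
    ExecutionRoleArn="arn:aws:iam::222222222222:role/SageMakerExecutionRole",
    Containers=[{
        "ModelPackageName": "arn:aws:sagemaker:us-east-1:111111111111:model-package/fraud_detection_credit_card/7"
    }],
)

# Endpoint config in the production account; the endpoint launch is where missing
# S3 or KMS grants surface as failures.
sm.create_endpoint_config(
    EndpointConfigName=f"{model_name}-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 2,
    }],
)
```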

IAM Permissions for Cross-Account Registry

The minimum IAM action set: sagemaker:DescribeModelPackage (read metadata), sagemaker:CreateModel (instantiate the model in the consumer account), sagemaker:CreateEndpointConfig and sagemaker:CreateEndpoint (deploy), plus S3 read on the model artifact location and, if the artifact is encrypted, KMS decrypt on its encryption key. Forgetting the KMS permission is a frequent cross-account deployment failure — the artifact appears accessible by S3 path, but the endpoint fails to launch because the artifact cannot be decrypted.

Cross-account model deployment requires both RAM sharing of the model package group AND cross-account access to the model artifact S3 location AND (if encrypted) the KMS key. A common production failure: the RAM share is configured, the IAM principal in the consumer account can describe the model package, but the S3 bucket policy or KMS key policy still denies the consumer account, so CreateModel succeeds but the actual endpoint launch fails when SageMaker tries to fetch the artifact. The MLA-C01 exam plants stems where one of these three layers (RAM, S3, KMS) is missing and asks for the failure cause; the answer is always the layer not granted access. Configure all three when setting up cross-account registry sharing.

SageMaker ML Lineage Tracking — Provenance for Free

ML Lineage Tracking is a sibling service that automatically captures artifact provenance for SageMaker resources.

What Gets Tracked Automatically

When a SageMaker Pipeline runs, lineage is captured automatically: the training data S3 path is recorded as an Artifact, the training job is recorded as an Action, the trained model is another Artifact, the model package registration is another Action, and the relationships (training data → training job → model artifact → model package) form a graph. Querying the lineage of a model package returns the full upstream chain back to the source training data.

Why Lineage Matters for Audit and Reproducibility

When a regulator asks "what data trained the model that made decision X on date Y," the answer comes from lineage — query the model package ARN, walk upstream to the training job, walk upstream to the training data S3 path, read the S3 versioning ID. Without lineage tracking, this chain must be reconstructed manually from disparate logs, often impossibly. With lineage, it is a single GraphQL-like query.

Lineage and Custom Artifacts

Custom artifacts (feature stores, model cards, evaluation reports) can be added to the lineage graph via the AssociateTrialComponent and CreateArtifact APIs. Best practice: add the feature group ARN and the evaluation report S3 URI as lineage artifacts attached to the training job, so audit queries can reach them.

Lineage Query Patterns

The Visualizer component in SageMaker Studio displays lineage graphs visually. Programmatic access via the LineageQuery API supports forward (downstream) and backward (upstream) traversal with depth limits. Common queries: "all model packages downstream of training data version X" (impact analysis when a data quality issue is discovered) and "all training data upstream of model package Y" (audit response).
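
A sketch of an upstream traversal with the SageMaker Python SDK's LineageQuery — the package ARN is illustrative and the exact output attributes may vary by SDK version:

```python
import sagemaker
from sagemaker.lineage.query import LineageQuery, LineageQueryDirectionEnum

session = sagemaker.Session()

# Walk upstream from a model package toward its training job and training data.
response = LineageQuery(session).query(
    start_arns=[
        "arn:aws:sagemaker:us-east-1:111111111111:model-package/fraud_detection_credit_card/7"
    ],
    direction=LineageQueryDirectionEnum.ASCENDANTS,  # backward / upstream traversal
    include_edges=True,
)

for vertex in response.vertices:
    print(vertex.arn, vertex.lineage_entity, vertex.lineage_source)
```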

Model Cards — Documentation as a First-Class Artifact

Model cards capture intended use, evaluation results, and limitations as structured documentation.

Why Model Cards Matter for Governance

A model card is a standardized JSON document describing the model: intended use cases, out-of-scope use cases, evaluation methodology, evaluation results across slices, fairness analysis, training data sources, license restrictions, and contact information. Model cards are required artifacts for responsible AI compliance frameworks (EU AI Act, ISO 42001) and increasingly demanded by regulators.

Attaching a Model Card to a Model Package

The CreateModelCard API creates a model card; the model package's CustomerMetadataProperties can store a reference to the card so the two stay linked. Best practice: every model package in production has an attached model card recorded in its registry metadata.

Model Card Lifecycle

Cards have status: Draft (work in progress), PendingReview (awaiting review), Approved (published), Archived (superseded). The card lifecycle is independent of the model package approval but typically tracks together — approving a card and approving the associated model package happen in the same review cycle.

Auto-Generated Card Sections

SageMaker can auto-populate parts of the model card from the registry metadata — evaluation metrics, training data sources, training job ARN, hyperparameters. The human-written sections (intended use, out-of-scope, ethical considerations) require manual content. Treat the model card as a deliverable from the data scientist + product manager, not as auto-generated boilerplate.

Make model cards mandatory in your CI/CD pipeline by gating the production deployment stage on model card status. A simple Lambda function in CodePipeline can read the model package, extract the model card ARN from CustomerMetadataProperties, fetch the card, and verify it has Status=Approved before allowing deployment. If the card is missing or not approved, the pipeline fails. This makes model card creation a non-negotiable part of the ML release process and aligns with EU AI Act and ISO 42001 governance requirements that the MLA-C01 exam touches on through Domain 4 security and responsible AI questions.
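
A sketch of such a gate as a CodePipeline-invoked Lambda, assuming the model package ARN arrives in the action's UserParameters and the card name is stored under a ModelCardName custom metadata key — both conventions for illustration, not SageMaker defaults:

```python
import boto3

sm = boto3.client("sagemaker")
codepipeline = boto3.client("codepipeline")

def lambda_handler(event, context):
    """Fail the pipeline stage unless the package's model card exists and is Approved."""
    job = event["CodePipeline.job"]
    job_id = job["id"]
    package_arn = job["data"]["actionConfiguration"]["configuration"]["UserParameters"]

    pkg = sm.describe_model_package(ModelPackageName=package_arn)
    card_name = pkg.get("CustomerMetadataProperties", {}).get("ModelCardName")

    card_approved = (
        card_name is not None
        and sm.describe_model_card(ModelCardName=card_name)["ModelCardStatus"] == "Approved"
    )

    if card_approved:
        codepipeline.put_job_success_result(jobId=job_id)
    else:
        codepipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={"type": "JobFailed",
                            "message": "Model card missing or not Approved"},
        )
```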

Registry-Driven Deployment via EventBridge

The cleanest production pattern uses EventBridge to decouple registry changes from deployment.

The Model Package State Change Event

Whenever a model package's approval status changes, SageMaker emits a Model Package State Change event to EventBridge with the model package ARN, the previous status, the new status, and the timestamp. This event is the natural integration point for downstream automation.

EventBridge Rule for Approval-Triggered Deployment

The standard rule pattern matches source = aws.sagemaker, detail-type = SageMaker Model Package State Change, and detail.ModelApprovalStatus = Approved, and targets either a CodePipeline pipeline (EventBridge starts an execution through a service role permitted to call codepipeline:StartPipelineExecution) or a Lambda function that calls UpdateEndpoint directly. The rule pattern can additionally filter by ModelPackageGroupName so different groups trigger different pipelines.
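
A boto3 sketch of the rule and target wiring — the group name, pipeline ARN, and EventBridge role ARN are illustrative:

```python
import json
import boto3

events = boto3.client("events")

# Fire the deployment pipeline only when a package in this group becomes Approved.
pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {
        "ModelPackageGroupName": ["fraud_detection_credit_card"],
        "ModelApprovalStatus": ["Approved"],
    },
}

events.put_rule(
    Name="fraud-model-approved",
    EventPattern=json.dumps(pattern),
    State="ENABLED",
)

events.put_targets(
    Rule="fraud-model-approved",
    Targets=[{
        "Id": "deploy-pipeline",
        "Arn": "arn:aws:codepipeline:us-east-1:222222222222:fraud-deploy-pipeline",
        # Role EventBridge assumes to call codepipeline:StartPipelineExecution.
        "RoleArn": "arn:aws:iam::222222222222:role/EventBridgeStartPipelineRole",
    }],
)
```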

Decoupling Approval From Deployment

This pattern decouples approval (a registry concern) from deployment (an operational concern). The approver does not need to know which CodePipeline pipeline deploys the model; they just approve in the registry. The pipeline operator does not need to monitor the registry; the EventBridge rule fires the deployment automatically. This is the AWS-blessed integration pattern and is what SageMaker Projects MLOps templates implement.

Multi-Environment Routing

Different environments (staging, production) can subscribe to different approval-status transitions. Staging subscribes to PendingManualApproval (auto-deploy candidate models for staging tests). Production subscribes to Approved (deploy only after human approval). This routing is implemented via separate EventBridge rules with different filter patterns, all reading from the same model package state change event stream.

Rollback Patterns Using Model Registry

Rollback is one of the highest-value capabilities the registry provides.

Rollback to Previous Approved Version

When a production deployment causes degraded performance (Model Monitor detects drift, latency exceeds SLA, error rate spikes), the ML engineer queries the registry for the previously approved model package, creates a SageMaker Model and endpoint config from that package, and calls UpdateEndpoint to shift traffic back. Total recovery time: minutes if scripted, faster still if the previous endpoint config is pre-staged for a blue/green swap.
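
A sketch of that scripted rollback path, assuming at least two Approved versions exist in the group; the group name, endpoint name, role ARN, and instance settings are illustrative:

```python
import boto3

sm = boto3.client("sagemaker")

GROUP = "fraud_detection_credit_card"
ENDPOINT = "fraud-detection-prod"

# Most recent Approved packages, newest first: [0] is current, [1] is the rollback target.
packages = sm.list_model_packages(
    ModelPackageGroupName=GROUP,
    ModelApprovalStatus="Approved",
    SortBy="CreationTime",
    SortOrder="Descending",
)["ModelPackageSummaryList"]
previous_arn = packages[1]["ModelPackageArn"]

# Recreate the deployable model and endpoint config from the previous package.
model_name = "fraud-detection-rollback"
sm.create_model(
    ModelName=model_name,
    ExecutionRoleArn="arn:aws:iam::222222222222:role/SageMakerExecutionRole",
    Containers=[{"ModelPackageName": previous_arn}],
)
sm.create_endpoint_config(
    EndpointConfigName=f"{model_name}-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 2,
    }],
)

# Blue/green swap back to the previous model.
sm.update_endpoint(EndpointName=ENDPOINT, EndpointConfigName=f"{model_name}-config")
```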

Rollback Automation via CloudWatch Alarm

Production endpoints with deployment guardrails configured can roll back automatically when a CloudWatch alarm fires during canary or linear deployment windows. The guardrail reads the previous endpoint config from the deployment history and switches back without operator action.

Why Rollback Beats Recompute

The alternative to registry-based rollback is retraining the previous model from scratch, which takes hours and depends on the training data still being available in the same form. Registry-based rollback uses the immutable model package and finishes in minutes. This is one of the strongest arguments for treating Model Registry as mandatory infrastructure.

Rollback Considerations for Schema Changes

If the inference request schema changed between the old and new versions, rolling back to the old version may break upstream callers expecting the new schema. Mitigations: keep schema changes backward-compatible during deployment, or include the schema version in the request payload so the model can reject unsupported schemas gracefully. The exam tests schema-incompatibility scenarios in stems referring to "rolled back endpoint returning 500 errors."

Common Exam Traps for Model Registry on MLA-C01

Trap 1 — Approval Auto-Deploys to Production

Wrong. Approval changes a status field; deployment is a separate API call typically triggered by an EventBridge rule on the approval event.

Trap 2 — Model Package Equals SageMaker Model

Wrong. A model package is a registry entry; a SageMaker Model is the deployable unit created via CreateModel that points at the model package. They are separate resources with separate ARNs.

Trap 3 — Model Package Group Holds Multiple Unrelated Models

Wrong. A group is one model family. Mixing fraud detection and recommendation in the same group breaks lineage queries and approval workflows.

Trap 4 — Cross-Account Sharing Needs Only RAM

Wrong. Cross-account deployment also needs S3 cross-account access to the artifact (and KMS access if encrypted) plus IAM permissions for SageMaker actions in the consumer account. RAM is necessary but not sufficient.

Trap 5 — Deleted Model Packages Disappear From Audit

Wrong. Deleted model packages are removed from the registry's active listings but their existence and approval history are recorded in CloudTrail and in lineage indefinitely. The audit trail survives.

Trap 6 — Version Numbers Reset After Group Recreation

This trap describes real behavior rather than a misconception: if a model package group is deleted and recreated with the same name, version numbers restart at 1. The implication: never delete a group with active production references; archive instead.

Trap 7 — Lineage Captures Only Pipeline-Driven Models

Wrong. Lineage is captured for any SageMaker training job, not just pipeline-driven ones. Manual training jobs and notebook-driven training also produce lineage records. Pipeline-driven training adds richer lineage with explicit step relationships.

Trap 8 — Model Cards Are Auto-Required for Registration

Wrong. Model cards are independent artifacts that can be associated with model packages but are not required for registration. Best practice gates production deployment on card existence; baseline registration does not require it.

Key Numbers and Must-Memorize Model Registry Facts

Registry Concepts

  • Model package group is the family / lineage container
  • Model package is the versioned artifact within a group
  • Versions auto-increment sequentially within a group, no skip, no reuse
  • Three approval states: PendingManualApproval, Approved, Rejected
  • Rejected packages stay in the registry as historical record

Approval Mechanisms

  • Console manual approval (humans, slow)
  • UpdateModelPackage API (Lambda automation)
  • RegisterModel step with model_approval_status (pipeline-internal automation)
  • ConditionStep gates automated approval inside SageMaker Pipelines

Cross-Account Sharing

  • RAM share of model package group is the recommended pattern
  • Resource-based policy on the group is the alternative for non-Organization accounts
  • S3 replication is used for air-gapped consumers
  • All three layers (RAM/policy + S3 + KMS) must be granted

Integration Points

  • EventBridge Model Package State Change event for approval-driven deployment
  • CodePipeline reads approved model package ARN for deployment
  • ML Lineage Tracking captures upstream provenance automatically
  • Model cards attach via CustomerMetadataProperties for governance

Reproducibility Custom Properties

  • TrainingDataS3VersionId
  • ContainerImageDigest
  • GitCommitSha
  • RequirementsHash
  • Custom approval-tier flags (MLEngineerApproved, FairnessApproved, ComplianceApproved)

The complete registry-centered retraining loop is: Model Monitor drift → CloudWatch alarm → EventBridge → Lambda → SageMaker Pipeline → RegisterModel step (PendingManualApproval) → human or automated approval → UpdateModelPackage to Approved → EventBridge Model Package State Change event → CodePipeline → CreateModel → CreateEndpointConfig → UpdateEndpoint with deployment guardrails. Memorize this thirteen-step sequence. The MLA-C01 exam tests it through ordering questions and matching questions. Skipping the registry step (going directly from training to deployment) is the wrong answer pattern; substituting EventBridge with polling is the wrong answer pattern; conflating approval with deployment is the wrong answer pattern. Pluralsight specifically calls out this complete retraining loop as a case-study pattern that appears repeatedly in live MLA-C01 stems.

FAQ — SageMaker Model Registry Top Questions

Q1 — What is the difference between a model package, a model package group, and a SageMaker Model?

A model package group is the durable family identifier ("fraud_detection_credit_card") that persists across retrainings. A model package is a specific versioned artifact within that group ("fraud_detection_credit_card/7") containing the trained weights, container image, evaluation metrics, and approval status. A SageMaker Model is the deployable runtime instantiation created via CreateModel from a model package — it is what gets attached to an endpoint config. The exam tests this in matching questions; mixing up the three is a common error. Group is the family, package is the version, model is the deployable instance. CodePipeline references the package ARN; endpoints reference the model name; the registry organizes packages by group.

Q2 — How do I share an approved model from a central MLOps account with a production account for deployment?

Three layers must align. First, share the model package group with the production account via AWS Resource Access Manager (RAM) — the production account can now describe and read packages in the group. Second, grant the production account read access to the S3 bucket holding the model artifact via bucket policy — S3 is account-scoped and the registry metadata pointing at the artifact does not auto-grant cross-account read. Third, if the artifact is encrypted with a customer-managed KMS key, grant the production account decrypt permission on the key via the key policy. With all three layers in place, CodePipeline in the production account can call CreateModel with the cross-account model package ARN, and the resulting model can be deployed to a local endpoint. Missing any one of the three layers produces a CreateModel-succeeds-but-endpoint-launch-fails symptom that is heavily tested on the exam.

Q3 — When should I use automated approval via ConditionStep versus manual approval in CodePipeline?

Use automated approval (ConditionStep evaluating metrics inside the SageMaker Pipeline) when the deployment risk is low and metrics fully capture model quality — typical for staging deployments, internal-only models, and low-impact features. The benefit: deployment loops complete in minutes without human intervention. Use manual approval (CodePipeline approval action with SNS notification to a reviewer) when the deployment risk is high — production deployments in regulated industries, models with significant business impact, models where evaluation metrics do not fully capture quality (subjective assessments needed), or any first-time deployment of a new model family. Many teams combine both: automated approval gates promotion to staging, manual approval gates promotion to production. The MLA-C01 exam tests the trade-off between speed (automated) and risk control (manual).

Q4 — How do I implement rollback to a previous model version when production drift is detected?

The fastest rollback path: query the model package group for the previously approved model package, call CreateModel with that package ARN, call CreateEndpointConfig with the new SageMaker Model, call UpdateEndpoint with the new endpoint config. SageMaker performs a blue/green swap and traffic shifts back to the previous model within minutes. Pre-stage this for the fastest recovery: keep two endpoint configs (current and previous) ready, so UpdateEndpoint just toggles between them. Faster still: configure SageMaker Deployment Guardrails on the production endpoint with CloudWatch alarms that automatically roll back on alarm breach during the deployment window. Rollback via the registry beats retraining from scratch (which takes hours and depends on the training data still being available in the same form) by orders of magnitude — this is one of the strongest arguments for treating Model Registry as mandatory production infrastructure.

Q5 — How do I prove for an audit which exact model was serving production traffic on a given date six months ago?

Combine three sources. First, CloudTrail logs of UpdateEndpoint calls show every endpoint config change with timestamp and the new endpoint config ARN; reverse-engineer to find the config in effect on the audit date. Second, the endpoint config references a SageMaker Model, which references a model package ARN — query the registry for that package's metadata. Third, ML Lineage Tracking on the package returns the training job ARN, the training data S3 path with version ID, the source code commit SHA (from custom metadata), and the container image digest. With all three combined, the audit answer is "model package fraud_detection_credit_card/7, trained from S3 versionId X, code commit Y, container image digest Z, approved by ML Engineer Lead on date W." This audit-grade reproducibility is exactly what regulators expect and what the registry provides for free if you populate the custom metadata properties at registration time.

Q6 — Can I delete a model package group that is no longer used? What happens to historical references?

Technically yes — DeleteModelPackageGroup removes the group and all its packages from the registry's active listings. Practically you should not. Deleting the group invalidates every CodePipeline definition referencing it, breaks lineage queries that walk through it, and forces audit reconstruction from CloudTrail alone. The recommended practice is to archive (set all packages to ModelApprovalStatus=Rejected with a "deprecated" description) rather than delete. If you must delete, document the deletion in your audit trail with the reason, the date, and the deciding authority, and confirm in advance that no production endpoint, no CodePipeline, and no monitoring system references the group. The MLA-C01 exam treats deletion as a destructive operation and prefers archival in scenario answers.

Q7 — How does Model Registry interact with SageMaker Projects MLOps templates?

SageMaker Projects MLOps templates scaffold a complete CI/CD pipeline including Model Registry integration out of the box. The "MLOps template for model building, training, and deployment" creates a model package group named after the project, configures the SageMaker Pipeline to register every successful training run with PendingManualApproval status, configures an EventBridge rule on the approval event, and configures a CodePipeline that reads the approved model package and deploys to a staging then production endpoint. The "MLOps template for model deployment" handles the multi-account variant — registry in one account, deployment in another, with cross-account RAM sharing pre-configured. For teams new to ML CI/CD, starting from a SageMaker Project template is the fastest path to a production-grade registry-driven pipeline; the alternative is wiring all the moving parts manually, which is fragile and time-consuming. The MLA-C01 exam expects familiarity with these templates and tests them in scenario stems.

Further Reading — Official AWS Documentation

For depth beyond MLA-C01 scope, the authoritative AWS sources are: Amazon SageMaker Developer Guide (Model Registry section, ML Lineage Tracking section, Model Cards section, Pipelines section), the AWS MLOps Continuous Delivery for Machine Learning whitepaper, and the AWS Well-Architected Machine Learning Lens (operational excellence pillar covering versioning and reproducibility).

The AWS Solutions Library has reference implementations of registry-driven CI/CD pipelines that match the canonical retraining loop. AWS re:Invent has multiple recorded sessions on Model Registry deep-dives. The AWS Machine Learning Blog publishes Model Registry tutorials including cross-account sharing patterns, automated approval workflows, and rollback automation. The SageMaker Projects MLOps templates source code on GitHub demonstrates production-grade registry integration for new teams to copy from. The AWS Knowledge Center indexes troubleshooting articles for "model package creation failed" and "cross-account model deployment denied" symptom-keyed problems matching exam stem patterns.
