Artifact Management — CodeArtifact, ECR, and EC2 Image Builder

DOP-C02 deep dive on artifact management: CodeArtifact for npm/Maven/PyPI/NuGet packages, ECR for container images, S3 for build artifacts, lifecycle policies, cross-account access, EC2 Image Builder for AMIs, and signing with AWS Signer.

Artifact management is the unglamorous backbone of every CI/CD pipeline. DOP-C02 tests it under task statement 1.3 ("Build and manage artifacts"), and the questions are rarely shallow — they probe cross-account ECR access, CodeArtifact upstream repository chaining, S3 lifecycle policies for old build artifacts, EC2 Image Builder for AMI pipelines, ECR image scanning configuration, and AWS Signer for container image signing. Each artifact type (packages, containers, binaries, AMIs) has its own AWS service with its own permission model and its own lifecycle controls.

This guide unifies the artifact-management story across CodeArtifact (npm, Maven, PyPI, NuGet), ECR (Docker and OCI images), S3 (everything else), and EC2 Image Builder (AMIs and container images). It covers the cross-account patterns DOP-C02 loves to test, lifecycle policies for cost control, image scanning for compliance, replication for multi-region resilience, and signing for supply-chain integrity. By the end you should be able to look at any artifact-management question and pick the right service plus the right lifecycle and access control without confusion.

Why Artifact Management Spans Half of SDLC Automation

Artifacts are the durable outputs of the build process — the things that flow from build to deploy. Where you store them, how you version them, who can read them, and when you delete them are decisions that recur in every pipeline. The exam treats artifact management as a Pro-tier topic because the wrong service choice (e.g., S3 for npm packages instead of CodeArtifact) creates operational debt, security gaps, and cost spirals.

Three forces shape the artifact-management decision space. First, artifact type: packages (npm/Maven/PyPI/NuGet) belong in CodeArtifact, container images in ECR, AMIs come from Image Builder, everything else in S3. Mixing categories causes friction. Second, access pattern: developers pull from artifact stores on every build, often cross-account; the access model must scale to thousands of pulls per day with sub-second latency. Third, lifecycle: artifacts accumulate quickly (every commit, every build, every release tag); without lifecycle policies, S3 and ECR bills spiral and security exposure widens.

  • CodeArtifact: AWS-managed package registry supporting npm, Maven, PyPI, NuGet, and generic packages.
  • CodeArtifact domain: a logical container for repositories, providing a single KMS key and consolidated billing across repositories.
  • CodeArtifact repository: a per-team or per-purpose package store inside a domain; can chain to upstream repositories.
  • CodeArtifact upstream repository: a repository that another repository fetches from on cache miss; supports public connections to npmjs, Maven Central, PyPI.
  • ECR private registry: per-account container image registry with per-repository policies.
  • ECR public registry (public.ecr.aws): AWS's free public registry for distributing open-source images.
  • ECR lifecycle policy: a JSON rule set that auto-deletes images by tag pattern, age, or count.
  • ECR replication: automatic replication of images across regions and accounts.
  • EC2 Image Builder: managed pipeline service for AMI and container image creation, including patching, hardening, and tag-based lifecycle.
  • AWS Signer: code-signing service for Lambda deployment packages and OCI containers.
  • Reference: https://docs.aws.amazon.com/codeartifact/latest/ug/welcome.html

Plain-Language Explanation: Artifact Management

Artifact management is mundane in concept but easy to misuse. Three analogies from different domains make the lifecycle and access model concrete.

Analogy 1: Library Cataloguing System

Picture a large research library managing millions of books, journals, and microfilms. CodeArtifact is the periodicals room — every issue of every journal (every version of every npm package) catalogued, with citation chains so a paper that references an older volume can still locate it (versioned package resolution). The domain is the library building — it owns the climate control (KMS key) and the cataloguing system. Repositories are the shelves within the building, organised by team or topic. Upstream repositories are the inter-library loan agreement — when a researcher requests a book the local shelf does not have, the librarian fetches it from the central system (Maven Central, npmjs) and caches a copy on the shelf for the next reader.

ECR is the special collections vault — large, version-controlled artifacts (rare manuscripts, container images) requiring access control and tamper-evidence. S3 is the general storage warehouse — anything that does not fit the cataloguing rules of the periodicals room or the special collections vault.

The lifecycle policy is the deaccessioning policy — books and microfilms are weeded periodically (untagged images dropped after 30 days, old versions pruned to keep the latest 50). Without a deaccessioning policy, the library runs out of shelf space and its catalogue becomes unsearchable.

Analogy 2: Restaurant Pantry and Walk-in Fridge

A restaurant kitchen manages perishables across several storage zones. The walk-in fridge is CodeArtifact — every ingredient (package version) labelled with a date, a supplier (upstream), and a use-by tag. The freezer is ECR — long-term storage for prepped components (container images), defrostable on demand. The dry-goods pantry is S3 — bulk storage for stable, non-perishable items (build artifacts).

Cross-account access is the commissary kitchen partnership: a sister restaurant has its own pantry, but on a busy night the head chef has signed permission slips to grab specific items from the partner's walk-in. The resource policy is the partnership agreement posted on the partner's fridge door, listing exactly which items the borrower can take.

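Image scanning is the food-safety inspection: every container image is scanned for vulnerabilities when it arrives (and, with enhanced scanning, continuously afterwards), just as every shipment is checked for spoilage before it goes into the fridge. The inspection only reports problems; the pipeline still has to act on the report before the image is deployed.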

Analogy 3: Manufacturing Parts Warehouse

An aerospace manufacturer maintains a parts warehouse with strict provenance and traceability. CodeArtifact is the certified-parts shelf — every part number tagged with supplier, batch, and inspection certificate (npm package version, registry origin, integrity hash). ECR is the assembled-subcomponents stockroom — engines, avionics modules, container images — each with serial numbers and certified by inspection. EC2 Image Builder is the subassembly fabrication line — takes raw materials (base AMIs), runs them through machining and certification (build components, test components), and outputs a completed subassembly ready for installation.

AWS Signer is the inspector's stamp — every certified part receives a tamper-evident seal at the inspection station; downstream operations refuse to use parts without the seal.

The library analogy is the most useful for understanding versioned package resolution and upstream chains. The kitchen analogy maps cleanest to cross-account access and resource policies. The aerospace analogy is the right model when the exam emphasises supply-chain integrity, signing, and image scanning. Reference: https://docs.aws.amazon.com/codeartifact/latest/ug/welcome.html

CodeArtifact for Package Registry

CodeArtifact is AWS's managed package registry, supporting npm, Maven (Java), PyPI (Python), NuGet (.NET), generic packages, and Swift packages. The two-level hierarchy — domain → repository — separates concerns: the domain owns billing and KMS encryption, repositories own per-team access control.

The most important CodeArtifact concept is upstream repositories. A team-specific repository (team-frontend) declares an upstream pointing at a shared organisational repository (shared-vendor-mirror), which in turn has external connections to public registries (npmjs, pypi, maven-central). When the team requests a package, CodeArtifact walks the upstream chain on cache miss, fetches from the public connection, retains the version in the repository that holds the external connection and in the repository the client asked, and serves the response. Subsequent requests across the org are served from those retained copies.

This pattern delivers two wins: (1) dependency consolidation — every team's npm install warms a single org-wide cache, and (2) vulnerability containment — a malicious package on npmjs cannot poison the org until it is fetched once; org-level approval gates can block specific package names before fetch.
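The cache-miss walk described above can be sketched as a toy model. This is not the CodeArtifact API: the Repo class, the external_fetch hook, and the left-pad package entry are all hypothetical stand-ins, and only the repository names come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Repo:
    """Toy stand-in for a CodeArtifact repository (not the real service)."""
    name: str
    upstream: Optional["Repo"] = None
    # Hypothetical hook standing in for an external connection such as npmjs.
    external_fetch: Optional[Callable[[str], Optional[str]]] = None
    cache: Dict[str, str] = field(default_factory=dict)

    def resolve(self, package: str, trail: List[str]) -> Optional[str]:
        """Return the package, recording every hop taken in `trail`."""
        trail.append(self.name)
        if package in self.cache:
            return self.cache[package]
        found = None
        if self.upstream is not None:
            found = self.upstream.resolve(package, trail)
        elif self.external_fetch is not None:
            trail.append("public")
            found = self.external_fetch(package)
        if found is not None:
            self.cache[package] = found  # keep a copy at this level
        return found


public_index = {"left-pad@1.3.0": "tarball-bytes"}  # illustrative data only
mirror = Repo("shared-vendor-mirror", external_fetch=public_index.get)
team = Repo("team-frontend", upstream=mirror)

first: List[str] = []
second: List[str] = []
team.resolve("left-pad@1.3.0", first)
team.resolve("left-pad@1.3.0", second)
print(first)   # first request walks the whole chain
print(second)  # second request stops at the team repository
```

In this two-repository chain both repositories end up holding a copy, so a second team whose repository also points at shared-vendor-mirror would stop at the mirror rather than reaching the public registry.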

Authentication uses short-lived tokens (12 hours by default, configurable down to 15 minutes) obtained via aws codeartifact get-authorization-token. CodeBuild integrates by running this command in pre_build and exporting the token to the language tooling (for npm, npm config set //...:_authToken=${TOKEN}).
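As a rough illustration of what the npm tooling ends up with, the sketch below builds the two .npmrc lines from a token. The domain, account ID, region, repository, and token values are placeholders, and the token itself is assumed to have been fetched separately with get-authorization-token.

```python
from typing import List


def npmrc_lines(domain: str, owner: str, region: str, repo: str, token: str) -> List[str]:
    """Build .npmrc entries pointing npm at a CodeArtifact repository."""
    # CodeArtifact npm endpoints follow <domain>-<owner>.d.codeartifact.<region>.amazonaws.com
    host = f"{domain}-{owner}.d.codeartifact.{region}.amazonaws.com/npm/{repo}/"
    return [
        f"registry=https://{host}",
        f"//{host}:_authToken={token}",
    ]


# Placeholder values; a real pipeline would inject the token at build time.
lines = npmrc_lines("my-domain", "111122223333", "us-east-1", "team-frontend", "EXAMPLE_TOKEN")
print("\n".join(lines))
```

Because the token expires, these lines belong in the build step rather than in a baked image or a committed .npmrc.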

The upstream-chain pattern means developers configure their tooling against a single CodeArtifact endpoint and never directly hit npmjs or PyPI. This enables (1) caching to reduce external bandwidth, (2) availability — public registry outages don't affect builds with cached packages, and (3) policy enforcement — block known-bad package names at the org-level repository before they are ever fetched. Reference: https://docs.aws.amazon.com/codeartifact/latest/ug/repos-upstream.html

ECR for Container Images

ECR is AWS's container image registry. Every account has a private registry per region (<account>.dkr.ecr.<region>.amazonaws.com); repositories within the registry hold individual image streams.

Three exam-relevant features:

Lifecycle policies auto-delete images by tag pattern, count, or age. A standard policy: "keep last 10 images tagged prod-*, delete untagged images after 7 days, delete dev-* images after 30 days". Without lifecycle rules, storage charges (about $0.10 per GB-month of stored image data) keep accruing for images nobody will pull again.
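The standard policy quoted above might be expressed as the following lifecycle policy document. The rule priorities and descriptions are illustrative choices; the resulting JSON text is what would be passed to ECR, for example as the lifecyclePolicyText argument of put_lifecycle_policy.

```python
import json
from typing import Any, Dict


def ecr_lifecycle_policy() -> Dict[str, Any]:
    """Lifecycle rules: untagged after 7 days, keep 10 prod-*, dev-* after 30 days."""
    return {
        "rules": [
            {
                "rulePriority": 1,
                "description": "Expire untagged images 7 days after push",
                "selection": {
                    "tagStatus": "untagged",
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": 7,
                },
                "action": {"type": "expire"},
            },
            {
                "rulePriority": 2,
                "description": "Keep only the 10 newest prod-* images",
                "selection": {
                    "tagStatus": "tagged",
                    "tagPrefixList": ["prod-"],
                    "countType": "imageCountMoreThan",
                    "countNumber": 10,
                },
                "action": {"type": "expire"},
            },
            {
                "rulePriority": 3,
                "description": "Expire dev-* images 30 days after push",
                "selection": {
                    "tagStatus": "tagged",
                    "tagPrefixList": ["dev-"],
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": 30,
                },
                "action": {"type": "expire"},
            },
        ]
    }


policy_text = json.dumps(ecr_lifecycle_policy(), indent=2)
print(policy_text)
```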

Image scanning runs on push (basic scan, free) or continuously (enhanced scan via Amazon Inspector, paid). Findings emit EventBridge events, integrating with Security Hub. Scanning is required by most compliance frameworks; the exam treats unscanned images as a security flaw.
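One way to turn findings into alerts is a small filter over the EventBridge event that a completed basic scan emits. The sketch assumes the basic-scan event shape (source aws.ecr, detail-type "ECR Image Scan"); enhanced-scan findings arrive through Amazon Inspector in a different shape and are not handled here. The severity threshold and the sample event values are illustrative.

```python
from typing import Any, Dict

# Severities that should page someone; an illustrative choice.
BLOCKING_LEVELS = ("CRITICAL", "HIGH")


def should_alert(event: Dict[str, Any]) -> bool:
    """Return True when a completed ECR basic scan reports blocking findings."""
    if event.get("source") != "aws.ecr":
        return False
    if event.get("detail-type") != "ECR Image Scan":
        return False
    detail = event.get("detail", {})
    if detail.get("scan-status") != "COMPLETE":
        return False
    counts = detail.get("finding-severity-counts", {})
    return any(counts.get(level, 0) > 0 for level in BLOCKING_LEVELS)


sample_event = {
    "source": "aws.ecr",
    "detail-type": "ECR Image Scan",
    "detail": {
        "scan-status": "COMPLETE",
        "repository-name": "my-service",  # placeholder repository
        "finding-severity-counts": {"CRITICAL": 1, "MEDIUM": 4},
        "image-tags": ["prod-42"],
    },
}
print(should_alert(sample_event))
```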

Cross-account access and replication let one team's build push to a central registry, with images replicated to consumer-team accounts and to other regions for DR. Replication is configured as rules on the source private registry, each listing destination account and region pairs; for a cross-account destination, the destination registry's permissions policy must also allow the source account to replicate into it.
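A replication rule set of the kind described above could look like the sketch below, shaped like the replicationConfiguration argument of ECR's put_replication_configuration. The account IDs, regions, and the prod/ repository prefix are placeholders.

```python
from typing import Any, Dict, List, Tuple


def replication_configuration(
    destinations: List[Tuple[str, str]], repo_prefix: str
) -> Dict[str, Any]:
    """Build one replication rule covering (region, account) destinations."""
    return {
        "rules": [
            {
                "destinations": [
                    {"region": region, "registryId": account}
                    for region, account in destinations
                ],
                # Only repositories whose names start with the prefix replicate.
                "repositoryFilters": [
                    {"filter": repo_prefix, "filterType": "PREFIX_MATCH"}
                ],
            }
        ]
    }


config = replication_configuration(
    [("us-west-2", "111122223333"),   # same account, DR region (placeholder)
     ("us-east-1", "444455556666")],  # consumer account (placeholder)
    repo_prefix="prod/",
)
print(config)
```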

ECR repository policies (resource-based) permit cross-account pulls; combine with the consuming role's IAM policy for full access.

S3 for Build Artifacts

For everything that does not fit CodeArtifact or ECR — zip bundles, source archives, CloudFormation templates, build logs, signed binaries — S3 is the catch-all. CodePipeline's artifact store is an S3 bucket. CodeBuild output artifacts land in S3.

Critical S3 patterns for artifact management:

Versioning: enable bucket versioning so artifacts can be retrieved by version ID even after overwrite or delete. Required for any compliance audit trail.

Lifecycle policies: transition old artifacts to S3 Standard-IA or S3 Glacier Instant Retrieval after 30 days, delete after 365 days. The cost savings on a high-frequency pipeline are substantial.
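A lifecycle configuration matching those numbers might be built like this, in the shape accepted by S3's put_bucket_lifecycle_configuration. The rule ID and the pipeline-artifacts/ prefix are illustrative, and the 30-day noncurrent-version expiry is an added assumption for versioned buckets.

```python
from typing import Any, Dict


def artifact_lifecycle(prefix: str) -> Dict[str, Any]:
    """Transition to Standard-IA at 30 days, expire at 365 days."""
    return {
        "Rules": [
            {
                "ID": "age-out-build-artifacts",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                "Expiration": {"Days": 365},
                # Assumption: the bucket is versioned, so old versions need
                # their own expiry or they are kept indefinitely.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    }


lifecycle = artifact_lifecycle("pipeline-artifacts/")
print(lifecycle)
```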

KMS encryption with customer-managed key, plus a deny-on-unencrypted-PutObject bucket policy.
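The deny-on-unencrypted-PutObject statement is commonly written with a negated condition on the encryption header, roughly as sketched below. The bucket name is a placeholder, and whether this statement is still needed alongside default bucket encryption is a judgment call for the reader's environment.

```python
import json
from typing import Any, Dict


def deny_non_kms_uploads(bucket: str) -> Dict[str, Any]:
    """Bucket policy denying PutObject unless SSE-KMS is requested."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyNonKmsPutObject",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # A negated operator also matches requests that omit the header.
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            }
        ],
    }


bucket_policy = deny_non_kms_uploads("example-artifact-bucket")
print(json.dumps(bucket_policy, indent=2))
```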

Object Lock in compliance mode for artifacts that must be immutable for regulatory retention (e.g., signed release binaries). Once locked, the object cannot be deleted by anyone — including root — until the retention period expires.

S3 Object Lock has two modes: governance (admins can override) and compliance (no override possible, even by root, until retention expires). For audit-grade artifact retention, compliance mode is correct but unforgiving — accidentally locking a 7-year retention on the wrong object means the bucket cannot delete it for 7 years. Test in governance mode first. Reference: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock-overview.html

EC2 Image Builder for AMIs and Container Images

EC2 Image Builder produces hardened AMIs and container images on a schedule, automating what was previously a manual or Packer-based process. A pipeline has three components:

Recipe: declares the base image and the components to apply (install packages, harden settings, run validation tests).

Components: reusable units of installation/configuration logic, declared in YAML. AWS provides standard components (Amazon Inspector, CloudWatch agent, kernel hardening); custom components run shell or PowerShell.

Distribution configuration: targets accounts and regions for the resulting AMI/container image, plus tags and KMS encryption.

The pipeline runs on a schedule (e.g., monthly on the 1st), pulls the latest base AMI, applies components, runs validation, and distributes the new image. Outputs flow into Auto Scaling launch templates referenced by ssm:/aws/service/... parameters or by tag-based queries.
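To make the component idea concrete, here is a minimal sketch of a custom component document with a build phase and a validate phase. The component name, step names, and shell commands are hypothetical. The document is built as a Python dict and printed as JSON purely for illustration; Image Builder components are normally authored in YAML, and treating this JSON text as an acceptable equivalent is an assumption.

```python
import json
from typing import Any, Dict


def patch_component() -> Dict[str, Any]:
    """Sketch of a component document: patch in build, check in validate."""
    return {
        "name": "PatchAndVerify",  # hypothetical component name
        "description": "Apply OS updates, then confirm the package tool runs",
        "schemaVersion": 1.0,
        "phases": [
            {
                "name": "build",
                "steps": [
                    {
                        "name": "ApplyUpdates",
                        "action": "ExecuteBash",
                        "inputs": {"commands": ["sudo yum update -y"]},
                    }
                ],
            },
            {
                "name": "validate",
                "steps": [
                    {
                        "name": "CheckPackageTool",
                        "action": "ExecuteBash",
                        "inputs": {"commands": ["yum --version"]},
                    }
                ],
            },
        ],
    }


component_doc = patch_component()
print(json.dumps(component_doc, indent=2))
```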

For DOP-C02, knowing that Image Builder is the AWS-blessed alternative to Packer + CodePipeline orchestration is the key insight. The exam pattern: "the company runs golden AMI patching monthly with manual scripts; recommend an automated AWS-managed approach" — the answer is EC2 Image Builder.

AWS Signer for Code Signing

AWS Signer signs Lambda deployment packages and OCI containers, producing tamper-evident artifacts.

For Lambda, configure a code-signing config on the function: only signed packages from listed signing profiles are accepted on update. With the untrusted-artifact policy set to Enforce, unsigned or wrong-signed packages are rejected at deploy time, preventing unauthorised code from reaching production; with Warn, the deployment proceeds and the failure is only logged.
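The code-signing config itself is a small document. The sketch below builds the request in the shape of Lambda's create_code_signing_config call; the signing profile version ARN and the description are placeholders.

```python
from typing import Any, Dict, List


def code_signing_config_request(
    profile_version_arns: List[str], enforce: bool = True
) -> Dict[str, Any]:
    """Build kwargs for creating a Lambda code-signing config."""
    return {
        "Description": "Accept only packages signed by approved profiles",
        "AllowedPublishers": {
            "SigningProfileVersionArns": list(profile_version_arns)
        },
        "CodeSigningPolicies": {
            # Enforce rejects untrusted packages; Warn lets them through.
            "UntrustedArtifactOnDeployment": "Enforce" if enforce else "Warn"
        },
    }


request = code_signing_config_request(
    # Placeholder ARN, not a real signing profile.
    ["arn:aws:signer:us-east-1:111122223333:/signing-profiles/ExampleProfile/abc123"]
)
print(request)
```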

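For containers, Signer integrates with the Notation CLI through an AWS Signer plugin, and the signature is stored in ECR alongside the image. Verification happens on the consuming side, for example with notation verify in the pipeline or an admission controller in the cluster, before the image is admitted to a deployment.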

The exam treats Signer as the answer when stems mention "verify integrity of the deployment package", "supply-chain attack mitigation", or "regulatory requirement to sign artifacts".

Lambda's code-signing config rejects unsigned packages on UpdateFunctionCode. Signature checks run at deployment time only, so once a version is published it keeps running even if the signing profile is later revoked; published versions are immutable. To take a bad version out of production, publish a new (signed) version and point the alias at it. The exam tests stems where "we revoked the signing profile but the bad version is still running"; the answer is "publish a new version and update the alias", not "revoke fixes the running version". Reference: https://docs.aws.amazon.com/signer/latest/developerguide/Welcome.html

Cross-Account Access Patterns

The DOP-C02-favourite cross-account pattern: a central artifact account hosts CodeArtifact and ECR; per-team or per-environment accounts pull artifacts.

For CodeArtifact, the domain policy plus repository policy pair grants principals in other accounts read access. The consuming account's IAM principal needs codeartifact:GetAuthorizationToken, codeartifact:ReadFromRepository, plus sts:GetServiceBearerToken. Configure once per cross-account boundary.
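The three documents involved in the CodeArtifact case can be sketched side by side. Account IDs, region, domain, and repository names are placeholders, and the action lists are kept to the minimum named in the text; a real repository policy would usually add list and describe actions.

```python
from typing import Any, Dict


def codeartifact_cross_account(
    owner: str, consumer: str, region: str, domain: str, repo: str
) -> Dict[str, Dict[str, Any]]:
    """Return domain policy, repository policy, and consumer identity policy."""
    consumer_root = f"arn:aws:iam::{consumer}:root"
    domain_arn = f"arn:aws:codeartifact:{region}:{owner}:domain/{domain}"
    repo_arn = f"arn:aws:codeartifact:{region}:{owner}:repository/{domain}/{repo}"

    def statement(action: str) -> Dict[str, Any]:
        return {"Effect": "Allow", "Principal": {"AWS": consumer_root},
                "Action": action, "Resource": "*"}

    return {
        "domain_policy": {"Version": "2012-10-17",
                          "Statement": [statement("codeartifact:GetAuthorizationToken")]},
        "repository_policy": {"Version": "2012-10-17",
                              "Statement": [statement("codeartifact:ReadFromRepository")]},
        "consumer_identity_policy": {
            "Version": "2012-10-17",
            "Statement": [
                {"Effect": "Allow", "Action": "codeartifact:GetAuthorizationToken",
                 "Resource": domain_arn},
                {"Effect": "Allow", "Action": "codeartifact:ReadFromRepository",
                 "Resource": repo_arn},
                {"Effect": "Allow", "Action": "sts:GetServiceBearerToken",
                 "Resource": "*",
                 "Condition": {"StringEquals": {
                     "sts:AWSServiceName": "codeartifact.amazonaws.com"}}},
            ],
        },
    }


policies = codeartifact_cross_account(
    "111122223333", "444455556666", "us-east-1", "my-domain", "shared-vendor-mirror")
print(sorted(policies))
```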

For ECR, the repository policy in the central account must allow the consumer principal, and the consumer's IAM role must also allow the pull. The role needs ecr:GetAuthorizationToken (granted on Resource "*", because it is not scoped to a repository) plus ecr:BatchCheckLayerAvailability, ecr:GetDownloadUrlForLayer, and ecr:BatchGetImage on the specific repository ARN.
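The two sides of a cross-account ECR pull might be written as follows. Account IDs, region, and repository name are placeholders; the action names are the ones listed above.

```python
from typing import Any, Dict, Tuple

PULL_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
]


def ecr_cross_account_pull(
    owner: str, consumer: str, region: str, repo: str
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (repository policy in owner account, IAM policy in consumer account)."""
    repository_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowConsumerPull",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{consumer}:root"},
            "Action": PULL_ACTIONS,
        }],
    }
    consumer_policy = {
        "Version": "2012-10-17",
        "Statement": [
            # The token call is registry-wide and cannot be scoped to a repository.
            {"Effect": "Allow", "Action": "ecr:GetAuthorizationToken", "Resource": "*"},
            {"Effect": "Allow", "Action": PULL_ACTIONS,
             "Resource": f"arn:aws:ecr:{region}:{owner}:repository/{repo}"},
        ],
    }
    return repository_policy, consumer_policy


repo_policy, iam_policy = ecr_cross_account_pull(
    "111122223333", "444455556666", "us-east-1", "my-service")
print(iam_policy["Statement"][1]["Resource"])
```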

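For S3 artifact buckets shared cross-account, the bucket policy must grant access and the consumer's IAM policy must allow it. When the bucket is encrypted with a customer managed KMS key, the key policy must also grant the consumer use of the key (kms:Decrypt for readers, plus kms:GenerateDataKey for writers).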

Common Trap Patterns

Trap one: storing npm packages in S3 because "we already have S3". S3 lacks dependency resolution, integrity hashing, and native package-manager integration. CodeArtifact is purpose-built.

Trap two: missing ECR lifecycle policy. Untagged images accumulate forever; bill spirals with no functional value.

Trap three: assuming ECR cross-account requires only a repository policy. The consumer also needs IAM permissions on its own role; both sides must agree, just like S3.

Trap four: enabling ECR scan on push but not subscribing to findings. Findings without alerting are noise; integrate with Security Hub or EventBridge.

Trap five: using Object Lock compliance mode without testing in governance mode first. Compliance retention is irrevocable.

Cross-account ECR pulls require both the source-account repository policy (allowing the consumer principal) and the consumer-account IAM policy (allowing ecr:GetAuthorizationToken, plus ecr:GetDownloadUrlForLayer and ecr:BatchGetImage on the source ARN). Many candidates assume the resource policy alone is sufficient. It is not: when the caller is an IAM principal in another account, both the resource policy and the caller's identity policy must allow the action, which is the same rule that governs cross-account S3 access. Reference: https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-policies.html

End-to-End Artifact Architecture

A canonical DOP-C02 multi-account artifact architecture assembles like this. Tooling account hosts: a CodeArtifact domain with a vendor-mirror repository (upstream to npmjs/pypi/maven-central) and per-team repositories with vendor-mirror as upstream; an ECR registry with replication to consumer accounts and to a DR region; a CodePipeline artifact bucket with KMS encryption, versioning, and 90-day lifecycle to S3-IA. Image Builder pipeline runs monthly producing hardened AMIs, distributed to consumer accounts via cross-account distribution. Signer profiles sign Lambda packages and container images at build time.

This architecture cleanly separates artifact production (tooling account) from consumption (workload accounts), supports DR via replication, and enforces supply-chain integrity via signing.

For any artifact-management question, anchor on:

  1. Service by type: CodeArtifact for packages, ECR for containers, Image Builder for AMIs, S3 for everything else.
  2. Lifecycle policy: every artifact store needs explicit retention rules to prevent unbounded growth.
  3. Cross-account access: resource policy + consumer IAM policy in tandem; KMS key policy if customer-managed encryption.
  4. Supply-chain integrity: image scanning (ECR), code signing (Signer), Object Lock (S3) — each enables a different compliance control.

Any artifact question maps to one of these four. Reference: https://docs.aws.amazon.com/codeartifact/latest/ug/welcome.html

Common Exam Traps

  1. Storing npm packages in S3 — lacks integrity hashing, dependency resolution, and tooling integration; CodeArtifact is the correct answer for any "npm/Maven/PyPI registry" stem.
  2. No ECR lifecycle policy: untagged images accumulate forever. Baseline hygiene is one rule expiring untagged images after 7 days plus one rule capping or ageing out images with the dev tag prefix.
  3. Assuming the ECR repository policy alone grants cross-account access: the IAM role on the consumer side must also explicitly allow ecr:BatchGetImage (and the other pull actions) on the source ARN; both sides are required.
  4. Object Lock compliance mode without testing — irrevocable; even root cannot delete locked objects until retention expires. Use governance mode first to validate the policy.
  5. CodeArtifact tokens expected to be long-lived — tokens expire after 12 hours; pipelines must call get-authorization-token in pre_build, not bake into images.

FAQ

Q1: Why use CodeArtifact instead of a self-hosted Nexus or Artifactory? CodeArtifact is fully managed (no nodes to patch), integrates with IAM and KMS natively, supports the same upstream-chaining pattern Nexus offers, and bills per request rather than per server. Self-hosted is reasonable only when org policy forbids managed services or when you need package types CodeArtifact does not support yet.

Q2: Can a single CodeArtifact repository serve multiple package formats (npm + Maven + PyPI)? Yes. A repository can hold packages of all supported formats simultaneously. Common pattern: one team-frontend repository with npm and Maven artefacts together.

Q3: How do I prevent developers from pushing malicious npm packages with overlapping names to the org repository? Limit the org-level repository to the specific external connections you need, and set package origin controls on internal package names so that publishing is allowed and upstream ingestion is blocked. With upstream blocked, a public package that shares the name cannot be ingested over the internal one, which closes off dependency-confusion attacks.
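For Q3, an origin-control setting for one internal package could be expressed as the request below, shaped like CodeArtifact's put_package_origin_configuration call. The domain, repository, namespace, and package names are placeholders.

```python
from typing import Any, Dict, Optional


def block_upstream_for_internal_package(
    domain: str, repository: str, package: str, namespace: Optional[str] = None
) -> Dict[str, Any]:
    """Allow direct publishing but block ingestion from upstream sources."""
    request: Dict[str, Any] = {
        "domain": domain,
        "repository": repository,
        "format": "npm",
        "package": package,
        "restrictions": {"publish": "ALLOW", "upstream": "BLOCK"},
    }
    if namespace is not None:
        request["namespace"] = namespace  # npm scope without the leading @
    return request


origin_request = block_upstream_for_internal_package(
    "my-domain", "shared-vendor-mirror", "billing-client", namespace="acme")
print(origin_request)
```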

Q4: What is the difference between ECR private and ECR public? ECR private requires authentication and lives at <account>.dkr.ecr.<region>.amazonaws.com. ECR public is anonymous-pullable (subject to AWS-managed rate limits) and lives at public.ecr.aws/<alias>/<repo>. Use public for OSS images you want the world to consume; private for internal artifacts.

Q5: How does ECR replication interact with cross-account access? Replication copies images to the destination registry; access control still applies independently — the destination account's repository policy determines who can pull. Replication does not auto-grant access; it just gets the bits there.

Q6: How do I integrate EC2 Image Builder with my CodePipeline? Image Builder pipelines can be triggered by CloudWatch Events / EventBridge or invoked via the SDK. From CodePipeline, use a Lambda invoke action that starts the Image Builder pipeline and polls for completion, then publishes the new AMI ID to SSM Parameter Store for downstream Auto Scaling group consumption.
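The start, poll, and publish flow in Q6 can be sketched as a function that takes its AWS clients as arguments (for example boto3 imagebuilder and ssm clients). The function name, parameter name, and polling defaults are hypothetical. Polling inside a single Lambda is a simplification, since image builds often outlast the Lambda timeout; a Step Functions wait loop or an EventBridge rule on build completion is a common alternative.

```python
import time
from typing import Any, Callable


def publish_new_ami(
    imagebuilder: Any,
    ssm: Any,
    pipeline_arn: str,
    parameter_name: str,
    client_token: str,
    poll_seconds: int = 30,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start an Image Builder pipeline, wait for the AMI, store its ID in SSM."""
    started = imagebuilder.start_image_pipeline_execution(
        imagePipelineArn=pipeline_arn, clientToken=client_token
    )
    image_arn = started["imageBuildVersionArn"]

    for _ in range(max_polls):
        image = imagebuilder.get_image(imageBuildVersionArn=image_arn)["image"]
        status = image["state"]["status"]
        if status == "AVAILABLE":
            # Take the first AMI produced in the build region.
            ami_id = image["outputResources"]["amis"][0]["image"]
            ssm.put_parameter(
                Name=parameter_name, Value=ami_id, Type="String", Overwrite=True
            )
            return ami_id
        if status in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Image build ended with status {status}")
        sleep(poll_seconds)

    raise TimeoutError("Image build did not finish within the polling budget")
```

Passing the clients in keeps the function testable with simple fakes and leaves credential and region handling to the caller.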

Q7: Does AWS Signer work for Docker images stored in ECR? Yes. Signer integrates with Notation for OCI image signing, and the signature is stored in the same ECR repository as the image. ECR does not itself refuse to serve unsigned images; verification is enforced by the consumer, for example notation verify in the pipeline or an admission controller in the cluster.

Q8: What is the simplest way to expire old CloudFormation deployment artifacts in S3? Apply an S3 lifecycle rule on the artifact bucket with a prefix filter (pipeline-artifacts/) and a Days: 90 expiration for current versions, plus a NoncurrentDays: 30 for non-current (versioned) objects. This caps cost while preserving recent artifacts for rollback.
