ECS and EKS Container Deployment Strategies

Container deployment on AWS spans two distinct platforms — ECS (with Fargate or EC2 launch types) and EKS — each with its own deployment-strategy mechanics, integration patterns, and exam-relevant gotchas. DOP-C02 expects fluency in both because real-world DevOps engineers pick between them based on team skill, ecosystem fit, and operational overhead. The exam tests whether you understand ECS's three deployment types (rolling update, blue/green via CodeDeploy, external controller), EKS's Kubernetes-native rolling updates, and the AWS-blessed paths for canary and progressive delivery on each platform.

This guide treats ECS and EKS as twin tracks of the same problem: how do you safely promote a new container image from build to production? On ECS the answer leans heavily on CodeDeploy's blue/green for ECS and the deployment circuit breaker; on EKS it leans on Kubernetes Deployment object semantics, often augmented with AWS App Mesh, Argo Rollouts, or Flagger for canary. By the end you should be able to look at any container deployment question and pick the right primitive without confusion between the two ecosystems.

Why Container Deployment Strategies Are a Pro-Level Concern

Container deployments are deceptively simple at the API level — aws ecs update-service --task-definition foo:42 or kubectl set image deployment/foo foo=registry/foo:42 — and deceptively complex underneath. Both APIs trigger orchestration that involves new task or pod scheduling, registration with load balancer target groups, draining of old tasks, health-check evaluation, and rollback on failure. The exam tests where the orchestration choices affect downtime, blast radius, and rollback time.

Three forces push container deployment to Pro-tier difficulty. First, task and pod lifecycle: tasks/pods come up gradually, and load balancer health checks gate when traffic flows; misconfigured health checks turn a deployment into an outage. Second, deployment configuration tuning: ECS's minimumHealthyPercent and maximumPercent, plus Kubernetes' maxSurge and maxUnavailable, govern how aggressively old tasks are replaced — and the right values depend on capacity, cost, and SLA constraints. Third, integration with deployment-safety services: CodeDeploy for ECS adds canary/linear traffic shifting and alarm-based rollback; EKS-native alternatives include Argo Rollouts and Flagger, neither of which is AWS-managed.

ECS rolling update: the default ECS deployment type; replaces tasks gradually controlled by minimumHealthyPercent and maximumPercent.
ECS blue/green (CodeDeploy): deployment type CODE_DEPLOY on the service; CodeDeploy provisions a green task set, shifts ALB traffic, then drains the blue.
ECS deployment circuit breaker: an optional setting that monitors deployment health and rolls back automatically on repeated task failures.
Task set: an ECS construct representing a versioned set of tasks within a service; blue/green creates a new task set per deployment.
minimumHealthyPercent / maximumPercent: ECS service configuration controlling how many tasks must remain running and how many extras can launch during a rolling update.
EKS Deployment: a Kubernetes object with RollingUpdate or Recreate strategy; AWS does not modify Deployment semantics.
maxSurge / maxUnavailable: Kubernetes Deployment fields controlling rolling update aggressiveness — analogous to ECS percent settings.
Argo Rollouts / Flagger: third-party Kubernetes controllers providing canary, blue/green, and progressive delivery; popular in EKS canary scenarios.
External deployment controller (ECS): deployment type EXTERNAL that hands lifecycle to your own controller via the Task Sets API.
Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-types.html

Plain-Language Explanation: ECS and EKS Container Deployment

Container deployment can feel abstract because tasks and pods are short-lived and orchestration is opaque. Three analogies from different domains make the mechanics tangible.

Analogy 1: Theatre Cast Replacement

Picture a long-running theatre show replacing its principal cast. ECS rolling update is the understudy swap protocol: between performances, one or two cast members are replaced while the rest continue with the show; the audience experiences the show with a slightly different cast each night until the rotation is complete. minimumHealthyPercent: 100 and maximumPercent: 200 is the luxury rotation — bring on the new cast as additional members first (so the stage is briefly fuller than usual), then dismiss the old cast once the new is in position. minimumHealthyPercent: 50 is the lean rotation — let the cast shrink to half size during the swap; cheaper but riskier if a flu hits.

ECS blue/green via CodeDeploy is the second-stage rehearsal: rent a second theatre, rehearse the new cast there fully, do a dress rehearsal, then on opening night route ticket-holders to the new theatre while the old theatre runs its final show as a fallback. If reviews of opening night are bad, the producer can re-route audiences back to the old theatre instantly.

EKS rolling update is almost identical to ECS in mechanics but the orchestration software is different — Kubernetes is the stage manager rather than ECS. The principles map cleanly: maxSurge: 25% means you can add 25% extra cast during rotation, maxUnavailable: 25% means at most 25% can be missing.

Analogy 2: Restaurant Kitchen Brigade Shift Change

A high-volume restaurant kitchen rotates its line cooks during dinner service. The task or pod is the line cook on a station. The service is the brigade — the set of cooks needed to plate orders at full speed. The rolling update is bringing in fresh cooks one at a time; old cooks finish their current orders (stopTimeout), step off the line, fresh cooks step on. minimumHealthyPercent: 100 means the brigade is never short-staffed (fresh cook joins, then old cook leaves). minimumHealthyPercent: 50 means at most half the line is offline at once — orders take longer.

The deployment circuit breaker is the head chef's authority to cancel the rotation: if three fresh cooks in a row burn the first three plates they make, the head chef calls off the rotation, sends the failed cooks home, and brings the experienced cooks back to finish service. The customers (load balancer traffic) never know there was a problem — they just notice service slowed briefly.

Analogy 3: Air Traffic Control Tower Software Update

An ATC tower replacing flight-tracking software has zero tolerance for downtime. In-place rolling update would be unthinkable — you cannot reboot the radar mid-flight. Blue/green is the only acceptable approach: the airport stands up a second control room with the new software, controllers train and certify in it for a week, then on a slow Tuesday at 2 AM all radio communication and radar feeds are routed to the new room while the old room stays warm for 4 hours of fallback. If an unforeseen software issue surfaces, controllers can flip back instantly.

The CodeDeploy alarm-based rollback is the automatic safety system: a CloudWatch alarm watching for "controller-detected aircraft deviations" auto-aborts the cutover if any anomaly is detected during the first hour of the new room's operation. No human decision required.

The theatre analogy is best for visualising rolling updates and percentage settings. The restaurant analogy maps cleanest to the deployment circuit breaker. The ATC tower analogy is the right model when the exam stresses zero downtime, alarm-based safety, and warm fallback. Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-types.html

ECS Deployment Type One — Rolling Update

The default ECS deployment is a rolling update controlled by two service settings: minimumHealthyPercent (default 100, range 0–100) and maximumPercent (default 200, range 100–200).

minimumHealthyPercent defines the floor of running tasks during deployment. maximumPercent defines the ceiling — extra tasks ECS may launch beyond the desired count. Combinations dictate aggressiveness:

min 100, max 200: bring up new tasks first, drain old when new are healthy. No capacity loss, double cost during deploy. Default and most common.
min 50, max 100: shrink to half, replace, scale back. Capacity loss, no extra cost. Suitable for non-prod or low-traffic.
min 100, max 100: unworkable; ECS cannot replace tasks without either surging or shrinking, so a rolling update has no room to proceed.
min 0, max 100: ECS may stop all tasks before launching new ones. Full downtime during deploy. Suitable only for batch jobs.
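
The combinations above come down to a floor and a ceiling on the running task count. The sketch below assumes ECS rounds the floor up and the ceiling down, as the service documentation describes; the function names are hypothetical and nothing here calls AWS.

```python
def rolling_update_bounds(desired_count, minimum_healthy_percent=100, maximum_percent=200):
    """Return (floor, ceiling) of running tasks during an ECS rolling update."""
    # Floor is rounded up: ceil(desired * min% / 100), done in integer arithmetic.
    floor_tasks = -(-desired_count * minimum_healthy_percent // 100)
    # Ceiling is rounded down: floor(desired * max% / 100).
    ceiling_tasks = desired_count * maximum_percent // 100
    return floor_tasks, ceiling_tasks


def can_make_progress(desired_count, minimum_healthy_percent, maximum_percent):
    """A rolling update needs room to start an extra task or to stop an old one."""
    floor_tasks, ceiling_tasks = rolling_update_bounds(
        desired_count, minimum_healthy_percent, maximum_percent
    )
    return ceiling_tasks > desired_count or floor_tasks < desired_count


if __name__ == "__main__":
    for min_pct, max_pct in [(100, 200), (50, 100), (100, 100), (0, 100)]:
        bounds = rolling_update_bounds(4, min_pct, max_pct)
        print(min_pct, max_pct, bounds, can_make_progress(4, min_pct, max_pct))
```

With a desired count of 4, the default settings allow between 4 and 8 running tasks, while min 100 / max 100 leaves no room to move in either direction.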

Rolling updates are the simplest path for ECS. Trade-off: rollback is slow (re-deploy old task definition) and there is no built-in traffic-weighting (you cannot send 5% of traffic to the new version — every task either has full traffic or none).

ECS Deployment Type Two — Blue/Green via CodeDeploy

For traffic-weighted and validate-before-cutover deployments, ECS service deployment type CODE_DEPLOY hands lifecycle to CodeDeploy.

CodeDeploy provisions a new (green) task set with the new task definition, registers it with the second target group on the load balancer, optionally exposes it through a test listener, runs lifecycle hooks such as AfterAllowTestTraffic and BeforeAllowTraffic (Lambda functions), then shifts production traffic from the blue task set to the green task set per the deployment configuration:

CodeDeployDefault.ECSAllAtOnce: shift all traffic at once after green is ready.
CodeDeployDefault.ECSLinear10PercentEvery1Minutes (or 3 Minutes): linear ramp.
CodeDeployDefault.ECSCanary10Percent5Minutes (or 15 Minutes): 10% canary, then the remaining 90% all at once.

Custom deployment configurations allow arbitrary canary or linear schedules.
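
The canary and linear configurations reduce to simple schedules of (minute, percent of traffic on green). The sketch below models them, assuming the first increment is applied at the start of the traffic shift; the function names are hypothetical and this is not CodeDeploy's own implementation.

```python
def linear_schedule(step_percent, interval_minutes):
    """Shift step_percent of traffic every interval_minutes until 100% is on green."""
    schedule = []
    shifted = 0
    minute = 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


def canary_schedule(canary_percent, wait_minutes):
    """Shift canary_percent first, then everything else after wait_minutes."""
    return [(0, canary_percent), (wait_minutes, 100)]


if __name__ == "__main__":
    print(linear_schedule(10, 1))   # models ECSLinear10PercentEvery1Minutes
    print(canary_schedule(10, 5))   # models ECSCanary10Percent5Minutes
```

A custom configuration is the same idea with different step, interval, or canary values.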

Blue/green requires two target groups on the load balancer and a production listener, plus an optional test listener for validating green before cutover. CodeDeploy changes which target group the production listener (or listener rule) forwards to; that is how traffic shifting works mechanically.

ECS blue/green via CodeDeploy works with an ALB or an NLB; Classic Load Balancer is not supported. With an NLB only CodeDeployDefault.ECSAllAtOnce is available, so canary and linear traffic shifting require an ALB. The deployment requires two pre-created target groups (one for blue, one for green) that CodeDeploy alternates between. Stems mentioning "the team uses NLB" disqualify canary or linear blue/green for ECS: the team must accept an all-at-once cutover, use rolling update, or switch to ALB. Reference: https://docs.aws.amazon.com/codedeploy/latest/userguide/deployments-create-ecs-bg.html

ECS Deployment Type Three — External Controller

The third deployment type, EXTERNAL, hands all lifecycle to your own controller via the Task Sets API. Useful for custom progressive-delivery tools or service-mesh-driven deployments. Rare on the exam but worth knowing exists.

ECS Deployment Circuit Breaker

The deployment circuit breaker auto-rolls back rolling-update deployments when the new tasks repeatedly fail to start or pass health checks. Configure it in the service: deploymentConfiguration.deploymentCircuitBreaker.enable: true and rollback: true.
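
As a rough illustration, the settings can be assembled as keyword arguments in the shape that the boto3 ECS client's update_service call accepts. The helper name and the example cluster and service names are hypothetical, and the sketch only builds the dictionary; it makes no AWS call.

```python
def build_update_service_request(cluster, service, task_definition):
    """Build update_service arguments with the deployment circuit breaker enabled."""
    return {
        "cluster": cluster,
        "service": service,
        "taskDefinition": task_definition,
        "deploymentConfiguration": {
            # enable turns failure detection on; rollback makes ECS revert automatically.
            "deploymentCircuitBreaker": {"enable": True, "rollback": True},
            "minimumHealthyPercent": 100,
            "maximumPercent": 200,
        },
    }


if __name__ == "__main__":
    request = build_update_service_request("prod-cluster", "web", "web:42")
    print(request)
    # With boto3 installed and credentials configured, this could be passed as:
    # boto3.client("ecs").update_service(**request)
```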

Behaviour: ECS monitors task launch failures during a rolling deployment. If failures cross the threshold (it scales with desired count: roughly half the desired count, with a floor of 3 and a ceiling of 200 per the AWS documentation), the deployment is marked failed and (if rollback: true) re-deployed with the previous task definition.
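
The threshold scaling can be written out as arithmetic. This sketch assumes the rule given in the AWS circuit breaker documentation at the time of writing (half the desired count, rounded up, clamped between 3 and 200); treat the constants as assumptions to verify against the current docs. The function name is hypothetical.

```python
def circuit_breaker_threshold(desired_count):
    """Failed-task threshold: ceil(0.5 * desired), clamped to the range 3..200."""
    calculated = -(-desired_count // 2)  # integer ceiling of desired_count / 2
    return max(3, min(200, calculated))


if __name__ == "__main__":
    for desired in (1, 25, 100, 400, 800):
        print(desired, circuit_breaker_threshold(desired))
```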

This is the rolling-update equivalent of CodeDeploy's auto rollback for blue/green. A frequent DOP-C02 distractor pattern: stems describing rolling updates that fail to detect bad deployments — the missing piece is enabling the circuit breaker.

The deployment circuit breaker triggers on task launch failures (image pull errors, container start failures, immediate task crashes). It does not trigger on application-layer 5xx errors after tasks become healthy. To catch application errors, combine it with a CloudWatch alarm on ALB target group 5xx and have the alarm trigger an EventBridge rule that runs a Lambda to call aws ecs update-service with the old task definition. Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-circuit-breaker.html
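
A minimal sketch of the Lambda side of that pattern, assuming the EventBridge rule forwards a CloudWatch Alarm State Change event and that the previous task definition is supplied by the caller (how it is stored is left open). The function name and parameters are hypothetical; ecs_client is expected to behave like a boto3 ECS client, and update_service is the only call used.

```python
def rollback_on_alarm(event, ecs_client, cluster, service, previous_task_definition):
    """Redeploy the previous task definition when the alarm enters the ALARM state."""
    # CloudWatch Alarm State Change events carry the new state in detail.state.value.
    state = event.get("detail", {}).get("state", {}).get("value")
    if state != "ALARM":
        return None  # OK or INSUFFICIENT_DATA: nothing to do
    ecs_client.update_service(
        cluster=cluster,
        service=service,
        taskDefinition=previous_task_definition,  # assumption: tracked outside this function
    )
    return previous_task_definition


class RecordingClient:
    """Stand-in for a boto3 ECS client so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def update_service(self, **kwargs):
        self.calls.append(kwargs)


if __name__ == "__main__":
    client = RecordingClient()
    alarm_event = {"detail": {"state": {"value": "ALARM"}}}
    print(rollback_on_alarm(alarm_event, client, "prod-cluster", "web", "web:41"))
    print(client.calls)
```

Passing the client in as a parameter keeps the handler testable; in a real Lambda it would be created once outside the handler.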

EKS Deployment Mechanics

EKS does not modify Kubernetes Deployment semantics. The two strategies built into Kubernetes are:

RollingUpdate (default): replace pods gradually controlled by maxSurge and maxUnavailable (count or percentage). Analogous to ECS rolling update.
Recreate: stop all old pods, then start new ones. Equivalent to ECS min 0, max 100. Suitable only for stateful apps where parallel old/new is harmful.
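
For RollingUpdate, the two fields translate into pod-count limits. The sketch below assumes the rounding documented for Kubernetes Deployments (percentage maxSurge rounds up, percentage maxUnavailable rounds down) and the 25% defaults; the helper names are hypothetical.

```python
def resolve(value, replicas, round_up):
    """Turn an absolute count or a percentage string such as '25%' into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rolling_update_limits(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (minimum available pods, maximum total pods) during the rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas - unavailable, replicas + surge


if __name__ == "__main__":
    print(rolling_update_limits(10))             # defaults
    print(rolling_update_limits(4, "100%", 0))   # surge-only rollout
```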

Native Kubernetes does not support canary or true blue/green out of the box. Common patterns:

Argo Rollouts: a CRD-based controller that adds canary, blue/green, and experiment strategies, integrating with the AWS Load Balancer Controller (formerly ALB Ingress Controller) for traffic weighting.
Flagger: similar functionality, integrates with App Mesh, Istio, and ALB.
Two-Deployment swap: manually create v1 and v2 Deployments, swap a Service selector. Crude but works.

For DOP-C02, knowing that EKS canary requires either a third-party controller or App Mesh-based traffic shifting is the key insight. The exam will rarely require deep Argo Rollouts internals.

Service Mesh and Traffic Shifting on EKS

AWS App Mesh (deprecated for new customers but still in scope) and the open-source alternatives (Istio, Linkerd) enable fine-grained traffic shifting at the service-mesh layer. For DOP-C02, the relevant pattern is:

Deploy v2 alongside v1 with the same Service routing.
Configure the service mesh's virtual service to send 10% of traffic to v2.
Monitor metrics; ramp 10% → 25% → 50% → 100%.
Rollback by reverting the virtual-service weights.
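
The ramp-and-rollback loop above can be sketched independently of any particular mesh. Here set_weight and error_rate are hypothetical callables standing in for "update the virtual-service weight" and "read the canary's error metric"; the 1% threshold is an arbitrary example value.

```python
def run_canary(weights, set_weight, error_rate, max_error_rate=0.01):
    """Step through the weights; revert to 0% on v2 if the error rate is too high."""
    for weight in weights:
        set_weight(weight)
        if error_rate() > max_error_rate:
            set_weight(0)  # rollback: send all traffic back to v1
            return False
    return True


if __name__ == "__main__":
    applied = []
    observed = iter([0.0, 0.002, 0.05])  # third reading breaches the threshold
    promoted = run_canary([10, 25, 50, 100], applied.append, lambda: next(observed))
    print(promoted, applied)
```

A real controller would also wait between steps and evaluate metrics over a window; both are omitted here to keep the control flow visible.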

App Mesh is going through deprecation as AWS shifts focus to Amazon VPC Lattice; for current exam stems, mentions of "EKS canary deployment with traffic weighting and AWS-managed mesh" still imply App Mesh. New designs may favour Lattice.

ALB Target Group Health Checks for Containers

Container deployments live or die by health check configuration. Both ECS and EKS register tasks/pods with ALB target groups; misconfigured health checks cause:

False healthy: tasks report healthy before the application is ready (e.g., framework startup completed but database connection still in progress) — traffic routes to broken tasks.
False unhealthy: tasks legitimately need 60 seconds to warm up but health check fails after 30 seconds — deployments thrash, never stabilising.

Tune HealthyThresholdCount, UnhealthyThresholdCount, HealthCheckIntervalSeconds, and HealthCheckTimeoutSeconds to match your application's true readiness profile. For ECS, also set the service parameter healthCheckGracePeriodSeconds to give new tasks slack before health checks begin counting.

ECS deploys can fail repeatedly when an application takes longer to start than the ALB target group's HealthCheckIntervalSeconds * UnhealthyThresholdCount. The fix is healthCheckGracePeriodSeconds on the ECS service — it ignores health check failures for the first N seconds after a task starts. Without it, slow-starting apps trigger circuit breaker rollback even when nothing is wrong. Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/update-service.html
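
The timing relationship can be checked with simple arithmetic. This sketch uses hypothetical helper names and an arbitrary 15-second safety margin; the parameters correspond to the target group and service settings named above.

```python
def seconds_until_unhealthy(interval_seconds, unhealthy_threshold):
    """Time a never-passing target survives before the ALB marks it unhealthy."""
    return interval_seconds * unhealthy_threshold


def needs_grace_period(startup_seconds, interval_seconds, unhealthy_threshold):
    """True when the app starts slower than the health check will tolerate."""
    return startup_seconds > seconds_until_unhealthy(interval_seconds, unhealthy_threshold)


def suggested_grace_period(startup_seconds, margin_seconds=15):
    """Candidate value for healthCheckGracePeriodSeconds (margin is an assumption)."""
    return startup_seconds + margin_seconds


if __name__ == "__main__":
    # 60-second startup against a 10-second interval and threshold of 3 (30 seconds).
    print(needs_grace_period(60, 10, 3), suggested_grace_period(60))
```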

ECS Capacity Providers and Deployment Interaction

For ECS on the EC2 launch type, capacity providers tie the service to an Auto Scaling group. With managed scaling enabled, the capacity provider scales out the cluster's underlying ASG during deployments to accommodate the maximumPercent task surge.

A common DOP-C02 trap: deploying with maximumPercent: 200 against a cluster sized exactly for desired count plus no spare capacity. The new tasks queue waiting for capacity, deployment stalls, eventually rollback fires. Either size the cluster with headroom, enable a capacity provider with managed scaling, or use Fargate (no capacity planning required).
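
A quick headroom check makes the trap concrete. The sketch below is a deliberate simplification: it considers only CPU units and memory per instance and ignores ports, ENIs, and placement constraints. All names are hypothetical.

```python
def tasks_that_fit(instances, task_cpu, task_memory):
    """Count tasks the cluster can hold; instances is a list of (cpu_units, memory_mib)."""
    return sum(min(cpu // task_cpu, mem // task_memory) for cpu, mem in instances)


def surge_fits(instances, desired_count, maximum_percent, task_cpu, task_memory):
    """True when the cluster can hold the peak task count of a rolling update."""
    peak_tasks = desired_count * maximum_percent // 100
    return tasks_that_fit(instances, task_cpu, task_memory) >= peak_tasks


if __name__ == "__main__":
    cluster = [(2048, 4096), (2048, 4096)]  # two instances, 4 task slots each
    print(surge_fits(cluster, 8, 200, 512, 1024))  # sized exactly for desired count
    print(surge_fits(cluster, 4, 200, 512, 1024))  # half utilised, surge has room
```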

EKS Pod Disruption Budgets and Deployment Safety

EKS adds a Kubernetes-native safety mechanism missing from ECS: PodDisruptionBudgets (PDBs). A PDB declares the minimum available (or maximum unavailable) pods of a workload during voluntary disruptions such as node drains and cluster or node group upgrades.

PDBs are enforced through the eviction API: Kubernetes refuses to evict a pod if doing so would breach the budget. A Deployment's own rolling update is not limited by a PDB; its pace is governed by maxSurge and maxUnavailable. The two matter together because node upgrades often coincide with application rollouts, and the budget is especially important for stateful or quorum-based services where breaching it causes an outage.
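
The budget check applied to an eviction request is a subtraction. This sketch models only the minAvailable form with an absolute count; the function names are hypothetical.

```python
def allowed_disruptions(healthy_pods, min_available):
    """How many pods may be evicted right now without breaching the budget."""
    return max(0, healthy_pods - min_available)


def can_evict(healthy_pods, min_available):
    """An eviction is refused when no disruptions are currently allowed."""
    return allowed_disruptions(healthy_pods, min_available) > 0


if __name__ == "__main__":
    print(allowed_disruptions(5, 3), can_evict(5, 3))  # two pods of slack
    print(allowed_disruptions(3, 3), can_evict(3, 3))  # already at quorum
```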

Common Trap Patterns

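Trap one: confusing ECS rolling update percentages with Kubernetes maxSurge/maxUnavailable. They behave similarly but are expressed differently: the ECS values are a floor and a ceiling relative to desired count, while the Kubernetes values are offsets from the replica count. minimumHealthyPercent: 100 with maximumPercent: 200 corresponds roughly to maxUnavailable: 0 with maxSurge: 100%; expect answer choices that mix the two up.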
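
One way to keep the two vocabularies straight is an approximate translation: the ECS values are percentages of desired count, so the Kubernetes offsets are the distance from 100. This is a mental model rather than an exact equivalence, since the two systems round differently; the function name is hypothetical.

```python
def ecs_to_kubernetes(minimum_healthy_percent, maximum_percent):
    """Approximate Kubernetes rolling-update settings for a pair of ECS percentages."""
    return {
        "maxUnavailable": f"{100 - minimum_healthy_percent}%",
        "maxSurge": f"{maximum_percent - 100}%",
    }


if __name__ == "__main__":
    print(ecs_to_kubernetes(100, 200))  # ECS defaults
    print(ecs_to_kubernetes(50, 100))   # shrink-and-replace
```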

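Trap two: assuming ECS blue/green canary or linear shifting works with NLB. It does not; with NLB only an all-at-once cutover is available, and Classic Load Balancer is not supported at all.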

Trap three: enabling the circuit breaker but expecting it to catch application 5xx; it only catches task launch failures.

Trap four: deploying to ECS without healthCheckGracePeriodSeconds for slow-starting applications.

Trap five: assuming EKS supports native canary; it does not without third-party controllers or service mesh.

A frequent symptom: ECS service shows (service abc) was unable to place a task. Cause is usually capacity (cluster has no instance with sufficient CPU/memory for the new task during surge) or networking (no available ENI in the AZ for awsvpc-mode tasks). The fix is a capacity provider with managed scaling or, for Fargate, ensuring subnets have sufficient available IPs and the ENI quota is not exceeded. Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/update-service.html

End-to-End Container Pipeline Pattern

A canonical DOP-C02 ECS pipeline assembles like this. Source in CodeCommit. Build in CodeBuild produces a Docker image, pushes to ECR, and emits an imagedefinitions.json artifact for the ECS rolling-update action plus appspec.yaml and taskdef.json for the CodeDeploy blue/green action. Deploy to staging uses ECS rolling update with circuit breaker enabled, deploying to a Fargate service. Approval action pauses for release-manager signoff. Deploy to production uses CodeDeploy for ECS with blue/green and ECSCanary10Percent15Minutes configuration, plus alarm-based auto rollback on ALB 5xx rate.
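
The imagedefinitions.json file consumed by the ECS deploy action is a JSON array of container name and image URI pairs. A build step could generate it along these lines; the helper name, container name, and image URI are placeholders, not values from any real account.

```python
import json


def image_definitions(container_name, image_uri):
    """Return imagedefinitions.json content for a single-container task definition."""
    return json.dumps([{"name": container_name, "imageUri": image_uri}])


if __name__ == "__main__":
    # Placeholder values; a build would substitute its own registry, repository, and tag.
    content = image_definitions("web", "example-registry/web:42")
    with open("imagedefinitions.json", "w", encoding="utf-8") as handle:
        handle.write(content)
    print(content)
```

The container name must match the name used in the task definition, since that is how the deploy action knows which image to replace.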

For EKS, replace the deploy actions with kubectl apply (often through a CodeBuild stage running with kubectl configured for the cluster) plus an Argo Rollouts manifest for canary.

For any ECS or EKS deployment question, anchor on:

Platform: ECS (Fargate or EC2) or EKS.
Deployment type: rolling update / blue/green via CodeDeploy / external controller (ECS); RollingUpdate / Recreate / Argo Rollouts / Flagger / mesh-based (EKS).
Capacity surge: ECS min/maxPercent or Kubernetes maxSurge/maxUnavailable.
Safety nets: ECS deployment circuit breaker, ALB health checks with grace period, alarm-based rollback (CodeDeploy), PodDisruptionBudgets (EKS).

Any container question maps cleanly to one of these four. Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-types.html

Common Exam Traps

ECS blue/green canary or linear shifting with NLB: not supported; NLB allows only CodeDeployDefault.ECSAllAtOnce, and weighted traffic shifting requires an ALB.
Circuit breaker expected to catch 5xx errors — it only catches task launch failures; pair with CloudWatch alarm + EventBridge for application-layer rollback.
Missing healthCheckGracePeriodSeconds — slow-starting apps fail health checks before warming up, triggering false rollback loops.
EKS canary without third-party controller — Kubernetes Deployments do not support traffic-weighted canary natively; need Argo Rollouts, Flagger, or service mesh.
maximumPercent: 200 without cluster headroom — surge tasks queue waiting for capacity; rollout stalls and eventually fails.

FAQ

Q1: When should I prefer ECS rolling update over blue/green via CodeDeploy? Rolling update is simpler, cheaper (no double capacity), and faster for non-critical workloads. Choose blue/green when stems mention zero downtime, traffic-weighted canary, validate-before-cutover, or alarm-based rollback. ECS rolling update with circuit breaker covers most non-critical needs.

Q2: How do I do canary deployment on EKS without Argo Rollouts? Use AWS App Mesh (declining) or VPC Lattice with weighted target groups. Deploy v1 and v2 as separate Deployments with separate Services, point a virtual service or Lattice listener at both with weights, ramp weights manually or via automation. Rougher but no third-party CRDs required.

Q3: Can I use ECS service Auto Scaling and rolling update simultaneously? Yes. Service Auto Scaling adjusts desired count based on metrics; rolling update replaces tasks during deploys. They cooperate cleanly: a deployment in progress respects the current desired count, and scaling events during deployment trigger additional rolling replacements.

Q4: What happens if a CodeDeploy ECS blue/green deployment fails during traffic shift? CodeDeploy auto-rolls back: it reverts ALB listener rules to point at the blue task set, drains the green task set, and reports failure. The old tasks are still running, so there is no service interruption. If terminateBlueInstancesOnDeploymentSuccess is configured, the blue task set is preserved for the configured wait time.

Q5: How do I handle stateful workloads on EKS with rolling update? Use Recreate strategy (downtime, but no parallel old/new) or, for distributed databases, switch to a StatefulSet with OrderedReady pod management. PodDisruptionBudgets protect quorum during node drains and cluster upgrades; the rollout itself is paced by the workload's update strategy.

Q6: Does ECS rolling update support traffic weighting like CodeDeploy blue/green? No. Rolling update flips traffic at the per-task granularity — a task either has traffic (registered in target group, healthy) or no traffic. There is no "send 10% of traffic to new tasks". For weighted shifting, use blue/green via CodeDeploy.

Q7: Why does my ECS service deploy succeed but the application returns 5xx after deployment? Most likely the new task definition has a configuration issue (missing environment variable, wrong image tag) that causes runtime errors. ALB health check passed (probably checking /health which returns 200 even when business logic fails). Mitigation: implement a /ready endpoint that exercises real dependencies, or add CloudWatch alarms on application-layer error rate to trigger external rollback.
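
A readiness endpoint of that kind usually aggregates dependency checks and returns a non-200 status when any of them fails. A framework-agnostic sketch, with hypothetical check names and stubbed dependency probes:

```python
def readiness(checks):
    """Run each dependency check; return (HTTP status, body) for a /ready endpoint."""
    failed = [name for name, check in checks.items() if not check()]
    status = 200 if not failed else 503
    return status, {"ready": not failed, "failed": failed}


if __name__ == "__main__":
    # Stub probes; real ones would open a database connection, read a required setting, etc.
    checks = {
        "database": lambda: True,
        "required_env": lambda: False,
    }
    print(readiness(checks))
```

Because the ALB health check treats a non-success response as unhealthy, a task with a missing dependency never receives production traffic.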

Q8: How do I orchestrate a multi-service deployment where service A must be deployed before service B? On ECS: chain CodePipeline deploy actions sequentially, one per service. On EKS: use a Kubernetes Job that runs kubectl apply in order, or wrap deployments in a Helm umbrella chart with helm.sh/hook: pre-install annotations. For complex orchestration, drive deployments from a Step Functions state machine that pulls deployment state from each service before proceeding.
