Systems Manager State Manager and Patch Manager - DOP-C02 DevOps Engineer Study Notes

Systems Manager (SSM) is the workhorse for fleet configuration management on DOP-C02. Where CloudFormation handles infrastructure provisioning, State Manager ensures running instances stay in their desired software state, and Patch Manager delivers and reports OS and application patches according to compliance schedules. Combined with Run Command, Automation, Maintenance Windows, Inventory, and Distributor, they make up the operational layer the exam tests under Domain 2.3 ("automated solutions for complex tasks") and Domain 5.2 ("configuration changes in response to events").

This guide assumes you know what an SSM Agent is and that EC2 instances need an instance profile with the AmazonSSMManagedInstanceCore policy to register. The DOP-C02 focus: State Manager associations vs Run Command vs Automation, patch baseline rules and approval delays, patch groups via tags, maintenance windows and registered targets/tasks, compliance reporting, hybrid activations for on-prem servers, and the trap-laden distinction between patching frequency, scan-only vs scan-and-install, and override lists.

Why SSM State Manager and Patch Manager Matter on DOP-C02

DOP-C02 explicitly lists "automating system inventory, configuration, and patch management (for example, Systems Manager, AWS Config)" as a Domain 2.3 skill. Community pass reports cite SSM scenarios as one of the most-confused topic clusters: candidates know individual SSM components but trip on which to pick when. "The team needs to ensure the CloudWatch Agent is installed and running on every EC2 instance with a specific tag, and reinstall it if anyone removes it" - that is State Manager (declarative, continuous), not Run Command (imperative, one-shot). "The team needs to patch 5,000 instances across three time zones during their respective night windows" - that is Maintenance Windows + Patch Manager + Patch Groups, not a Lambda triggered on schedule.

The exam also separates State Manager from AWS Config rules with remediation: both can keep resources in a desired state, but they target different layers. Config sees AWS resource configuration; State Manager sees inside the instance OS. Knowing which layer the question targets is the first elimination step.

SSM Agent: the daemon installed on EC2/on-prem hosts that polls SSM for commands, associations, and inventory.
Managed instance: any host (EC2 or on-prem) registered with SSM and visible in Fleet Manager.
State Manager association: a binding between a target (instance ID, tag, resource group) and an SSM document; State Manager keeps the target in compliance with the document on a defined schedule.
SSM document (SSM Doc): a JSON or YAML definition of the actions Systems Manager performs on a target. Document types include Command (Run Command and State Manager), Automation (workflow), Session (Session Manager), Package (Distributor), ApplicationConfiguration (AppConfig), ApplicationConfigurationSchema, etc.
Run Command: imperative, one-shot execution of a Command document against targets.
Automation: orchestrated multi-step workflows (often AWS-provided runbooks like AWS-RestartEC2Instance).
Patch baseline: a set of rules (severity, classification, approval delay) that decides which patches are auto-approved.
Patch group: a tag value (Patch Group=production) that maps instances to a specific patch baseline.
Maintenance window: a recurring time window with registered targets and tasks (Run Command, Automation, Lambda, Step Functions); enables off-hours patching.
SSM Inventory: a metadata collector that catalogs installed apps, files, network config, services, and custom inventory per instance.
Hybrid activation: a token-and-ID pair that lets on-prem servers register with SSM as managed instances using IAM service roles.
Reference: https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-state.html

Plain-Language Explanation: SSM State and Patch Management

These services map cleanly to operational patterns from non-software domains. Three angles cover the declarative-vs-imperative split, the patch lifecycle, and the maintenance window mechanics.

Analogy 1: The Office Building Janitorial Service

Picture a 50-floor office building. Run Command is the one-off cleaning request - "send a janitor to floor 14 to clean up a spill, then leave". One trip, one job, no follow-up. State Manager is the standing janitorial contract - "every floor must be vacuumed every weeknight, every restroom restocked daily; if you find an empty soap dispenser, refill it". Continuous, declarative, self-healing. Automation is the multi-step move-out workflow - "wash carpets, paint walls, replace ceiling tiles, return keys" - run as a single coordinated procedure.

Patch Manager is the HVAC maintenance schedule: rules say "service all heat pumps over 5 years old monthly, all newer ones quarterly", and the building manager (the patch baseline) auto-approves which units need work after a 7-day delay (so emergency patches are reviewed before deploy). Patch groups are the building zones - some floors are tagged Production (mission-critical, careful patching with full validation) and some are tagged Lab (aggressive patching, OK to break).

Maintenance windows are the after-hours service hours: nobody wants HVAC outages during office hours, so all heavy maintenance is scheduled 11 PM - 5 AM. Within the window, registered tasks execute in a controlled fashion with concurrency limits.

Analogy 2: The Car Dealership Service Department

A dealership services hundreds of cars per week. State Manager is the standing service plan - "every car owned by Fleet Customer X must have its oil-pressure-sensor firmware match the manufacturer's latest release; if the sensor is replaced and runs old firmware, re-flash automatically". Run Command is the one-off recall notice - "this batch of 200 VINs needs the airbag firmware updated once". Automation is the multi-step pre-delivery inspection - "run diagnostics, replace cabin air filter, top up coolant, wash exterior".

Patch Manager with patch baselines is the manufacturer's service campaign rules - "any defect classified Critical or High is auto-applied 3 days after the bulletin; Medium is reviewed weekly; Low is deferred to the next scheduled service". Patch groups are the fleet customer tiers - Government clients (Patch Group=critical-fleet) get the strictest baseline with a long approval delay and full validation; Lab cars (Patch Group=test-fleet) get bleeding-edge patches immediately (0-day approval) for validation.

Maintenance windows are the service bay hours: each customer gets a recurring slot ("Mondays 6 AM - 10 AM") with a registered set of tasks (oil, tires, firmware) that the bay technician (Run Command target) executes within concurrency limits (no more than 5 cars in service at once).

Analogy 3: The Hospital Medication Reconciliation Workflow

A hospital reconciles medications daily for every patient. State Manager is the standing reconciliation rule - "every admitted patient must have their medication list match the physician's current order; if the pharmacy stocks a wrong dose, automatically flag and replace". Run Command is the one-off emergency order - "administer 5 mg of medication X to patient in room 412". Automation is the discharge workflow - "verify final prescriptions, generate take-home documentation, update the EHR, schedule follow-up".

Patch Manager baselines map to the hospital formulary committee rules: critical drug recalls auto-pull stock immediately; cosmetic packaging changes are reviewed quarterly. Patch groups are the ward classifications - ICU patients get the strictest medication management; ambulatory clinic patients get standard rules.

Maintenance windows are the scheduled rounding hours when nurses execute medication tasks - tasks run with a concurrency cap (one nurse can only see N patients per hour) and abort if too many tasks fail (failure tolerance), preserving patient safety.

For the State Manager vs Run Command vs Automation distinction, the car dealership maps cleanest. For patch baseline approval delay reasoning, the hospital formulary is intuitive. For maintenance window concurrency and failure tolerance, the office janitorial schedule is closest. Reference: https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-patch.html

State Manager: Continuous Configuration

State Manager binds an SSM document to a target on a schedule and re-applies the document if it drifts.

Association Anatomy

An association includes:

Target: instance IDs, tag-based, resource group, or all instances in the account-region.
Document: typically a Command document (e.g., AWS-ConfigureAWSPackage to install/update software) or a custom YAML doc.
Parameters: passed to the document.
Schedule expression: cron or rate (e.g., cron(0 2 ? * SUN *) for weekly Sunday 2 AM).
Compliance severity: how non-compliance is reported.
Output location: S3 bucket for execution logs.

State Manager runs the association on the schedule and when an instance launches and registers (so new instances are immediately compliant). Run-on-launch is critical for autoscaling groups.
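The association fields listed above can be sketched as a request for the CreateAssociation API. This is a minimal illustration, not a definitive setup: the tag key Environment, the bucket name example-ssm-logs, and the association naming scheme are assumptions made up for the example, and the function only builds the request dictionary so it runs without AWS credentials.

```python
def build_association_request(tag_key, tag_value, document, parameters,
                              schedule, bucket):
    """Assemble keyword arguments for the SSM CreateAssociation API."""
    return {
        # Hypothetical naming scheme, chosen only for this example.
        "AssociationName": f"{document}-{tag_value}",
        "Name": document,                                   # SSM document to apply
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        # SSM document parameters are passed as lists of strings.
        "Parameters": {key: [value] for key, value in parameters.items()},
        "ScheduleExpression": schedule,
        "ComplianceSeverity": "HIGH",
        "OutputLocation": {"S3Location": {"OutputS3BucketName": bucket}},
    }


request = build_association_request(
    tag_key="Environment",            # assumed tag, not from the notes
    tag_value="production",
    document="AWS-ConfigureAWSPackage",
    parameters={"action": "Install", "name": "AmazonCloudWatchAgent"},
    schedule="cron(0 2 ? * SUN *)",   # weekly, Sunday 2 AM
    bucket="example-ssm-logs",        # assumed bucket name
)
print(request["Targets"])
```

With boto3 installed and credentials configured, the dictionary could be passed as boto3.client("ssm").create_association(**request).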

State Manager does not continuously watch instances and react to drift. It runs the association on the configured schedule. If you set the schedule to weekly and someone uninstalls the agent on Tuesday, the instance stays non-compliant until Sunday. For drift-triggered remediation, combine State Manager with Config rules + EventBridge + Lambda. Reference: https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-state-about.html
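To make the drift window concrete, the sketch below works out how long an instance stays non-compliant when drift happens on a Tuesday and the association runs weekly on Sunday at 2 AM. The dates are arbitrary examples and the helper is a simplified stand-in for the cron schedule, not something SSM exposes.

```python
from datetime import datetime, timedelta


def next_weekly_run(after, weekday=6, hour=2):
    """Return the first weekly run strictly after `after` (Monday=0, Sunday=6)."""
    candidate = after.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - after.weekday()) % 7)
    if candidate <= after:
        # This week's slot has already passed, so wait for next week's.
        candidate += timedelta(days=7)
    return candidate


drift_detected = datetime(2024, 1, 2, 10, 0)   # a Tuesday at 10:00
next_run = next_weekly_run(drift_detected)
exposure = next_run - drift_detected
print(next_run, exposure)
```

Shortening the schedule shrinks the exposure window but never removes it, which is why event-driven remediation is the usual complement.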

State Manager vs Run Command

The exam loves this distinction:

Aspect	Run Command	State Manager
Cadence	Once	Recurring schedule
Drift handling	None (one-shot)	Re-apply on schedule
New instance	No	Auto-applies on registration
Use case	Ad-hoc operation	Continuous compliance

If the scenario says "ensure X is always installed" - State Manager. If it says "install X right now on a specific list" - Run Command.

Patch Manager: Baselines, Groups, Windows

Patch Manager has three primary primitives that compose into a fleet patching strategy.

Patch Baselines

A patch baseline is a ruleset that decides which patches are auto-approved for installation. AWS provides predefined baselines per OS (for example AWS-AmazonLinux2DefaultPatchBaseline for Amazon Linux 2, and AWS-DefaultPatchBaseline or AWS-WindowsPredefinedPatchBaseline-OS for Windows Server), and you can create custom ones.

A custom baseline contains:

Approval rules: filters by Classification (e.g., Security, Bugfix), Severity (e.g., Critical, Important), and OS Product, with an auto-approval delay (number of days after release before the patch becomes Approved).
Approved patches: explicit list, regardless of rules.
Rejected patches: explicit deny list, regardless of rules.
Compliance reporting: severity assigned to non-compliant instances.
Sources: alternative repositories (e.g., for RHEL custom satellite servers).

The auto-approval delay is the key safety knob: a 7-day delay means newly released patches are not installed for a week, allowing AWS or vendors to retract bad patches.
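A custom baseline with an approval delay might look like the following request for the CreatePatchBaseline API. It is a sketch under assumptions: the baseline name and filter values are illustrative, and is_auto_approved is a simplified model of the delay (release date plus N days), not the exact rule Patch Manager evaluates.

```python
from datetime import date, timedelta


def build_patch_baseline_request(name, operating_system, approve_after_days):
    """Assemble keyword arguments for the SSM CreatePatchBaseline API."""
    return {
        "Name": name,
        "OperatingSystem": operating_system,
        "ApprovalRules": {"PatchRules": [{
            "PatchFilterGroup": {"PatchFilters": [
                {"Key": "CLASSIFICATION", "Values": ["Security"]},
                {"Key": "SEVERITY", "Values": ["Critical", "Important"]},
            ]},
            "ApproveAfterDays": approve_after_days,   # the auto-approval delay
            "ComplianceLevel": "CRITICAL",            # severity reported when missing
        }]},
    }


def is_auto_approved(release_date, today, approve_after_days):
    """Simplified model: approved once the delay has fully elapsed."""
    return today >= release_date + timedelta(days=approve_after_days)


baseline = build_patch_baseline_request("prod-7-day-delay", "AMAZON_LINUX_2", 7)
released = date(2024, 3, 1)
print(is_auto_approved(released, date(2024, 3, 5), 7))
print(is_auto_approved(released, date(2024, 3, 8), 7))
```

A patch released on 1 March stays unapproved on 5 March and becomes eligible on 8 March under this model, which is the week of breathing room the delay is meant to buy.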

Patch Groups

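You assign instances to a patch group via the tag key Patch Group or PatchGroup (both case-sensitive; use PatchGroup, without the space, when instance tags are exposed through EC2 instance metadata). An instance can belong to only one patch group. Each baseline can be associated with one or more patch groups. An instance not in any group falls back to the default baseline for its OS.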

Patch groups isolate environments: Patch Group=production mapped to a strict 14-day-delay baseline, Patch Group=staging mapped to a 0-day-delay baseline that gets patches first.

A common scenario: candidates use patch-group, patchgroup, or Patch group (wrong casing or punctuation) and instances silently fall back to the default baseline. Only two tag keys are recognised: Patch Group (capital P, capital G, one space) and PatchGroup (no space, required when tags are allowed in EC2 instance metadata). SSM does not warn when the wrong tag is used. Reference: https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-patch-patchgroups.html
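The silent-fallback behaviour can be modelled in a few lines. This is a sketch of the lookup logic only, assuming the two tag keys the Patch Manager documentation lists as accepted (Patch Group and PatchGroup); the function name and the baseline mapping are hypothetical.

```python
# Tag keys accepted for patch groups, matched case-sensitively.
ACCEPTED_KEYS = ("Patch Group", "PatchGroup")


def resolve_patch_group(tags):
    """Return the patch group value, or None if no accepted key is present."""
    for key in ACCEPTED_KEYS:
        if key in tags:
            return tags[key]
    return None


def pick_baseline(tags, group_to_baseline, default_baseline):
    """Fall back to the OS default when the tag is missing or unmapped."""
    group = resolve_patch_group(tags)
    return group_to_baseline.get(group, default_baseline)


mapping = {"production": "prod-14-day-delay", "staging": "staging-0-day"}
print(pick_baseline({"Patch Group": "production"}, mapping, "os-default"))
print(pick_baseline({"patch-group": "production"}, mapping, "os-default"))
```

The second call returns the default baseline without any error, which mirrors why a misspelled tag key is easy to miss in practice.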

Maintenance Windows

A maintenance window has:

Schedule: cron/rate expression for the recurring window start.
Duration: how long the window stays open, in hours.
Cutoff: how many hours before the window ends Systems Manager stops starting new tasks.
Registered targets: instance IDs or tag filters.
Registered tasks: Run Command, Automation, Lambda invoke, or Step Functions execution.
Task concurrency: max parallel executions.
Task error rate: stop launching new tasks if error rate exceeds.
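The interaction between duration and cutoff is simple arithmetic, sketched below. The Systems Manager API expresses both values in whole hours; the window start and the 4-hour/1-hour values are arbitrary examples, and the requirement that the cutoff be shorter than the duration is enforced here as a sanity check.

```python
from datetime import datetime, timedelta


def last_task_start(window_start, duration_hours, cutoff_hours):
    """Latest moment a new task can still be started in the window."""
    if cutoff_hours >= duration_hours:
        raise ValueError("cutoff must be shorter than the window duration")
    return window_start + timedelta(hours=duration_hours - cutoff_hours)


window_start = datetime(2024, 1, 7, 2, 0)      # Sunday 2 AM
deadline = last_task_start(window_start, duration_hours=4, cutoff_hours=1)
print(deadline)
```

Tasks already running at the deadline may continue, but nothing new is started after it, so the usable start envelope is duration minus cutoff.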

For patching, the standard pattern is:

Maintenance window with cron cron(0 2 ? * SUN *).
Registered targets: tag Patch Group=production.
Registered task: Run Command document AWS-RunPatchBaseline with Operation=Install.
Concurrency 10 percent, error rate 5 percent.

This patches up to 10 percent of production simultaneously every Sunday at 2 AM, and stops sending the command to further instances once more than 5 percent of invocations have failed.
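The standard pattern above maps onto three Systems Manager API calls. The sketch below only builds their request dictionaries; the window name, the 4-hour duration, the 1-hour cutoff, and the placeholder IDs are assumptions for illustration (in practice the IDs come back from the earlier calls).

```python
def window_request():
    """Arguments for CreateMaintenanceWindow."""
    return {
        "Name": "prod-sunday-patching",          # hypothetical name
        "Schedule": "cron(0 2 ? * SUN *)",
        "Duration": 4,                           # hours, assumed
        "Cutoff": 1,                             # hours, assumed
        "AllowUnassociatedTargets": False,
    }


def target_request(window_id):
    """Arguments for RegisterTargetWithMaintenanceWindow."""
    return {
        "WindowId": window_id,
        "ResourceType": "INSTANCE",
        "Targets": [{"Key": "tag:Patch Group", "Values": ["production"]}],
    }


def task_request(window_id, window_target_id):
    """Arguments for RegisterTaskWithMaintenanceWindow."""
    return {
        "WindowId": window_id,
        "TaskType": "RUN_COMMAND",
        "TaskArn": "AWS-RunPatchBaseline",
        "Targets": [{"Key": "WindowTargetIds", "Values": [window_target_id]}],
        "MaxConcurrency": "10%",
        "MaxErrors": "5%",
        "TaskInvocationParameters": {
            "RunCommand": {"Parameters": {"Operation": ["Install"]}}
        },
    }


task = task_request("mw-EXAMPLE", "target-EXAMPLE")   # placeholder IDs
fleet_size = 5000
print(fleet_size * 10 // 100, "instances patched in parallel at most")
```

For a 5,000-instance fleet, a 10 percent concurrency cap works out to roughly 500 instances in flight at once, which is the number to sanity-check against load balancer capacity.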

Scan-Only vs Scan-and-Install

AWS-RunPatchBaseline accepts an Operation parameter:

Scan: reports compliance without installing.
Install: applies missing approved patches.

A common DevOps pattern is daily Scan (cheap, reports compliance to dashboards) + weekly Install (during maintenance window). This separates the visibility cadence from the change cadence.
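The two cadences differ only in the Operation parameter passed to AWS-RunPatchBaseline. As a rough sketch, the helper below builds SendCommand arguments for either mode; the tag-based target and the concurrency values are illustrative assumptions.

```python
VALID_OPERATIONS = ("Scan", "Install")


def patch_command(operation, patch_group):
    """Arguments for SendCommand with AWS-RunPatchBaseline."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"Operation must be one of {VALID_OPERATIONS}")
    return {
        "DocumentName": "AWS-RunPatchBaseline",
        "Targets": [{"Key": "tag:Patch Group", "Values": [patch_group]}],
        "Parameters": {"Operation": [operation]},
        "MaxConcurrency": "10%",
        "MaxErrors": "5%",
    }


daily_scan = patch_command("Scan", "production")        # visibility cadence
weekly_install = patch_command("Install", "production")  # change cadence
print(daily_scan["Parameters"], weekly_install["Parameters"])
```

In a real setup the Scan request would typically sit behind a State Manager association or a daily schedule, and the Install request behind a maintenance window task.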

The default AWS-RunPatchBaseline patches OS-managed packages (yum, apt, Windows Update). Third-party apps installed outside the OS package manager (custom .tar.gz, container images baked outside Patch Manager) are not patched. Use SSM Distributor with State Manager to keep custom packages updated. Reference: https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-patch-baselines.html

Compliance Reporting

State Manager and Patch Manager both write compliance data to SSM Compliance: per-instance status (Compliant, Non-compliant, Inconclusive) with severity tagging. You can:

View aggregate compliance via Fleet Manager and the Compliance dashboard.
Push events to EventBridge on Compliance State Change and trigger Lambda remediation.
Aggregate compliance org-wide with Resource Data Sync to S3 + Athena/QuickSight.
Combine with AWS Config rules ec2-managedinstance-applications-required, ec2-managedinstance-association-compliance-status-check, ec2-managedinstance-patch-compliance-status-check.
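For the EventBridge route, a rule pattern along the following lines would select patch non-compliance events. Treat it as a sketch: the detail field names and values are assumptions based on the published sample event for compliance state changes, the instance ID is a placeholder, and the matcher is a deliberately simplified stand-in for EventBridge's own matching (exact values only).

```python
import json

# Assumed event pattern for an EventBridge rule on SSM compliance changes.
PATTERN = {
    "source": ["aws.ssm"],
    "detail-type": ["Configuration Compliance State Change"],
    "detail": {
        "compliance-type": ["Patch"],
        "compliance-status": ["non_compliant"],
    },
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must match one allowed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


sample_event = {
    "source": "aws.ssm",
    "detail-type": "Configuration Compliance State Change",
    "detail": {
        "resource-id": "i-EXAMPLE",          # placeholder
        "resource-type": "managed-instance",
        "compliance-type": "Patch",
        "compliance-status": "non_compliant",
    },
}
print(json.dumps(PATTERN, indent=2))
print(matches(PATTERN, sample_event))
```

The JSON printed first is what would be supplied as the rule's event pattern; the Lambda target then receives the instance ID in the event detail.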

Hybrid Activations and On-Prem

For on-prem servers, you create a hybrid activation:

aws ssm create-activation returns an ActivationId and ActivationCode.
Install SSM Agent on the on-prem server.
Run amazon-ssm-agent -register -code <code> -id <id> -region <region>.
The server appears as a managed instance with ID prefix mi- (vs i- for EC2).

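Hybrid activations require an IAM service role that the activation references; the role must trust ssm.amazonaws.com and typically carries the AmazonSSMManagedInstanceCore policy. The on-prem server has no instance profile, so SSM Agent obtains and rotates temporary credentials for that role on the server's behalf.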

Once registered, hybrid instances participate in State Manager, Patch Manager, Run Command, Automation, Inventory, and Session Manager identically to EC2.
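The activation flow above can be sketched as a request for CreateActivation plus the registration command run on the server. The role name, instance name, registration limit, and the code/ID values are placeholders invented for the example; real values come from the create-activation response.

```python
import shlex


def activation_request(role_name, registration_limit):
    """Arguments for the SSM CreateActivation API."""
    return {
        "IamRole": role_name,                      # service role trusted by SSM
        "RegistrationLimit": registration_limit,   # how many servers may register
        "DefaultInstanceName": "onprem-server",    # hypothetical display name
    }


def register_command(activation_code, activation_id, region):
    """Build the agent registration command shown in the steps above."""
    parts = [
        "amazon-ssm-agent", "-register",
        "-code", shlex.quote(activation_code),
        "-id", shlex.quote(activation_id),
        "-region", shlex.quote(region),
    ]
    return " ".join(parts)


def is_hybrid_node(node_id):
    """Hybrid managed nodes use the mi- prefix; EC2 instances use i-."""
    return node_id.startswith("mi-")


print(activation_request("ExampleHybridServiceRole", 10))
print(register_command("EXAMPLECODE", "1111-2222", "us-east-1"))
print(is_hybrid_node("mi-EXAMPLE"), is_hybrid_node("i-EXAMPLE"))
```

Because the activation code is effectively a credential until it expires or hits its registration limit, it is usually handled like a secret rather than pasted into scripts.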

SSM Distributor

Distributor packages software (.zip or .tar.gz) with install/uninstall/update scripts and version tracking. You publish a Distributor package, then deploy it via:

One-shot Run Command (AWS-ConfigureAWSPackage).
State Manager association (continuous - keep this version installed).
Maintenance window task.

For DOP-C02, Distributor is the answer when the question asks "deploy a third-party agent (Datadog, Splunk, custom binary) consistently across the fleet, with versioning and continuous compliance".

CloudWatch Agent and SSM

CloudWatch Agent installation is a frequent integration scenario. The exam-correct pattern:

Store the agent config in SSM Parameter Store as AmazonCloudWatch-<name> (the prefix matters because the AWS managed policy CloudWatchAgentServerPolicy only grants ssm:GetParameter on parameters whose names start with AmazonCloudWatch-).
Use State Manager association with AmazonCloudWatch-ManageAgent document.
Schedule daily.
Pass the parameter name in the document parameters.

This gives you continuous compliance for monitoring instrumentation - if someone stops the agent, the next association run restarts it; if you update the parameter, all instances pick it up on schedule.
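Put together, the pattern could be expressed as a CreateAssociation request like the one below. It is a sketch under assumptions: the tag Monitoring=enabled and the parameter name AmazonCloudWatch-linux are invented for the example, and the document parameter names reflect the AmazonCloudWatch-ManageAgent document as commonly used, so check them against the document version in your account.

```python
REQUIRED_PREFIX = "AmazonCloudWatch-"


def cloudwatch_agent_association(tag_key, tag_value, config_parameter):
    """Arguments for CreateAssociation using AmazonCloudWatch-ManageAgent."""
    if not config_parameter.startswith(REQUIRED_PREFIX):
        raise ValueError(f"parameter name should start with {REQUIRED_PREFIX}")
    return {
        "Name": "AmazonCloudWatch-ManageAgent",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "ScheduleExpression": "rate(1 day)",        # daily re-application
        "Parameters": {
            "action": ["configure"],
            "mode": ["ec2"],
            "optionalConfigurationSource": ["ssm"],  # read config from Parameter Store
            "optionalConfigurationLocation": [config_parameter],
            "optionalRestart": ["yes"],
        },
    }


association = cloudwatch_agent_association(
    "Monitoring", "enabled", "AmazonCloudWatch-linux"   # assumed tag and name
)
print(association["Parameters"]["optionalConfigurationLocation"])
```

Updating the Parameter Store value then rolls out on the next daily run without touching the association itself.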

Common Pitfalls (Frequently Tested Traps)

Picking Run Command for continuous compliance: Run Command is one-shot; reapplication requires State Manager.
Wrong patch group tag key: must be exactly Patch Group (with the space) or PatchGroup (without it), case-sensitive. Other variants silently fall back to the default baseline.
Forgetting auto-approval delay: a 0-day delay deploys patches the moment AWS/vendors release them - dangerous if a bad patch ships.
Confusing scan vs install: scan reports without changing; install changes. Both are valid AWS-RunPatchBaseline operations.
Assuming Patch Manager handles third-party apps: only OS-managed packages by default; custom apps need Distributor.
Missing instance profile: instances without AmazonSSMManagedInstanceCore (or equivalent) never appear as managed; State Manager and Patch Manager simply skip them silently.
Treating maintenance window as a hard guarantee: no new tasks are started once the cutoff is reached; the window's duration minus its cutoff defines the period in which tasks can actually start.

DOP-C02 exam priority: Systems Manager State Manager and Patch Manager. Master the decision boundaries between State Manager, Run Command, Automation, and Patch Manager; exam scenarios often hinge on knowing which service is the wrong answer, not just which is right.

FAQ

Q1: When should I use State Manager vs AWS Config with remediation?

State Manager operates inside the instance OS - it runs scripts on the host. Config operates on AWS resource configuration - it sees IAM policies, security groups, S3 bucket settings. Use Config for "is the security group open to 0.0.0.0/0", use State Manager for "is the SSM Agent running on the host". Both can have remediation - the choice follows the layer.

Q2: Can Patch Manager patch container images?

Patch Manager patches running EC2 hosts. For containers, the answer is EC2 Image Builder for AMI patching or container image scanning + rebuild via CodeBuild + ECR + a deployment pipeline. Patching a running container is generally an anti-pattern; rebuild the image and redeploy.

Q3: How do I patch instances that are stopped most of the time?

Maintenance windows execute against running instances at the scheduled time; stopped instances are skipped. Patterns: (1) start instances 30 min before the window, patch, stop after; (2) rely on State Manager which runs on next instance launch; (3) for ASG-managed fleets, bake patches into the AMI via Image Builder.

Q4: What is the difference between SSM Automation and Step Functions?

Automation is purpose-built for SSM operations (assume role, run on EC2 fleet, integrate with Approval, AWS APIs). Step Functions is general orchestration (Lambda, ECS, SNS). Automation is simpler for SSM-centric workflows; Step Functions is more flexible for cross-service orchestration. Automation also has built-in approval steps that pause for IAM-authenticated approvers.

Q5: How do I report cross-account patch compliance to a single dashboard?

Resource Data Sync writes per-account inventory and compliance to a central S3 bucket. Aggregate the data with Athena and visualize in QuickSight, or push to OpenSearch via Lambda for real-time dashboards. Config aggregator can also collect compliance state across accounts but does not include in-OS data.

Q6: Why does my Patch Manager scan report Inconclusive on RHEL?

Inconclusive on RHEL typically means subscription-manager registration is missing, or the instance cannot reach Red Hat Update Infrastructure (RHUI) endpoints from its subnet. Validate connectivity, check the agent log at /var/log/amazon/ssm/amazon-ssm-agent.log, and confirm subscription-manager status.

Q7: Can State Manager target tags that change dynamically?

Yes. Tag-based targets re-evaluate at each association run; instances that gain the tag are included automatically, those that lose it are excluded. New instances launched into ASGs with the right tag are picked up at registration time as well.

Wrap-Up

State Manager keeps EC2 and on-prem instances continuously compliant with declarative SSM documents. Patch Manager handles OS patching via baselines, groups, and maintenance windows with auto-approval delay safeties. Run Command does one-shot operations. Automation orchestrates multi-step workflows. The DOP-C02 answer pattern: declarative continuous = State Manager, imperative one-shot = Run Command, multi-step = Automation, OS patches = Patch Manager + Maintenance Windows. Memorise the Patch Group tag key exactly, scan-vs-install operation modes, hybrid activation flow, and Distributor for third-party packages, and SSM scenarios become recognition exercises rather than service-name guessing.
