Patching Scheduled Auto Scaling Groups with AWS
Auto Scaling Groups (ASGs) that operate on a schedule, scaling down during off-peak hours (e.g., overnight or weekends) to save costs, present a unique challenge for patch management. Standard patching jobs often rely on instances being available, but scheduled ASGs might have zero running instances during the designated patching window.
The Challenge with Scheduled ASGs
- Instance Unavailability: If a patching window (e.g., one defined with AWS Systems Manager Maintenance Windows) opens while the scheduled ASG is scaled down to zero instances, the patching tasks find no running targets, so nothing is patched until instances are launched.
- Compliance Gaps: Delayed patching can lead to security and compliance gaps as instances run without the latest updates when they eventually scale up.
- Manual Intervention: Administrators might need to manually adjust the ASG schedule or scale up instances just for patching, increasing operational overhead and the risk of errors.
Automated Solution Strategy
A robust solution involves automating the process to ensure instances are available during the patching window and then returned to their scheduled state. This typically combines several AWS services:
- Identify Target ASGs: Use consistent tagging (e.g., PatchingGroup=Scheduled) to identify the scheduled ASGs that require this specific patching workflow.
- Schedule Patching Window: Define an appropriate maintenance window using AWS Systems Manager Maintenance Windows, ideally during off-peak hours but when patching needs to occur.
- Automate Scaling Up: Before the patching tasks run, use automation to temporarily override the schedule and scale up the target ASG so that at least one instance is running. This can be orchestrated using:
  - Amazon EventBridge: To trigger actions based on the maintenance window schedule.
  - AWS Lambda: Functions triggered by EventBridge can check the ASG state and call the UpdateAutoScalingGroup API to set MinSize and DesiredCapacity to 1 (or more if needed), raising MaxSize as well if the schedule lowered it to 0.
- Apply Patches (Golden AMI Method Recommended): Once instances are running, execute the patching process. The preferred method for ASGs is often the “Golden AMI” approach:
  - Use Systems Manager Automation (potentially triggered by the Maintenance Window or EventBridge) to launch an instance from the current AMI, apply patches (using AWS-RunPatchBaseline), create a new, patched AMI, and update a central parameter (like an SSM Parameter Store parameter) with the new AMI ID. (See previous posts on general ASG patching).
- Update ASG Launch Template/Configuration: After the new AMI is created and the parameter updated, another automated step (e.g., another Lambda function or a step in an Automation runbook) points the ASG at the new AMI ID from Parameter Store, either by creating a new Launch Template version or, because Launch Configurations cannot be modified, by creating a new Launch Configuration and attaching it to the ASG.
- Trigger Instance Refresh (Optional but Recommended): If immediate rollout is desired after patching within the window, trigger an Instance Refresh on the ASG. Otherwise, the ASG will naturally use the new AMI for subsequent scale-up events.
- Automate Scaling Down: After the patching window and associated tasks are complete, another automated step (triggered by EventBridge or the completion of the patching automation) restores the ASG’s MinSize and DesiredCapacity, potentially back to 0, allowing its regular schedule to take over again.
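As a rough illustration of the tag lookup, scale-up, and scale-down steps, the following Python sketch shows what the Lambda logic might look like. It is not a definitive implementation. The function names are hypothetical, and each function expects a boto3 Auto Scaling client (for example, boto3.client("autoscaling")) to be passed in. How the original sizes are kept between the scale-up and scale-down invocations (a tag, a Parameter Store entry, and so on) is left to the caller.

```python
"""Sketch: temporarily scale up tagged ASGs for patching, then restore them.

All function names are hypothetical. Pass in a boto3 Auto Scaling client,
e.g. autoscaling = boto3.client("autoscaling").
"""

TAG_KEY = "PatchingGroup"   # tag key and value taken from the article's example
TAG_VALUE = "Scheduled"


def find_target_asgs(autoscaling):
    """Return the descriptions of all ASGs that carry the patching tag."""
    paginator = autoscaling.get_paginator("describe_auto_scaling_groups")
    pages = paginator.paginate(
        Filters=[{"Name": f"tag:{TAG_KEY}", "Values": [TAG_VALUE]}]
    )
    return [group for page in pages for group in page["AutoScalingGroups"]]


def scale_up_for_patching(autoscaling, group, capacity=1):
    """Scale the group up to `capacity` if it is currently below it.

    Returns the original sizes so they can be restored later, or None if
    the group already had enough capacity and was left untouched.
    """
    if group["DesiredCapacity"] >= capacity:
        return None
    original = {key: group[key] for key in ("MinSize", "MaxSize", "DesiredCapacity")}
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group["AutoScalingGroupName"],
        MinSize=max(group["MinSize"], capacity),
        # DesiredCapacity cannot exceed MaxSize, so raise it if the schedule set it to 0.
        MaxSize=max(group["MaxSize"], capacity),
        DesiredCapacity=capacity,
    )
    return original


def restore_sizes(autoscaling, group_name, original):
    """Put back the sizes recorded by scale_up_for_patching."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=original["MinSize"],
        MaxSize=original["MaxSize"],
        DesiredCapacity=original["DesiredCapacity"],
    )
```

One caveat with this design: if a scheduled action fires between the scale-up and the restore, blindly restoring the recorded sizes would undo it, so a real implementation may want to compare against the schedule before restoring.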
Implementation Tools
- AWS Systems Manager (Maintenance Windows, Automation, Parameter Store): For scheduling, orchestrating patching steps, and storing the latest AMI ID.
- Amazon EventBridge: For scheduling and triggering Lambda functions or Systems Manager Automations.
- AWS Lambda: For custom logic to check ASG state, scale up/down, and update configurations.
- AWS CloudFormation / Terraform: To define and deploy the entire automation infrastructure (IAM roles, Lambda functions, EventBridge rules, Maintenance Windows, etc.) consistently.
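The custom Lambda logic for updating configurations could be sketched as follows for the Launch Template case. It reads the latest AMI ID from Parameter Store, creates a new Launch Template version, points the ASG at that version, and optionally starts an Instance Refresh. This is a minimal sketch under stated assumptions: the function names and the parameter name /golden-ami/latest are hypothetical, and the boto3 clients (ssm, ec2, autoscaling) are passed in by the caller.

```python
"""Sketch: roll a newly built AMI out to an ASG that uses a Launch Template.

Function names and the example parameter name are hypothetical. Pass in
boto3 clients, e.g. boto3.client("ssm"), boto3.client("ec2"),
boto3.client("autoscaling").
"""


def latest_ami_id(ssm, parameter_name="/golden-ami/latest"):
    """Read the AMI ID that the patching automation stored in Parameter Store."""
    return ssm.get_parameter(Name=parameter_name)["Parameter"]["Value"]


def roll_out_ami(ec2, autoscaling, asg_name, launch_template_id, ami_id, refresh=True):
    """Create a template version with the new AMI and attach it to the ASG.

    Returns (new_version_number, instance_refresh_id); the refresh ID is
    None when refresh=False.
    """
    # Copy the latest template version, overriding only the image.
    response = ec2.create_launch_template_version(
        LaunchTemplateId=launch_template_id,
        SourceVersion="$Latest",
        LaunchTemplateData={"ImageId": ami_id},
    )
    version = response["LaunchTemplateVersion"]["VersionNumber"]

    # Pin the ASG to the new version (the API expects the version as a string).
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        LaunchTemplate={"LaunchTemplateId": launch_template_id, "Version": str(version)},
    )

    refresh_id = None
    if refresh:
        # Replace running instances now instead of waiting for the next scale-up.
        started = autoscaling.start_instance_refresh(AutoScalingGroupName=asg_name)
        refresh_id = started["InstanceRefreshId"]
    return version, refresh_id
```

Passing refresh=False matches the case where the ASG is simply left to pick up the new AMI on its next scheduled scale-up.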
By automating the coordination between scheduled scaling actions and patching windows, organizations can ensure their scheduled ASGs remain secure and compliant without manual intervention or disruptions to their cost-saving schedules.