Add warm pool support for faster runner startup #14
Open: hskiba wants to merge 9 commits into main from feature/warm-pool
Skip extraction step since runner is now pre-extracted during AMI build at /opt/actions-runner/. This reduces startup time and disk I/O.
Implement a warm pool of pre-stopped EC2 instances to reduce GitHub
runner startup time. Key features:
- New WarmPoolConfig parameter (JSON map of instance type to pool size)
- Warm pool instances stop after first boot, ready for quick activation
- When activated, shutdown behavior changes to TERMINATE (ephemeral)
- Pool automatically replenishes when instances are used
- Pool size of 0 or empty config disables feature (current behavior)
Example config: {"c8a.4xlarge":2,"c8a.2xlarge":3}
New Lambda permissions: ec2:DescribeInstances, ec2:StartInstances,
ec2:StopInstances, ec2:ModifyInstanceAttribute
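A minimal sketch of how the WarmPoolConfig value might be parsed, assuming it arrives as a JSON string. The function name matches parseWarmPoolConfig() mentioned later in this PR, but the body here is a hypothetical reconstruction, not the code under review. Dropping non-positive sizes is one way to express "pool size of 0 disables the feature".

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseWarmPoolConfig is a hypothetical reconstruction: it decodes a JSON
// object mapping instance type to target pool size. An empty config yields
// an empty (non-nil) map, which callers can treat as "feature disabled".
func parseWarmPoolConfig(raw string) (map[string]int, error) {
	pools := map[string]int{}
	if strings.TrimSpace(raw) == "" {
		return pools, nil
	}
	var parsed map[string]int
	if err := json.Unmarshal([]byte(raw), &parsed); err != nil {
		return pools, fmt.Errorf("invalid warm pool config: %w", err)
	}
	for instanceType, size := range parsed {
		if size > 0 { // a size of 0 disables the pool for that type
			pools[instanceType] = size
		}
	}
	return pools, nil
}

func main() {
	pools, err := parseWarmPoolConfig(`{"c8a.4xlarge":2,"c8a.2xlarge":3}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(pools["c8a.4xlarge"], pools["c8a.2xlarge"], len(pools))
}
```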
- Extract warmPoolFilters() helper to reduce duplication
- Consolidate EC2 launch logic into buildRunInstancesInput() and launchInstance()
- Extract tryAcquireWarmInstance() and replenishWarmPool() helpers
- Use single multipartTemplate constant
- Simplify nil map access (Go returns zero value for nil maps)
Net reduction of ~80 lines while improving readability.
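The nil map simplification relies on a Go language guarantee: reading a missing key, even from a nil map, returns the value type's zero value. A small illustration follows; poolSizeFor is a hypothetical helper, not a function from this PR.

```go
package main

import "fmt"

// poolSizeFor (hypothetical) returns the configured pool size, or 0 when the
// instance type is absent. No nil check is needed before the lookup.
func poolSizeFor(pools map[string]int, instanceType string) int {
	return pools[instanceType]
}

func main() {
	var pools map[string]int // nil map: reads are safe

	fmt.Println(poolSizeFor(pools, "c8a.4xlarge"))

	// Writes to a nil map panic, so allocate before assigning.
	pools = map[string]int{}
	pools["c8a.4xlarge"] = 2
	fmt.Println(poolSizeFor(pools, "c8a.4xlarge"))
}
```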
- Add CloudWatch Events rule that triggers every 5 minutes
- Add handleMaintenance() to check and populate all configured instance types
- Refactor handler to dispatch between API Gateway and scheduled events
- Extract getLaunchConfig() helper to reduce duplication
The maintenance function iterates through all configured instance types and launches instances to reach target pool sizes.
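The dispatch and replenish logic described above can be sketched without the AWS SDK. This assumes the handler receives the raw event JSON, that scheduled events carry "source": "aws.events", and that API Gateway proxy events carry "httpMethod" or "requestContext". classifyEvent and instancesToLaunch are hypothetical names, not functions from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type eventKind int

const (
	kindUnknown eventKind = iota
	kindWebhook
	kindScheduled
)

// classifyEvent (hypothetical) peeks at the raw Lambda payload to decide
// whether it is a scheduled event or an API Gateway request.
func classifyEvent(raw []byte) eventKind {
	var probe struct {
		Source         string          `json:"source"`
		HTTPMethod     string          `json:"httpMethod"`
		RequestContext json.RawMessage `json:"requestContext"`
	}
	if err := json.Unmarshal(raw, &probe); err != nil {
		return kindUnknown
	}
	switch {
	case probe.Source == "aws.events":
		return kindScheduled
	case probe.HTTPMethod != "" || len(probe.RequestContext) > 0:
		return kindWebhook
	default:
		return kindUnknown
	}
}

// instancesToLaunch (hypothetical) returns, per instance type, how many
// instances are missing from the pool relative to the configured target.
func instancesToLaunch(target, current map[string]int) map[string]int {
	deficit := map[string]int{}
	for instanceType, want := range target {
		if missing := want - current[instanceType]; missing > 0 {
			deficit[instanceType] = missing
		}
	}
	return deficit
}

func main() {
	scheduled := []byte(`{"source":"aws.events","detail-type":"Scheduled Event"}`)
	fmt.Println(classifyEvent(scheduled) == kindScheduled)

	target := map[string]int{"c8a.4xlarge": 2, "c8a.2xlarge": 3}
	current := map[string]int{"c8a.4xlarge": 2, "c8a.2xlarge": 1}
	fmt.Println(instancesToLaunch(target, current))
}
```

Keeping the deficit calculation as a pure function makes the maintenance path easy to unit test separately from the EC2 calls.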
Fall back to extracting from /opt/runner-cache/ if /opt/actions-runner doesn't exist. This supports both old AMIs (with runner cache) and new AMIs (with pre-extracted runner).
- Return empty map instead of nil from parseWarmPoolConfig()
- Remove redundant nil check in handleMaintenance()
- Consolidate duplicate launch config building in handleWebhook() by reusing getLaunchConfig() (~40 lines removed)
Generate JIT config from Lambda via GitHub API instead of passing PAT to the instance. This eliminates the 15-30 second config.sh registration step on the runner.
Changes:
- main.go: Add generateJITConfig() to call GitHub's JIT config API
- main.go: Build labels list and pass JIT config to user-data template
- user-data.sh: Remove get_github_token() and config.sh steps
- user-data.sh: Use ./run.sh --jitconfig instead
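A rough sketch of the request such a JIT config call might build, assuming a repository-level runner and GitHub's generate-jitconfig REST endpoint. The PR does not say whether runners are registered at repository or organization level, so the URL shape, the struct names, and newJITConfigRequest are all assumptions for illustration. The request is only constructed here, not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// jitConfigRequest mirrors the body fields of the generate-jitconfig
// endpoint as the editor understands them (assumption, check the API docs).
type jitConfigRequest struct {
	Name          string   `json:"name"`
	RunnerGroupID int      `json:"runner_group_id"`
	Labels        []string `json:"labels"`
	WorkFolder    string   `json:"work_folder,omitempty"`
}

// jitConfigResponse holds the only response field this sketch cares about:
// the encoded config that would be passed to ./run.sh --jitconfig.
type jitConfigResponse struct {
	EncodedJITConfig string `json:"encoded_jit_config"`
}

// newJITConfigRequest (hypothetical) builds the POST request for a
// repository-level runner. The caller would send it with an http.Client.
func newJITConfigRequest(owner, repo, token string, body jitConfigRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("https://api.github.com/repos/%s/%s/actions/runners/generate-jitconfig", owner, repo)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/vnd.github+json")
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newJITConfigRequest("example-owner", "example-repo", "placeholder-token", jitConfigRequest{
		Name:          "runner-i-0123456789abcdef0",
		RunnerGroupID: 1,
		Labels:        []string{"self-hosted", "c8a.4xlarge"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```

Because the token stays in the Lambda, the instance only ever sees the single-use JIT config, which is the security benefit this commit describes.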
ModifyInstanceAttribute with BlobAttributeValue automatically handles base64 encoding, so we shouldn't pre-encode. This was causing user-data to exceed the 16KB limit.
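To see why pre-encoding can push user-data over the limit, the arithmetic can be checked with the standard library alone. This sketch assumes the 16KB limit applies to the bytes handed to the SDK, which then applies base64 itself. The 13 KiB script size is a made-up example and encodedSize is a hypothetical helper.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// userDataLimit is the 16KB limit mentioned in the commit message,
// assumed here to apply to the bytes passed to the SDK.
const userDataLimit = 16 * 1024

// encodedSize (hypothetical) returns the byte length after applying
// standard base64 encoding `rounds` times to rawLen bytes.
func encodedSize(rawLen, rounds int) int {
	size := rawLen
	for i := 0; i < rounds; i++ {
		size = base64.StdEncoding.EncodedLen(size)
	}
	return size
}

func main() {
	scriptLen := 13 * 1024 // example script size (assumption)

	// Passed as-is: the SDK encodes it, and the raw size is what counts.
	fmt.Println(encodedSize(scriptLen, 0), encodedSize(scriptLen, 0) <= userDataLimit)

	// Pre-encoded by the caller: the "raw" blob is already about 4/3 larger.
	fmt.Println(encodedSize(scriptLen, 1), encodedSize(scriptLen, 1) <= userDataLimit)
}
```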
Cloud-init caches user-data from first boot, so when we update user-data
on warm pool activation, the cached script runs instead of the new one.
Fix by storing JIT config in SSM Parameter Store (/github-runner/jit-config/{instance-id})
and having the user-data script fetch it from there. This works because:
1. The script itself doesn't change (no templating needed)
2. The SSM parameter is created fresh for each job
Changes:
- main.go: Add storeJITConfigInSSM(), remove template usage
- user-data.sh: Fetch JIT config from SSM using instance ID
- template.yaml: Add SSM permissions for Lambda and EC2 instance
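The parameter naming scheme can be sketched independently of the AWS SDK. In the sketch below, paramStore is a hypothetical stand-in for SSM Parameter Store, memoryStore is an in-memory fake for illustration, and the helper names are not the ones used in main.go. Only the /github-runner/jit-config/{instance-id} path comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const jitConfigPrefix = "/github-runner/jit-config/"

// jitConfigParamName builds the per-instance parameter name.
func jitConfigParamName(instanceID string) (string, error) {
	if !strings.HasPrefix(instanceID, "i-") || len(instanceID) <= 2 {
		return "", errors.New("instance ID must look like i-...")
	}
	return jitConfigPrefix + instanceID, nil
}

// paramStore is a hypothetical abstraction over SSM Parameter Store.
type paramStore interface {
	Put(name, value string) error
	Get(name string) (string, error)
}

// memoryStore is an in-memory fake used only for this illustration.
type memoryStore map[string]string

func (m memoryStore) Put(name, value string) error {
	m[name] = value
	return nil
}

func (m memoryStore) Get(name string) (string, error) {
	value, ok := m[name]
	if !ok {
		return "", errors.New("parameter not found: " + name)
	}
	return value, nil
}

// storeJITConfig is what the Lambda side might do on activation.
func storeJITConfig(store paramStore, instanceID, jitConfig string) error {
	name, err := jitConfigParamName(instanceID)
	if err != nil {
		return err
	}
	return store.Put(name, jitConfig)
}

// fetchJITConfig mirrors what the instance would do at boot
// (the real user-data script is shell, not Go).
func fetchJITConfig(store paramStore, instanceID string) (string, error) {
	name, err := jitConfigParamName(instanceID)
	if err != nil {
		return "", err
	}
	return store.Get(name)
}

func main() {
	store := memoryStore{}
	if err := storeJITConfig(store, "i-0123456789abcdef0", "example-jit-config"); err != nil {
		panic(err)
	}
	cfg, err := fetchJITConfig(store, "i-0123456789abcdef0")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg)
}
```

Keying the parameter on the instance ID lets the cached user-data script stay identical across boots while the per-job secret changes underneath it.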
Summary
Add warm pool of pre-stopped EC2 instances to reduce GitHub runner startup time from ~90s to ~20-30s.
Features
{"c8a.4xlarge":1,"c8a.2xlarge":2}
Changes
How it works
Test plan
WARM_POOL_CONFIG={"c8a.4xlarge":1}