This document is a step-by-step guide on how to configure AKS auto-scaling for GitHub self-hosted runners. Auto-scaling is achieved by configuring the actions-runner-controller.
The setup described in this guide has the following self-hosted runner specification:
- Optimized for Azure Kubernetes Service
- Compatible with GitHub Enterprise Server and GitHub Enterprise Cloud
- Organization-level runners
- Ephemeral runners
- Auto-scaling with workflow_job webhooks
- Webhook secret
- Ingress TLS termination
- Auto-provisioning Let's Encrypt SSL certificate
- GitHub App API authentication
The guide covers the following steps:
- Prerequisites
- Set up AKS cluster
- Set up Helm client
- Add cert-manager and NGINX ingress repositories
- Install cert-manager
- Apply Let's Encrypt ClusterIssuer config for cert-manager
- Install NGINX ingress controller
- Set up domain A record
- Create a GitHub App and configure GitHub App authentication
- Prepare Actions Runner Controller configuration
- Install Actions Runner Controller
- Deploy runner manifest
- Verify deployment of all cluster services
- Verify status of runners and pods
- Resources
Prerequisites:
- An Azure subscription
- GitHub Enterprise Server 3.3 or GitHub Enterprise Cloud
- A registered domain name (this guide uses the example subdomain webhook.tld.com)
# Install Azure CLI - https://docs.microsoft.com/en-us/cli/azure/install-azure-cli
az login
# Install kubectl - https://docs.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az_aks_install_cli
az aks install-cli
# Create resource group
az group create -n <your-resource-group> --location <your-location>
# Create AKS cluster
az aks create -n <your-cluster-name> -g <your-resource-group> --node-resource-group <your-node-resource-group-name> --enable-managed-identity
# Get AKS access credentials
az aks get-credentials -n <your-cluster-name> -g <your-resource-group>
# Install Helm - https://helm.sh/docs/intro/install/
brew install helm # macOS
choco install kubernetes-helm # Windows
sudo snap install helm --classic # Debian/Ubuntu
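Before continuing, it can help to confirm that the three CLIs used throughout this guide (az, kubectl and helm) are on the PATH. The following is a minimal sketch; the helper name `missing_tools` is hypothetical and not part of any of these tools.

```shell
# missing_tools TOOL...
# Prints the name of every given tool that is not found on PATH, one per line.
# Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Usage: an empty result means all three CLIs are installed
# missing_tools az kubectl helm
```

Any name printed by the helper should be installed with the commands above before moving on.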
# Add repositories
helm repo add jetstack https://charts.jetstack.io
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
# Update repositories
helm repo update
# Install cert-manager - https://cert-manager.io/docs/installation/helm/
helm install --wait --create-namespace --namespace cert-manager cert-manager jetstack/cert-manager --version v1.6.1 --set installCRDs=true
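# Create clusterissuer.yaml with the content below (replace the email: value with your own address), then apply it
kubectl apply -f clusterissuer.yaml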
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
  namespace: cert-manager
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: your-email@address.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx
# Install NGINX Ingress controller
helm install ingress-nginx ingress-nginx/ingress-nginx --namespace actions-runner-system --create-namespace
# Retrieve public load balancer IP from ingress controller
kubectl -n actions-runner-system get svc
Navigate to your domain registrar and create a new A record that points a subdomain of your domain (e.g. webhook.tld.com) to the ingress load balancer IP retrieved above.
- Activate the GitHub App webhook feature and use the domain from the A record created earlier as the Webhook URL
- Navigate to permissions & events and subscribe to the workflow job webhook event
Prepare a webhook secret to use as github_webhook_secret_token in the values.yaml file, and configure the same webhook secret in the GitHub App you created.
# Generate random webhook secret
ruby -rsecurerandom -e 'puts SecureRandom.hex(20)'
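If Ruby is not available, an equivalent 40-character hex secret can be generated with standard Unix tools. This is a minimal sketch, assuming `/dev/urandom`, `head`, `od` and `tr` are available on your system; `generate_webhook_secret` is a hypothetical helper name.

```shell
# generate_webhook_secret
# Reads 20 random bytes from the kernel random source and prints them
# as 40 lowercase hexadecimal characters (no trailing newline).
generate_webhook_secret() {
  head -c 20 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Usage: print a fresh secret to paste into values.yaml and the GitHub App
# generate_webhook_secret; echo
```

The output has the same shape as `SecureRandom.hex(20)`: 20 random bytes encoded as hex.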
Modify the default values.yaml with your custom values as specified below.
# Configure values.yaml
vim values.yaml
Keys to customize in values.yaml:
- githubEnterpriseServerURL (only needed when using GHES)
- authSecret
- githubWebhookServer
- ingress
- github_webhook_secret_token
# The URL of your GitHub Enterprise server, if you're using one.
githubEnterpriseServerURL: https://github.example.com
# Only 1 authentication method can be deployed at a time
# Uncomment the configuration you are applying and fill in the details
authSecret:
  create: true
  name: "controller-manager"
  annotations: {}
  ### GitHub Apps Configuration
  ## NOTE: IDs MUST be strings, use quotes
  github_app_id: "3"
  github_app_installation_id: "1"
  github_app_private_key: |-
    -----BEGIN RSA PRIVATE KEY-----
    MIIEogIBAAKCAQEA2zl6z+uMcS4D+D9f1ENLJY2w/9lLPajs/wA2gnt74/7bcB1f
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    2x/9kVAWKQ2UJGxqupGqV14vLaNpmA2uILBxc5jKXHu1nNkgUwU=
    -----END RSA PRIVATE KEY-----
  ### GitHub PAT Configuration
  #github_token: ""
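githubWebhookServer:
  enabled: true
  replicaCount: 1
  syncPeriod: 10m
  secret:
    # Note (verify against your chart version): with create: false the chart
    # does not create this secret, so a secret with the name below must already
    # exist. Set create: true to have it created from github_webhook_secret_token.
    create: false
    name: "github-webhook-server"
    ### GitHub Webhook Configuration
    github_webhook_secret_token: ""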
  imagePullSecrets: []
  nameOverride: ""
  fullnameOverride: ""
  serviceAccount:
    # Specifies whether a service account should be created
    create: true
    # Annotations to add to the service account
    annotations: {}
    # The name of the service account to use.
    # If not set and create is true, a name is generated using the fullname template
    name: ""
  podAnnotations: {}
  podLabels: {}
  podSecurityContext: {}
  # fsGroup: 2000
  securityContext: {}
  resources: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
  service:
    type: ClusterIP
    annotations: {}
    ports:
      - port: 80
        targetPort: http
        protocol: TCP
        name: http
        #nodePort: someFixedPortForUseWithTerraformCdkCfnEtc
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    hosts:
      - host: webhook.tld.com
        paths:
          - path: /
    tls:
      - secretName: letsencrypt-prod
        hosts:
          - webhook.tld.com
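# Add the actions-runner-controller chart repository (skip if already added)
helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
helm repo update
# Install actions-runner-controller
helm upgrade --install -f values.yaml --wait --namespace actions-runner-system actions-runner-controller actions-runner-controller/actions-runner-controller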
# View all namespace resources
kubectl --namespace actions-runner-system get all
# Verify certificaterequest status
kubectl get certificaterequest --namespace actions-runner-system
# Verify certificate status
kubectl describe certificate letsencrypt-prod --namespace actions-runner-system
# Verify if SSL certificate is working properly
curl -v https://webhook.tld.com
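Certificate issuance is asynchronous, so the curl check above may fail for the first minute or two after installation. A small retry loop can poll until it succeeds. This is only a sketch: the `retry` helper is hypothetical, and the attempt count and delay in the usage comment are arbitrary assumptions.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0 or MAX_ATTEMPTS is reached.
# Returns 0 on success and 1 when every attempt failed.
retry() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Usage: try the TLS handshake up to 10 times, waiting 30 seconds in between
# retry 10 30 curl -sS -o /dev/null https://webhook.tld.com
```

curl exits with a non-zero status while the certificate is still untrusted, so the loop ends as soon as a valid certificate is being served.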
# Create a new namespace
kubectl create namespace self-hosted-runners
# Edit runnerdeployment yaml
vim runnerdeployment.yaml
# Apply runnerdeployment manifest
kubectl apply -f runnerdeployment.yaml
The manifest below deploys organization-level auto-scaling ephemeral runners with a minimal keep-alive configuration of 1 runner. Runners are scaled up to a maximum of 5 active replicas based on incoming workflow_job webhook events, and scaled back down to 1 runner after an idle timeout of 5 minutes. Replace the organization: value with the name of your own GitHub organization.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: org-runner
  namespace: self-hosted-runners
spec:
  template:
    metadata:
      labels:
        app: org-runner
    spec:
      organization: your-github-organization
      labels:
        - self-hosted
      ephemeral: true
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: org-runner
  namespace: self-hosted-runners
spec:
  scaleTargetRef:
    name: org-runner
  scaleUpTriggers:
    - githubEvent: {}
      amount: 1
      duration: "5m"
  minReplicas: 1
  maxReplicas: 5
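To make the scaling behaviour of the manifest easier to reason about, the following sketch models the replica count as minReplicas plus the capacity added by webhook triggers that have not yet expired, capped at maxReplicas. This is a simplified model for illustration and not the controller's actual implementation; `desired_replicas` is a hypothetical helper.

```shell
# desired_replicas MIN MAX AMOUNT ACTIVE_TRIGGERS
# Simplified model: every unexpired scale-up trigger adds AMOUNT runners on
# top of MIN, and the result is never allowed to exceed MAX.
desired_replicas() {
  min="$1"
  max="$2"
  amount="$3"
  active="$4"
  desired=$((min + amount * active))
  if [ "$desired" -gt "$max" ]; then
    desired="$max"
  fi
  printf '%s\n' "$desired"
}

# Usage with the values from the manifest (minReplicas 1, maxReplicas 5, amount 1):
# desired_replicas 1 5 1 3
```

With no active triggers the model returns the keep-alive count of 1, and any number of simultaneous jobs beyond four is capped at 5 runners.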
# List running pods
kubectl get pods -n self-hosted-runners
# List active runners
kubectl get runners -n self-hosted-runners
# List all resources across all namespaces
kubectl get all -A