This is an autoscaler and node drainer for Docker Swarm on Azure. The idea is simple: you define autoscale rules on CPU or other metrics for a VM scale set (VMSS). Azure uses cloud-init for VM provisioning, so new nodes can join the Swarm cluster automatically through custom data and cloud-init:
#cloud-config
apt:
  sources:
    docker:
      source: "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
      keyid: 9DC858229FC7DD38854AE2D88D81803C0EBFCD88
package_upgrade: true
packages:
  - docker-ce
  - docker-ce-cli
  - containerd.io
runcmd:
  - sudo usermod -aG docker swarm
  - docker swarm join --token SWMTKN-1-1n7dilf18jfyefv6f60n0ddnmavxoyq3ue8flfb4k2gfpj5fv7-5rohtvmhac6bf8dfpc3do481w 10.1.0.4:2377
The scaler has two functions:
- "Autoscale" replicas when the cluster gets more nodes
- Drain and delete nodes when the cluster scales down
Because there are many best practices and ways to run microservices in Swarm, I chose a simple way to achieve "autoscaling": a config.yaml file where you define how many replicas per node you want. In the example below, the service "docs" has two replicas per node, so when the cluster scales up, the scaler increases the number of replicas accordingly.
services:
  docs: 2
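To illustrate the replicas-per-node idea, here is a minimal shell sketch. It assumes the total is simply replicas per node multiplied by the number of Ready, Active nodes; the scaler's real logic lives in its sources and may differ. The function names are hypothetical; only the docker commands are real CLI calls.

```shell
# Total replicas = node count * replicas per node (assumed formula).
# $1 = node count, $2 = replicas per node from config.yaml
desired_replicas() {
  echo $(( $1 * $2 ))
}

# Count nodes that are Ready and Active, i.e. able to run tasks.
ready_node_count() {
  docker node ls --format '{{.Status}} {{.Availability}}' | grep -c '^Ready Active$'
}

# Scale a service to match the current node count.
# $1 = service name, $2 = replicas per node
scale_to_nodes() {
  docker service scale "$1=$(desired_replicas "$(ready_node_count)" "$2")"
}
```

With three ready nodes, `scale_to_nodes docs 2` would run `docker service scale docs=6`.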
When the cluster scales down, the scaler receives a message from the API, drains the node, and removes it from the cluster. Take a look at the sources if you are concerned about graceful shutdown timing. By default it is 30 seconds.
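The drain-then-remove sequence can be approximated by hand with the docker CLI. This is only a sketch under assumptions: the function name `drain_and_remove` is hypothetical, and the plain `sleep` stands in for whatever graceful shutdown wait the sources implement.

```shell
# Drain a node, wait for tasks to move away, then remove it from the swarm.
# $1 = node hostname or ID, $2 = grace period in seconds (defaults to 30)
drain_and_remove() {
  node="$1"
  grace="${2:-30}"
  # Stop scheduling on the node and reschedule its tasks elsewhere.
  docker node update --availability drain "$node" || return 1
  # Give the tasks time to shut down gracefully.
  sleep "$grace"
  # --force removes the node even if it has not left the swarm itself.
  docker node rm --force "$node"
}
```

Run it on a manager, for example `drain_and_remove <node-hostname>`, using a hostname shown by `docker node ls`.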
- Azure has its own rules about hostnames and VM names. I expect your VMSS to have a name like
swarm-xxx
where xxx could be a region or something similar.
- You should change the trigger event in the sources if you would like to use Spot VMs.
- By default, the config file location is
/home/config.yaml
- Scheduled Events in Azure must be enabled for the VMSS.
- The scaler does nothing when you manually reboot machines or when some of them are down.
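To check what Scheduled Events look like from inside a VM, you can query the Azure Instance Metadata Service yourself. The sketch below is not taken from the scaler's sources: the function names are hypothetical, the `api-version` value is one of the published versions, and the `grep` match is a rough stand-in for real JSON parsing. Azure documents `Terminate` for VM deletion and `Preempt` for Spot eviction; which type the scaler watches by default is defined in its sources.

```shell
# Azure Instance Metadata Service endpoint for Scheduled Events.
EVENTS_URL="http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"

# Print the pending events as JSON; only works from inside an Azure VM.
fetch_events() {
  curl -s -H "Metadata:true" "$EVENTS_URL"
}

# Succeed if the JSON on stdin contains an event of the given type.
# $1 = event type, e.g. Terminate or Preempt
has_event_type() {
  grep -Eq "\"EventType\"[[:space:]]*:[[:space:]]*\"$1\""
}
```

For example, `fetch_events | has_event_type Terminate && echo "a VM is about to be deleted"`.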
docker service create --mode global --constraint node.role==manager --mount src=/var/run/docker.sock,dst=/var/run/docker.sock,type=bind --name scaler --config source=config.yaml,target=/home/config.yaml codeandmedia/swarm-azure-scaler:latest
You may use my image for testing, but I highly recommend customizing the image and sources for your cluster.
- Create a VM with a public IP and a VNet for the cluster, SSH into it, and initialize the swarm:
docker swarm init
- Add the swarm join string to cloud-init and create two VMSS with 1 VM each, in regions like Germany West Central and Switzerland North. The scale-in policy should be NewestVM. Do not forget to enable Scheduled Events.
- Set up your autoscale rules following the Docs.
- Create a Basic Load Balancer for each scale set, set up an NSG, and open the ports your apps need. Basic Load Balancers are free in Azure.
- Promote the first VM in each scale set to a manager.
- Create a config for the scaler and deploy it.
- Create and configure your services.
You can use the VM with the public IP to SSH into the machines inside the VMSS if needed.
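A few of the setup steps above map directly to docker CLI calls run on the first manager. The helpers below are a hedged sketch: the function names are hypothetical, and it assumes new VMs join as workers and that the Swarm config is named config.yaml, as in the service create command earlier.

```shell
# Print the join command for workers; paste it into runcmd in cloud-init.
# $1 = private IP of the first manager (10.1.0.4 in the cloud-init example)
worker_join_command() {
  echo "docker swarm join --token $(docker swarm join-token -q worker) $1:2377"
}

# Promote the first VM of each scale set to a manager.
# Arguments: node hostnames as shown by `docker node ls`
promote_managers() {
  docker node promote "$@"
}

# Store the scaler settings as a Swarm config named config.yaml.
# $1 = path to the local config file
create_scaler_config() {
  docker config create config.yaml "$1"
}
```

After `create_scaler_config ./config.yaml`, the scaler service can mount the config with `--config source=config.yaml,target=/home/config.yaml`.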