This is a Terraform module to create and manage an RKE2 cluster on the Hetzner Cloud platform. At the very minimum, you will get the following out of the box:
- a highly available RKE2 cluster with three master nodes;
- a load balancer for the cluster's API and HTTP/HTTPS ingress traffic;
- Hetzner Cloud Controller Manager;
- Ingress NGINX Controller configured to work with the load balancer;
- cert-manager.
To use the configuration you will need:
- a Hetzner Cloud account and a read/write API token;
- (optionally) a Hetzner DNS API token with access to the cluster's DNS zone;
- a Terraform or OpenTofu CLI client.
Create terraform.tfvars containing at least the following variable values.
domain = "mydomain.tld"
cluster_name = "mycluster"
hcloud_token = "hetzner-cloud-token"
Obviously, use your own values. You don't need to own the listed domain if you don't plan to provision DNS records (see below).
Initialize Terraform.
terraform init
Apply the configuration.
terraform apply
This will create an RKE2 cluster and output information about the load balancer and the nodes.
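The same information can be re-displayed later, without changing anything, by listing the outputs again.
terraform output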
For convenience, you can ask the configuration to store the SSH private key, id_rsa_mycluster, as well as the Kubernetes configuration file, config-mycluster.yaml, in the current folder.
Note: mycluster in the file names comes from the cluster_name variable in the configuration.
write_config_files = true
Then you can access the cluster's nodes using the following command.
ssh -l root -i id_rsa_mycluster <node IP>
Make sure the load balancer is healthy. You can access the cluster using the Kubernetes CLI.
kubectl get nodes --kubeconfig=config-mycluster.yaml
Alternatively, you can extract the content of the files using the output command.
terraform output -raw kubeconfig >~/.kube/config
terraform output -raw ssh_private_key >~/.ssh/id_rsa
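If you would rather not overwrite ~/.kube/config, kubectl can also be pointed at the local file through the standard KUBECONFIG environment variable; this is plain kubectl behaviour, not something specific to this module.
export KUBECONFIG="$PWD/config-mycluster.yaml"
kubectl get nodes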
You can create additional agent nodes in the cluster by specifying the agent_count value. This value can be adjusted after the initial cluster creation.
agent_count = 5
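After changing the value, run terraform apply again and watch the new agents register; a quick check, assuming the kubeconfig file from above, could look like this.
terraform apply
kubectl get nodes --kubeconfig=config-mycluster.yaml --watch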
You can specify the server type and the image to use for the nodes, as well as the location where to create the nodes. By default, the configuration uses cax11 machines running the ubuntu-22.04 image at the nbg1 Hetzner location.
location = "fsn1"
master_type = "cax21"
agent_type = "cax31"
image = "ubuntu-20.04"
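If you are unsure which values are available, and you happen to have the hcloud CLI installed and configured with your token, it can list the options; this is only a convenience and is not required by the configuration.
hcloud server-type list
hcloud location list
hcloud image list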
If you own the DNS zone for the cluster and host it in Hetzner DNS, you can provision wildcard A and AAAA records for the cluster's load balancer.
hdns_token = "hetzner-dns-token"
This will create the following records:
*.mycluster.mydomain.tld. 300 A <load balancer's IPv4>
*.mycluster.mydomain.tld. 300 AAAA <load balancer's IPv6>
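Once the records have been created and propagated, you can spot-check them with dig; myapp below is just an arbitrary label covered by the wildcard.
dig +short myapp.mycluster.mydomain.tld A
dig +short myapp.mycluster.mydomain.tld AAAA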
Having these records in place, you can access the cluster's Kubernetes API using the https://api.mycluster.mydomain.tld:6443 URL. The Kubernetes configuration file produced by the configuration will use that name instead of the IP address of the load balancer.
The applications hosted in the cluster and exposed through ingress objects can use URLs similar to this one: https://myapp.mycluster.mydomain.tld/. The certificate for the name can be obtained using the lets-encrypt cluster issuer.
You can configure a Let's Encrypt cluster issuer by specifying this variable. You probably want this, as it will be used to protect the web UI URLs of the services listed below.
acme_email = "my.mail@mydomain.tld"
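To check that the issuer has been registered and is ready, you can query it with kubectl; this assumes the cluster issuer is created under the lets-encrypt name mentioned above.
kubectl get clusterissuer lets-encrypt --kubeconfig=config-mycluster.yaml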
The configuration can deploy the following additional cluster services by setting their corresponding variable values to true.
use_hcloud_storage = true // use Hetzner Cloud CSI driver
use_longhorn = true // use Longhorn distributed block storage
use_headlamp = true // use Headlamp Kubernetes UI
If only the Hetzner Cloud CSI driver is deployed, the hcloud storage class becomes the default one for the cluster. If Longhorn is deployed, by itself or in addition to the Hetzner Cloud CSI driver, then the longhorn storage class becomes the default.
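A quick way to confirm which storage class ended up as the default is to list them; the default one is marked with (default) in the output.
kubectl get storageclass --kubeconfig=config-mycluster.yaml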
The Longhorn UI will be available at https://longhorn.mycluster.mydomain.tld/, protected by Basic authentication if you provide a password for it. The username is longhorn.
longhorn_password = "L0nGHo7n"
It is essential to configure a backup target to be used by Longhorn in production deployments. The configuration supports an S3 target. Use the following variables to configure it.
longhorn_backup_target = "s3://mycluster@us-east/"
longhorn_aws_endpoints = "https://s3.provider.tld"
longhorn_aws_access_key_id = "accessKeyId"
longhorn_aws_secret_access_key = "SecretAccessKey"
Providing longhorn_aws_endpoints is optional.
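To verify that Longhorn picked up the backup target, you can read the corresponding Longhorn setting; this assumes Longhorn runs in its default longhorn-system namespace.
kubectl get settings.longhorn.io backup-target -n longhorn-system \
  --kubeconfig=config-mycluster.yaml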
Headlamp UI will be available at https://headlamp.mycluster.mydomain.tld/. You can get the authentication token for it by running:
kubectl create token headlamp -n headlamp \
--kubeconfig=config-mycluster.yaml
You can control what versions of software to deploy by setting these variables.
rke2_version = "v1.27.11+rke2r1"
hcloud_ccm_version = "1.19.0"
hcloud_csi_version = "2.6.0"
cert_manager_version = "v1.14.4"
longhorn_version = "1.5.4"
The version of Ingress NGINX Controller is controlled by the RKE2 version (see RKE2 Release Notes).
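To confirm which RKE2 version the nodes are actually running, check the VERSION column reported by kubectl; the RKE2 suffix (for example +rke2r1) is part of the reported version.
kubectl get nodes --kubeconfig=config-mycluster.yaml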
You can reboot or power down any individual node in the cluster. Here is the procedure.
- Obtain the information about the nodes in the cluster and find the node you want to reboot. For example: mycluster-agent-9wsi3q.
kubectl get nodes --kubeconfig=config-mycluster.yaml
- Drain the node. Wait for the command to finish.
kubectl drain --ignore-daemonsets mycluster-agent-9wsi3q \
  --kubeconfig=config-mycluster.yaml
- Power down or reboot the node.
- Once the server is back up, mark the node as usable.
kubectl uncordon mycluster-agent-9wsi3q \
  --kubeconfig=config-mycluster.yaml
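After uncordoning, it is worth confirming that the node reports Ready and is schedulable again before rebooting the next one.
kubectl get node mycluster-agent-9wsi3q --kubeconfig=config-mycluster.yaml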
You can rebuild any individual node in the cluster, be that an agent or a master node (see the special procedure for master[0] below). A new node is created first, and then the existing node is destroyed. The procedure follows.
- Obtain the information about the nodes in the cluster and find the node you want to rebuild. Note the type of the node (master vs agent). For example: mycluster-agent-9wsi3q.
kubectl get nodes --kubeconfig=config-mycluster.yaml
- Find the node in the terraform output agent output; for a master node use terraform output master. Calculate the zero-based index of the node in the list. For example: 2.
- Drain the node. Wait for the command to finish.
kubectl drain --ignore-daemonsets --delete-emptydir-data \
  mycluster-agent-9wsi3q \
  --kubeconfig=config-mycluster.yaml
- Replace the name suffix.
terraform apply -replace 'module.cluster.random_string.agent[2]'
This will replace the node as described above. For master nodes use module.cluster.random_string.master instances. Monitor the cluster to ensure the workloads are stable before proceeding to replace another node.
Important: Because the master[0] node is used to retrieve the cluster's configuration file, and the configuration is needed to read the cluster's resources, an attempt to replace the node using the procedure outlined above causes a failure during the planning phase. In order to execute the node replacement cleanly, the third step needs to be done in two parts. First, replace the node but avoid propagating changes to the cluster's configuration to the providers that use it.
terraform apply -replace 'module.cluster.random_string.master[0]' \
-target terraform_data.kubernetes
Then finish the work by applying the remaining changes. This will destroy the original node.
terraform apply
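Once both applies have finished, it is a good idea to refresh your local copy of the kubeconfig from the outputs (in case it changed with the new master[0]) and to confirm that all nodes are healthy.
terraform output -raw kubeconfig >~/.kube/config
kubectl get nodes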
If you are just playing with the setup, or setting up some experiments, and need to remove the cluster cleanly, you can run the following command. ALL YOUR DATA IN THE CLUSTER WILL BE LOST!
terraform destroy
The original code in this repository comes from Sven Mattsen, https://github.com/scm2342/rke2-build-hetzner. Further development was influenced by ideas picked up from Felix Wenzel, https://github.com/wenzel-felix/terraform-hcloud-rke2.
tofu apply -replace 'module.cluster.random_string.master[0]' -target module.cluster
tofu apply -target terraform_data.client_certificate
tofu apply -target terraform_data.client_key