-
The file …

Originally posted by @otavio in #1135 (comment)
-
See terraform-hcloud-kube-hetzner/init.tf lines 292 to 300 in b6882d1, which comes from terraform-hcloud-kube-hetzner/init.tf lines 150 to 154 in b6882d1. It might be due to the …
-
@otavio Thanks for this, but how do I reproduce it?
-
I have created a node, and it failed.
-
@otavio I'm sorry, I cannot reproduce; for me it works, even with …
-
So my conclusion is that you are probably on Windows, and something is up with WSL. Please try from Mac or Linux to confirm. I'm moving this to a discussion; please don't hesitate to tag me again there.
-
@mysticaltech I am having the same problem on macOS. I ended up manually fixing the file structure and then rerunning Terraform. If you can't reproduce it from a fresh install, this might occur when modifying an existing cluster, maybe in a specific way. If it helps, I'll try my kube.tf, which definitely threw the errors, with a fresh Hetzner project and see if it runs without errors.
-
I have been using the project on my Linux setup for a while. During the last upgrade, we also refreshed the MicroOS snapshot. The … Below is the complete module definition:

module "infra" {
providers = {
hcloud = hcloud
}
hcloud_token = data.sops_file.secrets.data["hetzner-cloud-auth-token"]
ssh_public_key = file("../../secrets/keys/users/otavio.pub")
ssh_additional_public_keys = [file("../../secrets/keys/users/victor.pub")]
ssh_private_key = null
ssh_port = 60022
source = "kube-hetzner/kube-hetzner/hcloud"
version = "2.11.3"
# Extra k3s registries.
k3s_registries = <<-EOT
mirrors:
docker.io:
endpoint:
- "https://mirror.gcr.io"
- "https://registry-1.docker.io"
EOT
# Avoid updating it as Longhorn need check for compatibility.
initial_k3s_channel = "v1.27"
network_region = "us-east"
create_kubeconfig = false
create_kustomization = false
# Use Cilium as network layer.
cni_plugin = "cilium"
cilium_version = "v1.14.5"
cilium_routing_mode = "native"
cilium_egress_gateway_enabled = false
cilium_ipv4_native_routing_cidr = "10.0.0.0/8"
# Block ICMP PING in nodes
block_icmp_ping_in = true
dns_servers = [
"1.1.1.1",
"8.8.8.8",
"2606:4700:4700::1111",
]
extra_firewall_rules = [
{
direction = "out"
protocol = "tcp"
port = "22"
source_ips = []
destination_ips = ["0.0.0.0/0", "::/0"]
description = "Allow FluxCD access to resources via SSH"
},
{
direction = "out"
protocol = "tcp"
port = "587"
source_ips = []
destination_ips = ["0.0.0.0/0", "::/0"]
description = "Allow SMTP access so we can send e-mails from pods"
},
{
direction = "in"
protocol = "udp"
port = "9993"
source_ips = ["0.0.0.0/0", "::/0"]
destination_ips = []
description = "Allow ZeroTier access"
},
{
direction = "out"
protocol = "udp"
port = "9993"
source_ips = []
destination_ips = ["0.0.0.0/0", "::/0"]
description = "Allow ZeroTier access"
},
{
direction = "in"
protocol = "udp"
port = "5683"
source_ips = ["0.0.0.0/0", "::/0"]
destination_ips = []
description = "Allow CoAP access"
}
]
use_control_plane_lb = true
control_plane_nodepools = [
{
name = "control-plane",
server_type = "cpx31",
location = "ash",
labels = ["role=infra"],
taints = [],
count = 3
}
]
agent_nodepools = [
{
name = "agent",
server_type = "cpx41",
location = "ash",
labels = ["role=apps", "node.longhorn.io/create-default-disk=true"],
taints = [],
count = 9
},
]
kured_options = {
"reboot-days" : "su"
"start-time" : "9am"
"end-time" : "5pm"
}
base_domain = "nodes.xxx"
load_balancer_type = "lb11"
load_balancer_location = "ash"
# We deploy the ingress ourselves.
ingress_controller = "none"
enable_longhorn = true
longhorn_namespace = "storage"
longhorn_values = <<EOT
defaultSettings:
createDefaultDiskLabeledNodes: true
backupTargetCredentialSecret: longhorn-cloud-credentials
backupTarget: s3://lab-infra-backup@us-central-1/longhorn/
priorityClass: system-node-critical
storageOverProvisioningPercentage: 100
replicaSoftAntiAffinity: false
allowVolumeCreationWithDegradedAvailability: false
nodeDrainPolicy: allow-if-replica-is-stopped
persistence:
reclaimPolicy: Retain
recurringJobSelector:
enable: true
jobList: '[{"name":"infrequent-backups", "isGroup":true}]'
EOT
disable_hetzner_csi = true
cluster_name = "infra"
use_cluster_name_in_node_name = true
}
One thing that I recall now is that it happened when changing settings, so it might fail when … Maybe this helps to find the root cause.
-
@otavio @carstenblt Thanks for sharing more details, folks. I will investigate more.
-
Just ran into this issue.

TL;DR

I don't know what causes this, but to reproduce you can try adding additional control planes or renaming the one control plane you do have. The fix is to add

"mkdir -p /var/post_install /var/user_kustomize",

to line 177 in .terraform/modules/kube-hetzner/control_planes.tf.

The investigation

I ssh'ed into the control plane and manually ran the script that failed. Error output (normally hidden because of secrets):

curl https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.6.0/deploy/kubernetes/hcloud-csi.yml -o /var/post_install/hcloud-csi.yml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0Warning: Failed to open the file /var/post_install/hcloud-csi.yml: Not a
Warning: directory
  0 11366    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (23) Failure writing output to destination

Looks like … Checking my control plane, I see that …

I commented out the … Did a bit more digging and it seems related to hashicorp/terraform#16330. If a folder in the destination path doesn't exist, then the behaviour of the file provisioner is to dump the contents into a file named the same as the folder, i.e. … So the folder doesn't exist when the file provisioners run, and they just overwrite each other. Then I echoed the output of the mkdir command that's supposed to create the directory and... nothing changed, because I was editing "first_control_plane" and not "control_planes". Added the mkdir command to … Looks like the folder is only created on first_control_plane and not on additional control planes.

@mysticaltech I don't have enough context to say if this is a workaround or actually the solution. Happy to raise a PR if this is a good fix.
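To make the failure mode and the fix concrete, here is a minimal, hypothetical sketch of the shape of the provisioners involved. The resource name, connection details, and file contents are illustrative placeholders; only the mkdir line comes from the comment above, and the actual blocks in control_planes.tf may look different:

```hcl
# Hypothetical sketch, not the module's actual code.
resource "null_resource" "control_plane_setup" {
  connection {
    host        = "203.0.113.10"            # placeholder control-plane IP
    user        = "root"
    private_key = file("~/.ssh/id_ed25519") # placeholder key path
  }

  provisioner "remote-exec" {
    inline = [
      # Create the destination directories first; otherwise the "file"
      # provisioners below write to a *file* named /var/post_install
      # (see hashicorp/terraform#16330) and overwrite each other.
      "mkdir -p /var/post_install /var/user_kustomize",
    ]
  }

  # This upload only works as intended once /var/post_install exists
  # as a directory on the target node.
  provisioner "file" {
    content     = "example manifest"
    destination = "/var/post_install/example.yaml"
  }
}
```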
-
Fixed in v2.11.6
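If you pin the module as in the configuration shared earlier, picking up the fix is a matter of raising the version constraint. A sketch, with the version number taken from the comment above and the rest of the block abbreviated:

```hcl
module "infra" {
  source  = "kube-hetzner/kube-hetzner/hcloud"
  version = ">= 2.11.6" # release stated above to contain the fix
  # ... remaining settings unchanged ...
}
```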