-
The file …

Originally posted by @otavio in #1135 (comment)
-
See terraform-hcloud-kube-hetzner/init.tf lines 292 to 300 in b6882d1, which comes from terraform-hcloud-kube-hetzner/init.tf lines 150 to 154 in b6882d1. It might be due to the …
-
@otavio Thanks for this, but how do I reproduce it?
-
I have created a node, and it failed.
-
@otavio I'm sorry, I cannot reproduce; for me it works, even with …
-
So my conclusion is that you are probably on Windows, and something is up with WSL. Please try from Mac or Linux to confirm. I'm moving this to a discussion; please don't hesitate to tag me again there.
-
@mysticaltech I am having the same problem on macOS. I ended up manually fixing the file structure and then rerunning Terraform. If you can't reproduce it from a fresh install, this might occur when modifying an existing cluster, maybe in a specific way. If it helps, I'll try my kube.tf, which definitely threw the errors, with a fresh Hetzner project and see if it runs without errors.
-
I have been using the project on my Linux setup for a while. During the last upgrade, we also refreshed the MicroOS snapshot. The … Below is the complete module definition:

module "infra" {
providers = {
hcloud = hcloud
}
hcloud_token = data.sops_file.secrets.data["hetzner-cloud-auth-token"]
ssh_public_key = file("../../secrets/keys/users/otavio.pub")
ssh_additional_public_keys = [file("../../secrets/keys/users/victor.pub")]
ssh_private_key = null
ssh_port = 60022
source = "kube-hetzner/kube-hetzner/hcloud"
version = "2.11.3"
# Extra k3s registries.
k3s_registries = <<-EOT
mirrors:
docker.io:
endpoint:
- "https://mirror.gcr.io"
- "https://registry-1.docker.io"
EOT
# Avoid updating it as Longhorn need check for compatibility.
initial_k3s_channel = "v1.27"
network_region = "us-east"
create_kubeconfig = false
create_kustomization = false
# Use Cilium as network layer.
cni_plugin = "cilium"
cilium_version = "v1.14.5"
cilium_routing_mode = "native"
cilium_egress_gateway_enabled = false
cilium_ipv4_native_routing_cidr = "10.0.0.0/8"
# Block ICMP PING in nodes
block_icmp_ping_in = true
dns_servers = [
"1.1.1.1",
"8.8.8.8",
"2606:4700:4700::1111",
]
extra_firewall_rules = [
{
direction = "out"
protocol = "tcp"
port = "22"
source_ips = []
destination_ips = ["0.0.0.0/0", "::/0"]
description = "Allow FluxCD access to resources via SSH"
},
{
direction = "out"
protocol = "tcp"
port = "587"
source_ips = []
destination_ips = ["0.0.0.0/0", "::/0"]
description = "Allow SMTP access so we can send e-mails from pods"
},
{
direction = "in"
protocol = "udp"
port = "9993"
source_ips = ["0.0.0.0/0", "::/0"]
destination_ips = []
description = "Allow ZeroTier access"
},
{
direction = "out"
protocol = "udp"
port = "9993"
source_ips = []
destination_ips = ["0.0.0.0/0", "::/0"]
description = "Allow ZeroTier access"
},
{
direction = "in"
protocol = "udp"
port = "5683"
source_ips = ["0.0.0.0/0", "::/0"]
destination_ips = []
description = "Allow CoAP access"
}
]
use_control_plane_lb = true
control_plane_nodepools = [
{
name = "control-plane",
server_type = "cpx31",
location = "ash",
labels = ["role=infra"],
taints = [],
count = 3
}
]
agent_nodepools = [
{
name = "agent",
server_type = "cpx41",
location = "ash",
labels = ["role=apps", "node.longhorn.io/create-default-disk=true"],
taints = [],
count = 9
},
]
kured_options = {
"reboot-days" : "su"
"start-time" : "9am"
"end-time" : "5pm"
}
base_domain = "nodes.xxx"
load_balancer_type = "lb11"
load_balancer_location = "ash"
# We deploy the ingress ourselves.
ingress_controller = "none"
enable_longhorn = true
longhorn_namespace = "storage"
longhorn_values = <<EOT
defaultSettings:
createDefaultDiskLabeledNodes: true
backupTargetCredentialSecret: longhorn-cloud-credentials
backupTarget: s3://lab-infra-backup@us-central-1/longhorn/
priorityClass: system-node-critical
storageOverProvisioningPercentage: 100
replicaSoftAntiAffinity: false
allowVolumeCreationWithDegradedAvailability: false
nodeDrainPolicy: allow-if-replica-is-stopped
persistence:
reclaimPolicy: Retain
recurringJobSelector:
enable: true
jobList: '[{"name":"infrequent-backups", "isGroup":true}]'
EOT
disable_hetzner_csi = true
cluster_name = "infra"
use_cluster_name_in_node_name = true
}
One thing that I recall now is that it happened when changing settings, so it might fail when … Maybe this helps to find the root cause.
-
@otavio @carstenblt Thanks for sharing more details, folks. I will investigate more.
-
Just ran into this issue.

TL;DR

I don't know what causes this, but to reproduce you can try adding additional control planes or renaming the one control plane you do have. The fix is to add

"mkdir -p /var/post_install /var/user_kustomize",

to line 177 in .terraform/modules/kube-hetzner/control_planes.tf.

The investigation

I ssh'ed into the control plane and manually ran the script that failed. Error output (normally hidden because of secrets):

curl https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.6.0/deploy/kubernetes/hcloud-csi.yml -o /var/post_install/hcloud-csi.yml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0Warning: Failed to open the file /var/post_install/hcloud-csi.yml: Not a
Warning: directory
  0 11366    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (23) Failure writing output to destination

Looks like … Checking my control plane, I see that …

I commented out the … Did a bit more digging and it seems related to hashicorp/terraform#16330. If a folder in the destination path doesn't exist, then the behaviour of the file provisioner is to dump the contents into a file named the same as the folder, i.e. … So the folder doesn't exist when the file provisioners run, and they just overwrite each other. Then I echoed the output of the mkdir command that's supposed to create the directory and... nothing changed, because I was editing "first_control_plane" and not "control_planes". Added the mkdir command to … Looks like the folder is only created on first_control_plane and not on additional control planes.

@mysticaltech I don't have enough context to say if this is a workaround or actually the solution. Happy to raise a PR if this is a good fix.
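To make the failure mode and the fix concrete, here is a minimal, hypothetical sketch of the shape of the provisioners involved. The resource name, connection details, and file contents are illustrative placeholders; only the mkdir line comes from the comment above, and the actual blocks in control_planes.tf may look different:

```hcl
# Hypothetical sketch, not the module's actual code.
resource "null_resource" "control_plane_setup" {
  connection {
    host        = "203.0.113.10"            # placeholder control-plane IP
    user        = "root"
    private_key = file("~/.ssh/id_ed25519") # placeholder key path
  }

  provisioner "remote-exec" {
    inline = [
      # Create the destination directories first; otherwise the "file"
      # provisioners below write to a *file* named /var/post_install
      # (see hashicorp/terraform#16330) and overwrite each other.
      "mkdir -p /var/post_install /var/user_kustomize",
    ]
  }

  # This upload only works as intended once /var/post_install exists
  # as a directory on the target node.
  provisioner "file" {
    content     = "example manifest"
    destination = "/var/post_install/example.yaml"
  }
}
```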
-
Fixed in v2.11.6
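If you pin the module as in the configuration shared earlier, picking up the fix is a matter of raising the version constraint. A sketch, with the version number taken from the comment above and the rest of the block abbreviated:

```hcl
module "infra" {
  source  = "kube-hetzner/kube-hetzner/hcloud"
  version = ">= 2.11.6" # release stated above to contain the fix
  # ... remaining settings unchanged ...
}
```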