Versions
k0sctl version: github.com/k0sproject/k0sctl v0.26.1-0.20251016074538-d8718bed3a0b
k0s version: v1.32.6+k0s.0
Context
Must be a regression from #904 and #892
k0sctl is used as a vendored dependency in Go code.
What happened
After resetting a previously deployed k0s cluster with k0sctl reset and redeploying it on the same nodes, I am unable to get logs from certain pods:
% k logs -n mke mke-operator-controller-manager-67b5c65c9-j6pfr
Error from server: Get "https://172.31.0.191:10250/containerLogs/mke/mke-operator-controller-manager-67b5c65c9-j6pfr/manager": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes-ca")
I have 2 nodes in my cluster: 1 controller+worker and 1 worker. Both nodes use a custom kubelet dir /var/lib/kubelet.
The error above only appears for pods that are running on the controller node. Pods from the worker node show logs just fine.
After investigating, I found that the kubelet server cert doesn't match the cluster CA:
# openssl verify -CAfile /var/lib/k0s/pki/ca.crt /var/lib/kubelet/pki/kubelet-server-current.pem
O = system:nodes, CN = system:node:ip-172-31-0-73.ec2.internal
error 30 at 0 depth lookup: authority and subject key identifier mismatch
error /var/lib/kubelet/pki/kubelet-server-current.pem: verification failed
The same check passes on the worker node:
# openssl verify -CAfile /var/lib/k0s/pki/ca.crt /var/lib/kubelet/pki/kubelet-server-current.pem
/var/lib/kubelet/pki/kubelet-server-current.pem: OK
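That error 30 is an authority/subject key identifier mismatch, which can be confirmed directly by comparing the CA's Subject Key Identifier against the Authority Key Identifier embedded in the serving cert (same paths as above; the -ext flag needs OpenSSL 1.1.1+):
# openssl x509 -in /var/lib/k0s/pki/ca.crt -noout -ext subjectKeyIdentifier
# openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -noout -ext authorityKeyIdentifier
On the controller+worker node the two identifiers differ; on the worker node they match.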
After inspecting the kubelet PKI dir, I found that on the controller+worker node, the kubelet server cert was from the previous k0s installation:
# ls -al /var/lib/kubelet/pki/
total 28
drwxr-xr-x 2 root root 4096 Dec 2 18:35 .
drwxr-xr-x 9 root root 4096 Nov 21 02:58 ..
-rw------- 1 root root 1143 Nov 21 02:58 kubelet-client-2025-11-21-02-58-35.pem
-rw------- 1 root root 1143 Nov 22 03:02 kubelet-client-2025-11-22-03-02-38.pem
-rw------- 1 root root 1143 Dec 2 16:56 kubelet-client-2025-12-02-16-56-03.pem
-rw------- 1 root root 1143 Dec 2 18:35 kubelet-client-2025-12-02-18-35-41.pem
lrwxrwxrwx 1 root root 59 Dec 2 18:35 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2025-12-02-18-35-41.pem
-rw------- 1 root root 1208 Nov 21 02:58 kubelet-server-2025-11-21-02-58-40.pem
lrwxrwxrwx 1 root root 59 Nov 21 02:58 kubelet-server-current.pem -> /var/lib/kubelet/pki/kubelet-server-2025-11-21-02-58-40.pem
Here, the client cert was reissued and the symlink was updated, but the server cert was reused from the old deployment.
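The cert dates should back this up: on the controller+worker node, the serving cert's notBefore would still be from the original Nov 21 install while the client cert's is from Dec 2, which can be checked with e.g.:
# openssl x509 -noout -subject -dates -in /var/lib/kubelet/pki/kubelet-server-current.pem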
On the worker node the same directory looks like this:
# ls -al /var/lib/kubelet/pki/
total 16
drwxr-xr-x 2 root root 4096 Dec 2 16:56 .
drwxr-xr-x 9 root root 4096 Dec 2 16:56 ..
-rw------- 1 root root 1143 Dec 2 16:56 kubelet-client-2025-12-02-16-56-05.pem
lrwxrwxrwx 1 root root 59 Dec 2 16:56 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2025-12-02-16-56-05.pem
-rw------- 1 root root 1208 Dec 2 16:56 kubelet-server-2025-12-02-16-56-13.pem
lrwxrwxrwx 1 root root 59 Dec 2 16:56 kubelet-server-current.pem -> /var/lib/kubelet/pki/kubelet-server-2025-12-02-16-56-13.pem
Somehow, k0sctl reset cleaned up the custom kubelet dir on the worker node but not on the controller+worker node. As a result, the next installation of k0s reused the leftover kubelet server cert on the controller+worker node.
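As a workaround, removing the leftover dir by hand on the controller+worker node after k0sctl reset (mirroring the state the worker node ends up in) lets the next install issue a fresh serving cert:
# rm -rf /var/lib/kubelet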
The systemd units on both nodes include the custom kubelet dir flag:
k0scontroller service
[Unit]
Description=k0s - Zero Friction Kubernetes
Documentation=https://docs.k0sproject.io
ConditionFileIsExecutable=/usr/local/bin/k0s
After=network-online.target
Wants=network-online.target
[Service]
StartLimitInterval=5
StartLimitBurst=10
ExecStart=/usr/local/bin/k0s controller --config=/etc/k0s/k0s.yaml --data-dir=/var/lib/k0s --debug=true --disable-components=konnectivity-server,endpoint-reconciler --enable-metrics-scraper=true --enable-worker=true --kubelet-extra-args=--node-ip=172.31.0.73 --kubelet-root-dir=/var/lib/kubelet --labels=mke/version=dev --profile=mke-default-manager
RestartSec=10
Delegate=yes
KillMode=process
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
LimitNOFILE=999999
Restart=always
[Install]
WantedBy=multi-user.target
k0sworker service
[Unit]
Description=k0s - Zero Friction Kubernetes
Documentation=https://docs.k0sproject.io
ConditionFileIsExecutable=/usr/local/bin/k0s
After=network-online.target
Wants=network-online.target
[Service]
StartLimitInterval=5
StartLimitBurst=10
ExecStart=/usr/local/bin/k0s worker --data-dir=/var/lib/k0s --debug=true --kubelet-extra-args=--node-ip=172.31.0.48 --kubelet-root-dir=/var/lib/kubelet --profile=mke-default-worker --token-file=/etc/k0s/k0stoken
RestartSec=10
Delegate=yes
KillMode=process
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
LimitNOFILE=999999
Restart=always
[Install]
WantedBy=multi-user.target
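To confirm the flag on a given node, a quick grep over whichever unit file is present works:
# grep -o -- '--kubelet-root-dir=[^ ]*' /etc/systemd/system/k0s*.service
--kubelet-root-dir=/var/lib/kubelet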
Steps to reproduce
- Deploy a k0s cluster with 1 controller+worker and 1 worker node. Set --kubelet-root-dir=/var/lib/kubelet for each node. Use vendored k0sctl in Go code as shown in #904.
Example k0sctl cluster config
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
name: mke
user: ""
spec:
hosts:
- ssh:
address: 3.237.0.80
user: ubuntu
port: 22
keyPath: /Users/dshishliannikov/mirantis/mke/deployments/mke3/ssh_keys/mke3.pem
role: controller+worker
installFlags:
- --kubelet-root-dir=/var/lib/kubelet
- --data-dir=/var/lib/k0s
- --debug=true
- --enable-metrics-scraper=true
- --disable-components=konnectivity-server,endpoint-reconciler
- --labels=mke/version=dev
- --profile=mke-default-manager
- ssh:
address: 3.227.235.72
user: ubuntu
port: 22
keyPath: /Users/dshishliannikov/mirantis/mke/deployments/mke3/ssh_keys/mke3.pem
role: worker
installFlags:
- --debug=true
- --kubelet-root-dir=/var/lib/kubelet
- --data-dir=/var/lib/k0s
- --profile=mke-default-worker
k0s:
version: v1.32.6+k0s.0
config:
apiVersion: k0s.k0sproject.io/v1beta1
kind: Cluster
metadata:
name: mke
spec:
api:
externalAddress: 80xa1p-mke4-lb-6972b00c98d1bfe2.elb.us-east-1.amazonaws.com
extraArgs:
authentication-config: /var/lib/k0s/oidc-config.yaml
encryption-provider-config: /var/lib/k0s/encryption.cfg
profiling: "false"
request-timeout: 1m0s
service-node-port-range: 32768-35535
sans:
- 80xa1p-mke4-lb-6972b00c98d1bfe2.elb.us-east-1.amazonaws.com
controllerManager:
extraArgs:
profiling: "false"
terminated-pod-gc-threshold: "12500"
extensions:
helm:
charts:
- chartname: oci://ghcr.io/mirantiscontainers/mke4-ucpauthz
name: ucpauthz
namespace: mke
order: 3
timeout: 10m0s
values: "disabled: false\nexempt:\n namespaces:\n \n users:\n \n
\ - system:serviceaccount:calico-apiserver:calico-apiserver\n \n
\ - system:serviceaccount:calico-system:calico-cni-plugin\n \n
\ - system:serviceaccount:calico-system:calico-kube-controllers\n
\ \n - system:serviceaccount:calico-system:calico-node\n \n -
system:serviceaccount:calico-system:calico-typha\n \n - system:serviceaccount:calico-system:csi-node-driver\n
\ \n - system:serviceaccount:calico-system:default\n \n - system:serviceaccount:tigera-operator:default\n
\ \n - system:serviceaccount:calico-apiserver:default\n \n -
system:serviceaccount:tigera-operator:tigera-operator\n "
version: 0.1.0
- chartname: oci://ghcr.io/mirantiscontainers/mke4-tigera-operator-crds
name: mke4-tigera-operator-crds
namespace: tigera-operator
order: 4
timeout: 10m0s
version: v3.30.200
- chartname: oci://ghcr.io/mirantiscontainers/mke4-tigera-operator
name: tigera-operator
namespace: tigera-operator
order: 4
timeout: 10m0s
values: |-
kubeletVolumePluginPath: /var/lib/kubelet
installation:
registry: ghcr.io/mirantiscontainers/
logging:
cni:
logSeverity: Info
cni:
type: Calico
kubeletVolumePluginPath: /var/lib/kubelet
calicoNetwork:
linuxDataplane: Iptables
ipPools:
- cidr: 192.168.0.0/16
encapsulation: VXLAN
blockSize: 26
resources:
requests:
cpu: 250m
tigeraOperator:
version: v1.38.3
registry: ghcr.io/mirantiscontainers/
defaultFelixConfiguration:
enabled: true
logSeveritySys: Info
ipsecLogLevel: Info
bpfLogLevel: Info
vxlanPort: 4789
vxlanVNI: 10000
version: v3.30.200
- chartname: oci://registry.mirantis.com/k0rdent-enterprise/charts/k0rdent-enterprise
name: kcm
namespace: k0rdent
order: 6
timeout: 10m0s
values: |
{"velero":{"enabled":false,"image":{"repository":"registry.mirantis.com/k0rdent-enterprise/velero/velero"}},"cert-manager":{"clusterResourceNamespace":"mke","image":{"repository":"registry.mirantis.com/k0rdent-enterprise/jetstack/cert-manager-controller"},"webhook":{"image":{"repository":"registry.mirantis.com/k0rdent-enterprise/jetstack/cert-manager-webhook"},"tolerations":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]},"cainjector":{"image":{"repository":"registry.mirantis.com/k0rdent-enterprise/jetstack/cert-manager-cainjector"},"tolerations":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]},"startupapicheck":{"image":{"repository":"registry.mirantis.com/k0rdent-enterprise/jetstack/cert-manager-startupapicheck"},"tolerations":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]},"tolerations":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]},"controller":{"templatesRepoURL":"oci://registry.mirantis.com/k0rdent-enterprise/charts","globalRegistry":"registry.mirantis.com/k0rdent-enterprise","tolerations":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]},"image":{"repository":"registry.mirantis.com/k0rdent-enterprise/kcm-controller"},"flux2":{"helmController":{"image":"registry.mirantis.com/k0rdent-enterprise/fluxcd/helm-controller","tolerations":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]},"sourceController":{"image":"registry.mirantis.com/k0rdent-enterprise/fluxcd/source-controller","tolerations":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]},"cli":{"image":"registry.mirantis.com/k0rdent-enterprise/fluxcd/flux-cli","tolerations":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]}},"cluster-api-operator":{"image":{"manager":{"repository":"registry.mirantis.com/k0rdent-enterprise/capi-operator/cluster-api-operator"}}},"k0rdent-ui":{"enabled":true,"image":{"repository":"registry.mirantis.com/k0rdent-enterprise/k0rdent-ui"},"tolerations":[{"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}]}}
version: 1.1.0
images:
calico:
cni:
image: quay.io/k0sproject/calico-cni
version: v3.29.4-0
kubecontrollers:
image: quay.io/k0sproject/calico-kube-controllers
version: v3.29.4-0
node:
image: quay.io/k0sproject/calico-node
version: v3.29.4-0
coredns:
image: quay.io/k0sproject/coredns
version: 1.12.2
default_pull_policy: IfNotPresent
konnectivity:
image: quay.io/k0sproject/apiserver-network-proxy-agent
version: v0.31.0
kubeproxy:
image: quay.io/k0sproject/kube-proxy
version: v1.32.6
kuberouter:
cni:
image: quay.io/k0sproject/kube-router
version: v2.4.1-iptables1.8.9-0
cniInstaller:
image: quay.io/k0sproject/cni-node
version: 1.3.0-k0s.0
metricsserver:
image: registry.k8s.io/metrics-server/metrics-server
version: v0.7.2
pause:
image: registry.k8s.io/pause
version: "3.9"
pushgateway:
image: quay.io/k0sproject/pushgateway-ttl
version: 1.4.0-k0s.0
repository: ghcr.io/mirantiscontainers
network:
clusterDomain: cluster.local
controlPlaneLoadBalancing:
enabled: false
dualStack:
enabled: false
kubeProxy:
iptables:
minSyncPeriod: 0s
syncPeriod: 0s
ipvs:
minSyncPeriod: 0s
syncPeriod: 0s
tcpFinTimeout: 0s
tcpTimeout: 0s
udpTimeout: 0s
metricsBindAddress: 0.0.0.0:10249
mode: iptables
nftables:
minSyncPeriod: 0s
syncPeriod: 0s
kuberouter:
autoMTU: true
hairpin: Enabled
metricsPort: 8080
nodeLocalLoadBalancing:
enabled: false
envoyProxy:
apiServerBindPort: 7443
image:
image: quay.io/k0sproject/envoy-distroless
version: v1.31.5
konnectivityServerBindPort: 7132
type: EnvoyProxy
podCIDR: 192.168.0.0/16
provider: custom
serviceCIDR: 10.96.0.0/16
scheduler:
extraArgs:
bind-address: 127.0.0.1
profiling: "false"
storage:
etcd: {}
type: etcd
telemetry:
enabled: true
workerProfiles:
- name: mke-default-worker
values:
eventRecordQPS: 50
kubeReserved:
cpu: 50m
ephemeral-storage: 500Mi
memory: 300Mi
maxPods: 110
podPidsLimit: -1
podsPerCore: 0
protectKernelDefaults: false
seccompDefault: false
- name: mke-default-manager
values:
eventRecordQPS: 50
kubeReserved:
cpu: 250m
ephemeral-storage: 4Gi
memory: 2Gi
maxPods: 110
podPidsLimit: -1
podsPerCore: 0
protectKernelDefaults: false
seccompDefault: false
options:
wait:
enabled: false
drain:
enabled: false
gracePeriod: 0s
timeout: 0s
force: false
ignoreDaemonSets: false
deleteEmptyDirData: false
podSelector: ""
skipWaitForDeleteTimeout: 0s
concurrency:
limit: 0
workerDisruptionPercent: 0
uploads: 0
evictTaint:
enabled: false
taint: ""
effect: ""
controllerWorkers: false
- Reset the cluster with k0sctl reset
- Inspect /var/lib/kubelet on every node
The controller+worker node still has the dir, with all the files present:
# ls -al /var/lib/kubelet
total 44
drwxr-xr-x 9 root root 4096 Nov 21 02:58 .
drwxr-xr-x 43 root root 4096 Dec 2 18:53 ..
drwx------ 2 root root 4096 Nov 21 02:58 checkpoints
-rw------- 1 root root 62 Nov 21 02:58 cpu_manager_state
drwxr-xr-x 2 root root 4096 Dec 2 18:35 device-plugins
-rw------- 1 root root 61 Nov 21 02:58 memory_manager_state
drwxr-xr-x 2 root root 4096 Dec 2 18:35 pki
drwxr-x--- 3 root root 4096 Nov 21 02:58 plugins
drwxr-x--- 2 root root 4096 Dec 2 18:36 plugins_registry
drwxr-x--- 2 root root 4096 Dec 2 18:35 pod-resources
drwxr-x--- 39 root root 4096 Dec 2 18:43 pods
The worker node doesn't have the dir:
# ls -al /var/lib/kubelet
ls: cannot access '/var/lib/kubelet': No such file or directory
In the reset logs, I can see that the k0s reset command for the controller doesn't include the kubelet flag, while the same command for the worker node does:
time="2025-12-02T13:52:55-05:00" level=info msg="==> Running phase: Reset workers"
...
time="2025-12-02T13:52:55-05:00" level=debug msg="[ssh] 3.227.235.72:22: resetting k0s..."
time="2025-12-02T13:52:55-05:00" level=debug msg="[ssh] 3.227.235.72:22: executing `sudo -- /usr/local/bin/k0s reset --data-dir=/var/lib/k0s --kubelet-root-dir=/var/lib/kubelet`"
...
time="2025-12-02T13:53:08-05:00" level=info msg="==> Running phase: Reset controllers"
...
time="2025-12-02T13:53:15-05:00" level=debug msg="[ssh] 3.237.0.80:22: resetting k0s..."
time="2025-12-02T13:53:15-05:00" level=debug msg="[ssh] 3.237.0.80:22: executing `sudo -- /usr/local/bin/k0s reset --data-dir=/var/lib/k0s`"
...
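Presumably the controller reset should get the same flag as the worker invocation, i.e. something like:
sudo -- /usr/local/bin/k0s reset --data-dir=/var/lib/k0s --kubelet-root-dir=/var/lib/kubelet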