Summary
During normal operation on a three-machine cluster (about two months old, running Ubuntu 24.04), Portainer suddenly went offline. On inspection, one of the nodes was not responding, possibly due to an unrelated thermal issue. After a reboot, the cluster was out of sync and all services were offline. To narrow things down, the servers were stopped and then started again one by one. Eventually, I got to:
microk8s kubectl get nodes
NAME STATUS ROLES AGE VERSION
potato01.kai.senbonzakura.net Ready <none> 57d v1.30.1
potato02.kai.senbonzakura.net Ready <none> 57d v1.30.1
potato03.kai.senbonzakura.net Ready <none> 57d v1.30.1
microk8s kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fallback-service ClusterIP 10.152.183.77 <none> 80/TCP 14d
keycloak ClusterIP 10.152.183.191 <none> 80/TCP 37d
keycloak-headless ClusterIP None <none> 8080/TCP 37d
keycloak-postgresql ClusterIP 10.152.183.25 <none> 5432/TCP 37d
keycloak-postgresql-hl ClusterIP None <none> 5432/TCP 37d
kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 57d
microk8s kubectl get pods
NAME READY STATUS RESTARTS AGE
fallback-deployment-7c8f6894c4-pk44f 0/1 Terminating 66 (3m17s ago) 14d
keycloak-postgresql-0 1/1 Terminating 1 (177m ago) 37d
But even so, Portainer wouldn't come back up.
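A minimal sketch of how the pods stuck in Terminating could be inspected (pod names taken from the output above; the force delete is a last resort and can leave orphaned resources behind):

# Show events and any finalizers keeping the pod object around
microk8s kubectl describe pod fallback-deployment-7c8f6894c4-pk44f
microk8s kubectl get pod keycloak-postgresql-0 -o jsonpath='{.metadata.finalizers}'
# Last resort: remove the pod object from the API server without waiting for the kubelet
microk8s kubectl delete pod keycloak-postgresql-0 --grace-period=0 --force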
root@potato02:/home/killua# microk8s inspect
Inspecting system
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-kubelite is running
Service snap.microk8s.daemon-k8s-dqlite is running
Service snap.microk8s.daemon-apiserver-kicker is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy openSSL information to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy current linux distribution to the final report tarball
Copy asnycio usage and limits to the final report tarball
Copy inotify max_user_instances and max_user_watches to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Inspecting dqlite
Inspect dqlite
cp: cannot stat '/var/snap/microk8s/6876/var/kubernetes/backend/localnode.yaml': No such file or directory
Building the report tarball
Report tarball is at /var/snap/microk8s/6876/inspection-report-20240701_211849.tar.gz
The localnode.yaml file isn't present on any of the nodes, so I'm not sure whether this is a false positive or not. #4361 pointed me to
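A minimal sketch of checking the dqlite state directory that inspect complained about (the path follows the snap layout from the output above; cluster.yaml and info.yaml are the files MicroK8s normally keeps there, so treat their presence as an assumption):

# List whatever the dqlite backend directory actually contains
ls -la /var/snap/microk8s/current/var/kubernetes/backend/
# cluster.yaml lists the dqlite members, info.yaml describes the local node
cat /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml
cat /var/snap/microk8s/current/var/kubernetes/backend/info.yaml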
Any ideas on how to get the cluster back into a working state?
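For reference, the closest documented procedure I've found is the MicroK8s guide on restoring quorum after dqlite members are lost; a rough sketch as I understand it, assuming at least one node still has an intact backend directory (back everything up first and verify the steps against the docs for the installed version):

# On every node: stop MicroK8s before touching dqlite state
microk8s stop
# On a surviving node: back up the dqlite state first
cp -r /var/snap/microk8s/current/var/kubernetes/backend /root/backend-backup
# Then edit cluster.yaml there so only reachable nodes remain listed,
# and bring the cluster back up, this node first
microk8s start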
What Should Happen Instead?
Reproduction Steps
Unfortunately, I have no idea how to reproduce this.
Introspection Report
inspection-report-20240701_211849.tar.gz
Can you suggest a fix?
Really, no idea.
Are you interested in contributing with a fix?
yes