Calico-node pods being recreated every few seconds #4538
I brought one node down for a reboot today for maintenance. It appears that the calico-node pods only run properly if the one belonging to the aforementioned node is in Pending state. I then tried to remove the node from the cluster entirely, which removed its …
I found a deployment whose pod had been stuck …
I had a similar problem (calico pods restarting again and again) and solved the issue with the same fix: deleting all pods in the "Pending" state.
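For reference, a minimal sketch of how that workaround can be applied with kubectl (the pod name and namespace below are placeholders; review the list of Pending pods before deleting anything):

```sh
# List all pods stuck in the Pending phase, across all namespaces
microk8s kubectl get pods -A --field-selector=status.phase=Pending

# Delete a stuck pod once identified (name and namespace are placeholders)
microk8s kubectl delete pod <pod-name> -n <namespace>
```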
Hello,
Thank you very much for Microk8s. It is an awesome project that I have been enjoying for several years now. However, I have recently been facing the following issue.
Summary
I am experiencing strange behavior with Calico v3.25.1 running on a 4-node Microk8s v1.30 cluster. The `calico-node` pods on each node keep being terminated and recreated after at most 30 seconds.
The nodes report network unavailability.
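A minimal sketch of how that can be checked, assuming the standard NetworkUnavailable node condition that Calico manages:

```sh
# Overview of node status; NotReady nodes stand out here
microk8s kubectl get nodes -o wide

# Detailed per-node conditions, including NetworkUnavailable (node name is a placeholder)
microk8s kubectl describe node <node-name>
```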
Looking at the logs of the `calico-node` pods, for the short time they are running, does not reveal any immediately clear problem.
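For completeness, a sketch of how those short-lived logs can be captured, assuming the pods carry the usual k8s-app=calico-node label in the kube-system namespace (the default in a MicroK8s install):

```sh
# Watch the calico-node pods being terminated and recreated
microk8s kubectl get pods -n kube-system -l k8s-app=calico-node -w

# Stream logs from the currently running calico-node containers
microk8s kubectl logs -n kube-system -l k8s-app=calico-node -f

# Logs of the previous (terminated) container of one pod (pod name is a placeholder)
microk8s kubectl logs -n kube-system <calico-node-pod> --previous
```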
What Should Happen Instead?
`calico-node` pods should be running continuously.
Reproduction Steps
I had no such issues when using Microk8s v1.24. From there, my changes to the cluster include:
- Adding `10.1.0.0/16,10.152.183.0/24,*.svc,*.cluster.local` to the NO_PROXY environment variable for Longhorn to work
Environment
The cluster runs on-premise, behind a corporate proxy. Environment variables are set accordingly in `/etc/environment` on each node, as recommended here. Similarly, containerd environment variables are set in `/var/snap/microk8s/current/args/containerd-env`.
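A minimal sketch of what such a proxy configuration might look like in those files (the proxy address is a placeholder; the NO_PROXY entries are the ones listed above):

```sh
# Example proxy settings (proxy.example.com:3128 is a hypothetical address)
HTTP_PROXY=http://proxy.example.com:3128
HTTPS_PROXY=http://proxy.example.com:3128
NO_PROXY=10.1.0.0/16,10.152.183.0/24,*.svc,*.cluster.local
# Lowercase variants are often set as well, since some tools only read those
http_proxy=http://proxy.example.com:3128
https_proxy=http://proxy.example.com:3128
no_proxy=10.1.0.0/16,10.152.183.0/24,*.svc,*.cluster.local
```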
Probably unrelated, but Longhorn v1.6.1 is installed in the cluster using Helm.
I've got a similar cluster, composed of AWS EC2 instances that do not require proxy settings, and I have no problems there.
Introspection Report
inspection-report-20240530_151748.tar.gz
Note: Inspection outputs the following error, as per #4361:
cp: cannot stat '/var/snap/microk8s/6876/var/kubernetes/backend/localnode.yaml': No such file or directory
Can you suggest a fix?
What I tried so far, unsuccessfully:
Are you interested in contributing with a fix?
I'm afraid I lack the technical knowledge to do so.