Mimir components stopped working after upgrade of Helm Charts to version 5.2.0 #7240
Unanswered
abanfi-nozomi asked this question in Help and support
Replies: 1 comment 4 replies
-
Did you try rolling back to 5.2.0?
-
Hi all,
for some time now we have been installing a full LGTM stack on our Kubernetes clusters on EKS, with all components deployed as Helm charts.
After upgrading Tempo from version 1.8.1 to 1.8.2 and Mimir from version 5.2.0 to 5.2.1, the Mimir deployment could no longer serve reads or writes.
The distributors were attempting to send metrics to pods that were unavailable, throwing errors such as `user=anonymous msg="push error\" err="at least 2 live replicas are required, only 1 could be found--unhealthy instances: xxx:xxx: xxx:xxx:9095,xxx:xxx:xxx:9095\""`, while all Grafana panels showed an error such as `expanding series: too many unhealthy instances in the ring (internal: rpc error: code = Code(500)`, coming from mimir-query-frontend.
The Mimir distributors were up and running, but the distributors and querier seemed to be looking for endpoints that did not exist.
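The "at least 2 live replicas are required, only 1 could be found" message is what a majority-quorum check would report for a replication factor of 3. That replication factor is an assumption; the value in use is not stated here. As a rough sketch, the helper names below are hypothetical and only illustrate the arithmetic and how the two counts could be pulled out of such a log line:

```python
import re


def quorum(replication_factor: int) -> int:
    """Minimum live replicas for a majority quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


# Matches the two counts in the distributor error text quoted in this post.
LIVE_REPLICAS = re.compile(
    r"at least (\d+) live replicas are required, only (\d+) could be found"
)


def parse_replica_error(message: str):
    """Return (required, found), or None if the message does not match."""
    match = LIVE_REPLICAS.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    err = "at least 2 live replicas are required, only 1 could be found"
    required, found = parse_replica_error(err)
    # Assumed replication factor of 3; adjust to the value actually configured.
    print(required, found, required == quorum(3))
```

Under that assumption, writes fail whenever fewer than two of the three instances owning a series are seen as healthy in the ring, even if the pods themselves are running.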
We tried scaling all Mimir Deployments and StatefulSets down to zero replicas, to no avail.
We tried updating the memberlist at the distributor level, but still nothing.
The only way we found to get back to a fully functioning deployment was to uninstall the Mimir Helm chart and reinstall it from scratch.
Has anyone else had the same problem, and can it be solved without a fresh installation?