Keycloak fails to start due to infinispan state transfer exception #21092
Comments
Can you provide exact steps to reproduce?
@kopvortex as mentioned by @sschu, providing the steps to reproduce is essential for us to proceed. Otherwise, we may close this issue.
We have set up Keycloak on EKS (Elastic Kubernetes Service) with JGroups and Infinispan. To reproduce the issue, I started a Keycloak deployment and then started another deployment before the first one was fully operational. The Keycloak container in the second deployment fails with an error. I noticed the following in the logs with JGroups trace logging enabled.
This seems related to https://issues.redhat.com/browse/JGRP-2707
Then the solution would be to wait until Keycloak picks up the Infinispan version where this is fixed. Also see #21119 (comment)
It is possible to work around the "problem" (although I don't believe the NPE causes any issue, and the cluster is able to recover). The user can replace the FD_SOCK2 protocol with FD_SOCK in a custom stack:
<jgroups>
    <stack name="my-stack" extends="kubernetes">
        <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
    </stack>
</jgroups>
And set
I've been testing @pruivo's workaround together with the Keycloak Operator and found that it needed additional configuration to work, as the DNS discovery hostname wasn't recognized as before. See keycloak/keycloak-benchmark#440 for the PR in the Keycloak Benchmark project. Changes to the Infinispan cache configuration file:
Then add the new cache name to the additional options in the Keycloak CR:
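For reference, a minimal sketch of a complete custom cache configuration file carrying the workaround stack from @pruivo's comment. It assumes the file is mounted into the container and passed to Keycloak with the `cache-config-file` option (`KC_CACHE_CONFIG_FILE`); the file name `cache-ispn-custom.xml` and the stack name `my-stack` are illustrative. In Infinispan's stack inheritance, the element name is the protocol being added (FD_SOCK) and `stack.position` names the protocol it replaces in the inherited `kubernetes` stack (FD_SOCK2). The custom stack is then selected through the `stack` attribute of `<transport>`, following the custom transport stacks section of the Keycloak caching guide, rather than through `KC_CACHE_STACK`. The cache definitions are elided and would be copied unchanged from the default `cache-ispn.xml`; the DNS discovery adjustments from keycloak-benchmark#440 are not reproduced here.

```xml
<!-- Hypothetical cache-ispn-custom.xml: a sketch, not the verified benchmark configuration -->
<infinispan
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:infinispan:config:14.0 http://www.infinispan.org/schemas/infinispan-config-14.0.xsd"
        xmlns="urn:infinispan:config:14.0">

    <jgroups>
        <!-- Inherit the built-in "kubernetes" stack and swap FD_SOCK2 for FD_SOCK -->
        <stack name="my-stack" extends="kubernetes">
            <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
        </stack>
    </jgroups>

    <cache-container name="keycloak">
        <!-- Bind the cluster transport to the custom stack defined above -->
        <transport lock-timeout="60000" stack="my-stack"/>
        <!-- Cache definitions omitted: copy them unchanged from the default cache-ispn.xml -->
    </cache-container>
</infinispan>
```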
One more question towards @kopvortex's comment above:
When deploying Keycloak with a StatefulSet, the first Pod should be ready before the second Pod starts. This is one of the reasons, among others outlined in #11763 (comment), why the Keycloak Operator uses a StatefulSet. Am I correct to assume you're using a Deployment for Keycloak in the scenario you described above?
@ahus1, in my case I am using StatefulSets. The scenario is: we have deployed Keycloak in k8s with two pods running v21, and now we start upgrading to v22. The first pod that comes up with v22 (while one of the older pods is still at v21) starts to throw NPE exceptions and goes into a crash loop, continuously failing the deployment.
@souravs17031999 This is not supported, as the Infinispan versions in different Keycloak versions are not compatible. Furthermore, the new Keycloak version might contain database migrations. To do a Keycloak version upgrade, you have to scale to zero pods first and then update.
Ok, thanks @sschu, makes sense.
@ahus1 Thanks for improving and verifying the workaround. To summarize the outlook for this issue: we've been discussing it with the maintainers of JGroups and Infinispan, and we will be waiting for the JGroups 5.2.17 and Infinispan 14.0.x releases, which will contain a fix for this issue. The fix might be included in the next planned micro release of Keycloak, 22.0.2, or potentially in the one after.
@sschu would you mind pointing to any documentation that talks about the upgrade process and the need to scale down to 0 (especially when using Infinispan)? I tried to find information on this but was not able to find any. Thanks a lot for the help :)
This is not explicitly mentioned. You can infer it from the upgrading guide (https://www.keycloak.org/docs/latest/upgrading/index.html) because it describes the traditional upgrade process for installed software, which implicitly shuts down the software before upgrading it.
Closes keycloak#21092 (cherry picked from commit dfc8c80)
I have also been following up on a similar issue that happened to me. One thing that seems weird to me:
The node name looks odd, as the other node names look like keycloak-* (keycloak-7b987d9f68-rtbgz-2491 in the example above). Does anyone know why?
JGroups identifies nodes by UUID and keeps a cache that maps UUIDs to logical names ( If it does not go away, it may be caused by a misconfiguration or a network issue.
Another issue I am facing when trying to override the kubernetes cache stack (following the suggestions from #21092 (comment) and #21092 (comment)) to work around this issue: two Keycloak instances started successfully, but the same log still appears. My custom cache stack config file is:
I have also tried
My Keycloak config is:
Keycloak version: 21.1.1. Can someone shed some light on this? Thanks in advance.
The first one should work but you need to configure the
Example: keycloak/keycloak-benchmark@880e4ea
@pruivo thanks for replying. I had tried that before and just tried it again, but unfortunately I got the error below:
Keycloak config is as below:
Any other suggestions?
Did you upload
I am using https://github.com/codecentric/helm-charts/tree/master/charts/keycloakx to do the Keycloak deployment. I don't think it is uploading I am putting these KC_CACHE variables under extraEnv of the keycloak container in the StatefulSet definition.
I'm not familiar with that helm chart.
Yeah, I also confirmed that, but unfortunately it still didn't work. I've also tried locally with docker-compose, but whenever I set KC_CACHE_STACK to the custom stack name, it throws the error
That error is never thrown when I leave KC_CACHE_STACK empty and instead configure the stack as described in https://www.keycloak.org/server/caching#_custom_transport_stacks. The problem with that approach is that the Infinispan nodes don't discover each other. :(
I'm experiencing the same issue, and this sorted it for me. However, do we know whether this applies to major, minor, or patch versions? We noticed it when going from 22 to 24, but would like to know for the future whether even a patch version upgrade would require scaling down first (as this would cause downtime).
For now, we only support rolling upgrades when you stay on the exact same version (including the patch level). You usually do this to change startup configuration or memory settings. We would like to eventually support rolling upgrades for patch releases, and we're currently discussing this. Once we have the right tests in place and are sure we can guarantee it, we'll add this to the release notes and to the Keycloak upgrade guide - https://www.keycloak.org/docs/latest/upgrading/index.html#_upgrading
Before reporting an issue
Area
infinispan
Describe the bug
Running into the following exception when starting a new Keycloak container on k8s.
Subsequent startup failure.
Configuration:
We use the default cache-ispn.xml, and the configuration works fine on 20.0.5.
Version
21.1.1
Expected behavior
Keycloak should start without any infinispan error.
Actual behavior
Keycloak container fails to start due to infinispan error.
How to Reproduce?
Able to reproduce on version 21.1.1 with the following config:
KC_CACHE_STACK - kubernetes
KC_HEALTH_ENABLED - true
KC_METRICS_ENABLED - true
KC_PROXY - reencrypt
Anything else?
No response