
Keycloak fails to start due to infinispan state transfer exception #21092

Closed
2 tasks done
kopvortex opened this issue Jun 19, 2023 · 28 comments · Fixed by #22386
Labels: area/infinispan, kind/bug, team/cloud-native

Comments

@kopvortex

kopvortex commented Jun 19, 2023

Before reporting an issue

  • I have searched existing issues
  • I have reproduced the issue with the latest release

Area

infinispan

Describe the bug

I'm running into the following exception when starting a new Keycloak container on Kubernetes.

2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.counter.impl.CounterModuleLifecycle@7e76b944.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.tasks.impl.LifecycleCallbacks@60d67fbd.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.jboss.marshalling.JbossMarshallingModule@2a2df0de.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.server.hotrod.LifecycleCallbacks@5fc79f76.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.multimap.impl.MultimapModuleLifecycle@60ae908e.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.persistence.remote.LifecycleCallbacks@adf8238.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.server.core.LifecycleCallbacks@7c48f27b.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl] (keycloak-cache-init) Invoking listener: org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler@3bbe1c15 passing event EventImpl{type=CACHE_STOPPED, newMembers=null, oldMembers=null, localAddress=null, viewId=0, subgroupsMerged=null, mergeView=false}
2023-06-19 21:42:16,447 TRACE [org.infinispan.factories.impl.BasicComponentRegistryImpl] (keycloak-cache-init) Changed status of org.infinispan.globalstate.GlobalConfigurationManager to FAILED
2023-06-19 21:42:16,447 ERROR [org.infinispan.CONFIG] (keycloak-cache-init) ISPN000660: DefaultCacheManager start failed, stopping any running components: org.infinispan.commons.CacheException: java.lang.InterruptedException
        at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:243)
        at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1013)
        at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:504)
        at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:723)
        at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:669)
        at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:558)
        at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:521)
        at org.infinispan.security.actions.GetCacheAction.run(GetCacheAction.java:26)
        at org.infinispan.security.actions.GetCacheAction.run(GetCacheAction.java:14)
        at org.infinispan.security.Security.doPrivileged(Security.java:56)
        at org.infinispan.globalstate.impl.SecurityActions.doPrivileged(SecurityActions.java:30)
        at org.infinispan.globalstate.impl.SecurityActions.getCache(SecurityActions.java:39)
        at org.infinispan.globalstate.impl.GlobalConfigurationManagerImpl.start(GlobalConfigurationManagerImpl.java:111)
        at org.infinispan.globalstate.impl.CorePackageImpl$2.start(CorePackageImpl.java:60)
        at org.infinispan.globalstate.impl.CorePackageImpl$2.start(CorePackageImpl.java:48)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.invokeStart(BasicComponentRegistryImpl.java:617)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:608)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:577)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:808)
        at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:357)
        at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:250)
        at org.infinispan.manager.DefaultCacheManager.internalStart(DefaultCacheManager.java:775)
        at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:743)
        at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:407)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.startCacheManager(CacheManagerFactory.java:96)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.InterruptedException
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:385)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)
        at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:236)
        ... 28 more

The server startup then fails with:

2023-06-19 21:42:16,491 DEBUG [org.infinispan.quarkus.hibernate.cache.QuarkusInfinispanRegionFactory] (main) Stop region factory
2023-06-19 21:42:16,492 DEBUG [org.infinispan.quarkus.hibernate.cache.QuarkusInfinispanRegionFactory] (main) Clear region references
2023-06-19 21:42:16,533 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Failed to start server in (production) mode
2023-06-19 21:42:16,533 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) Error details:: java.lang.RuntimeException: Failed to start caches
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.getOrCreate(CacheManagerFactory.java:61)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory_3e2e78b5a5eee8303325d41faca0a80d7da888f7_Synthetic_ClientProxy.getOrCreate(Unknown Source)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.QuarkusCacheManagerProvider.getCacheManager(QuarkusCacheManagerProvider.java:32)
        at org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory.lazyInit(DefaultInfinispanConnectionProviderFactory.java:143)
        at org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory.create(DefaultInfinispanConnectionProviderFactory.java:83)
        at org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory.create(DefaultInfinispanConnectionProviderFactory.java:67)
        at org.keycloak.services.DefaultKeycloakSession.getProvider(DefaultKeycloakSession.java:271)
        at org.keycloak.models.sessions.infinispan.InfinispanSingleUseObjectProviderFactory.getSingleUseObjectCache(InfinispanSingleUseObjectProviderFactory.java:53)
        at org.keycloak.models.sessions.infinispan.InfinispanSingleUseObjectProviderFactory.postInit(InfinispanSingleUseObjectProviderFactory.java:77)
        at org.keycloak.quarkus.runtime.integration.QuarkusKeycloakSessionFactory.init(QuarkusKeycloakSessionFactory.java:105)
        at org.keycloak.quarkus.runtime.integration.jaxrs.QuarkusKeycloakApplication.createSessionFactory(QuarkusKeycloakApplication.java:41)
        at org.keycloak.services.resources.KeycloakApplication.startup(KeycloakApplication.java:125)
        at org.keycloak.quarkus.runtime.integration.QuarkusLifecycleObserver.onStartupEvent(QuarkusLifecycleObserver.java:37)
        at org.keycloak.quarkus.runtime.integration.QuarkusLifecycleObserver_Observer_onStartupEvent_b0e82415b143738dc1f986a5fa4668e83d0a5dea.notify(Unknown Source)
        at io.quarkus.arc.impl.EventImpl$Notifier.notifyObservers(EventImpl.java:326)
        at io.quarkus.arc.impl.EventImpl$Notifier.notify(EventImpl.java:308)
        at io.quarkus.arc.impl.EventImpl.fire(EventImpl.java:76)
        at io.quarkus.arc.runtime.ArcRecorder.fireLifecycleEvent(ArcRecorder.java:131)
        at io.quarkus.arc.runtime.ArcRecorder.handleLifecycleEvents(ArcRecorder.java:100)
        at io.quarkus.deployment.steps.LifecycleEventsBuildStep$startupEvent1144526294.deploy_0(Unknown Source)
        at io.quarkus.deployment.steps.LifecycleEventsBuildStep$startupEvent1144526294.deploy(Unknown Source)
        at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
        at io.quarkus.runtime.Application.start(Application.java:101)
        at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:110)
        at io.quarkus.runtime.Quarkus.run(Quarkus.java:70)
        at org.keycloak.quarkus.runtime.KeycloakMain.start(KeycloakMain.java:98)
        at org.keycloak.quarkus.runtime.cli.command.AbstractStartCommand.run(AbstractStartCommand.java:37)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
        at picocli.CommandLine.execute(CommandLine.java:2078)
        at org.keycloak.quarkus.runtime.cli.Picocli.parseAndRun(Picocli.java:94)
        at org.keycloak.quarkus.runtime.KeycloakMain.main(KeycloakMain.java:88)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:61)
        at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:32)
Caused by: java.util.concurrent.TimeoutException
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:204)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.getOrCreate(CacheManagerFactory.java:59)
        ... 42 more
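The two traces appear to fit together: the `main` thread gives up waiting for cache startup (`TimeoutException` from `FutureTask.get` in `CacheManagerFactory.getOrCreate`), while the `keycloak-cache-init` thread, still blocked in `waitForInitialStateTransferToComplete`, ends with an `InterruptedException`. The Java sketch below reproduces that pattern under an assumption the logs do not confirm: that the waiting side cancels the background task with interruption on timeout. All class and method names are hypothetical; this is not Keycloak's `CacheManagerFactory` code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a "start caches in the background, wait with a timeout" pattern.
// Not Keycloak's CacheManagerFactory; cancel(true) on timeout is an assumption.
class CacheStartupTimeoutSketch {

    // Counted down when the background task observes the interrupt (demo/test hook).
    private final CountDownLatch workerInterrupted = new CountDownLatch(1);

    // Stand-in for a cache start that blocks until initial state transfer completes.
    String startCaches(CountDownLatch stateTransferDone) {
        try {
            stateTransferDone.await();
            return "started";
        } catch (InterruptedException e) {
            // Analogous to the InterruptedException on the keycloak-cache-init thread.
            workerInterrupted.countDown();
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    // Analogous to the main thread: wait for the background start, give up after a timeout.
    String getOrCreate(CountDownLatch stateTransferDone, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(() -> startCaches(stateTransferDone));
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker if it is still running
            return "Failed to start caches (timeout)";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "Failed to start caches (interrupted)";
        } catch (ExecutionException e) {
            return "Failed to start caches (" + e.getCause() + ")";
        } finally {
            executor.shutdown();
        }
    }

    boolean awaitWorkerInterrupted(long timeoutMillis) {
        try {
            return workerInterrupted.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If state transfer never completes, `getOrCreate` reports a timeout and the worker sees an interrupt, mirroring the two log excerpts; if it completes in time, the result is simply `"started"`.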

Configuration:

We use the default cache-ispn.xml, and the same configuration works fine on 20.0.5.

Version

21.1.1

Expected behavior

Keycloak should start without any Infinispan error.

Actual behavior

The Keycloak container fails to start due to an Infinispan error.

How to Reproduce?

Reproducible on version 21.1.1 with the following configuration:

KC_CACHE_STACK - kubernetes
KC_HEALTH_ENABLED - true
KC_METRICS_ENABLED - true
KC_PROXY - reencrypt

Anything else?

No response

kopvortex added the kind/bug and status/triage labels Jun 19, 2023
@sschu
Contributor

sschu commented Jun 21, 2023

Can you provide exact steps to reproduce?

@abstractj
Contributor

@kopvortex, as @sschu mentioned, providing the steps to reproduce is essential for us to proceed. Otherwise, we may close this issue.

@kopvortex
Author

We have set up Keycloak on EKS (Elastic Kubernetes Service) with JGroups and Infinispan clustering. To reproduce the issue, I started a Keycloak deployment and then started another deployment before the first one was fully operational. The Keycloak container in the second deployment fails with the error described above.

I noticed the following in the logs with JGroups TRACE logging enabled:

2023-06-20 01:40:33,821 WARN  [org.jgroups.protocols.TCP] (TQ-Bundler-7,keycloak-7b987d9f68-rtbgz-2491) JGRP000032: keycloak-7b987d9f68-rtbgz-2491: no physical address for e0a9c418-fc6d-4635-89d3-4b74f3a1601f, dropping message                                      
java.lang.NullPointerException
        at org.jgroups.protocols.FD_SOCK2.getPhysicalAddresses(FD_SOCK2.java:445)
        at org.jgroups.protocols.FD_SOCK2.connectTo(FD_SOCK2.java:395)
        at org.jgroups.protocols.FD_SOCK2.connectToNextPingDest(FD_SOCK2.java:376)
        at org.jgroups.protocols.FD_SOCK2.handle(FD_SOCK2.java:347)
        at org.jgroups.protocols.FD_SOCK2.handle(FD_SOCK2.java:31)
        at org.jgroups.util.ProcessingQueue.process(ProcessingQueue.java:55)
        at org.jgroups.util.ProcessingQueue.add(ProcessingQueue.java:35)
        at org.jgroups.protocols.FD_SOCK2.handleView(FD_SOCK2.java:364)
        at org.jgroups.protocols.FD_SOCK2.down(FD_SOCK2.java:227)
        at org.jgroups.protocols.FailureDetection.down(FailureDetection.java:149)
        at org.jgroups.protocols.VERIFY_SUSPECT2.down(VERIFY_SUSPECT2.java:84)
        at org.jgroups.protocols.pbcast.NAKACK2.down(NAKACK2.java:619)
        at org.jgroups.protocols.UNICAST3.down(UNICAST3.java:611)
        at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:260)
        at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:676)
        at org.jgroups.protocols.pbcast.ServerGmsImpl.handleViewChange(ServerGmsImpl.java:66)
        at org.jgroups.protocols.pbcast.ParticipantGmsImpl.handleViewChange(ParticipantGmsImpl.java:112)
        at org.jgroups.protocols.pbcast.GMS.handle(GMS.java:991)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:855)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
        at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:470)
        at org.jgroups.protocols.pbcast.NAKACK2.deliverBatch(NAKACK2.java:1010)
        at org.jgroups.protocols.pbcast.NAKACK2.removeAndDeliver(NAKACK2.java:943)
        at org.jgroups.protocols.pbcast.NAKACK2.handleMessageBatch(NAKACK2.java:915)
        at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:747)
        at org.jgroups.protocols.VERIFY_SUSPECT2.up(VERIFY_SUSPECT2.java:119)
        at org.jgroups.protocols.FailureDetection.up(FailureDetection.java:193)
        at org.jgroups.protocols.FD_SOCK2.up(FD_SOCK2.java:202)
        at org.jgroups.protocols.MERGE3.up(MERGE3.java:288)
        at org.jgroups.protocols.Discovery.up(Discovery.java:314)
        at org.jgroups.protocols.RED.up(RED.java:119)
        at org.jgroups.protocols.TP.passBatchUp(TP.java:1204)
        at org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.passBatchUp(MaxOneThreadPerSender.java:289)
        at org.jgroups.util.SubmitToThreadPool$BatchHandler.run(SubmitToThreadPool.java:150)
        at org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.run(MaxOneThreadPerSender.java:278)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

This seems related to https://issues.redhat.com/browse/JGRP-2707.

@sschu
Contributor

sschu commented Jun 22, 2023

Then the solution would be to wait until Keycloak picks up the Infinispan version where this is fixed. Also see #21119 (comment)

@abstractj
Contributor

abstractj commented Jul 6, 2023

@pruivo from my limited knowledge the changes proposed here #21064 can solve this issue. Is that correct?

@abstractj
Contributor

@stianst I added the 22 milestone here because this is important for us to include in the upcoming release. When you are back, please also review #21064; I believe it must be included in 22.

@pruivo
Contributor

pruivo commented Jul 6, 2023

> @pruivo from my limited knowledge the changes proposed here #21064 can solve this issue. Is that correct?

It can work around the "problem" (although I don't believe the NPE causes any issue; the cluster is able to recover).

The user can replace the protocol FD_SOCK2 with FD_SOCK (or remove it). With the change in #21064 the following custom stack can be used:

<jgroups>
   <stack name="my-stack" extends="kubernetes">
      <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
   </stack>
</jgroups>

And set KC_CACHE_STACK="my-stack" (assuming it is running in OpenShift)
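To make the `extends`/`stack.combine` mechanics concrete, here is a simplified Java model of how a child stack edits the protocol list it inherits: `REPLACE` swaps out the protocol named by `stack.position`, `REMOVE` drops a protocol, and an element without `stack.combine` (modeled as `COMBINE`, which we assume is the default) overrides attributes of the same-named protocol. This is an illustrative sketch, not Infinispan's implementation; the parent list in the usage note is an abbreviated assumption based on protocols visible in the JGroups trace above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of a child stack editing the protocol list it inherits via
// "extends". Not Infinispan's implementation; only three combine modes are modeled.
class StackCombineSketch {

    enum Combine { COMBINE, REPLACE, REMOVE }

    static final class Protocol {
        final String name;
        final Map<String, String> attrs;

        Protocol(String name, Map<String, String> attrs) {
            this.name = name;
            this.attrs = new LinkedHashMap<>(attrs); // defensive, mutable copy
        }
    }

    private static int indexOf(List<Protocol> stack, String name) {
        for (int i = 0; i < stack.size(); i++) {
            if (stack.get(i).name.equals(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("No protocol " + name + " in the parent stack");
    }

    // Applies one child element to a copy of the parent list; the parent is left untouched.
    static List<Protocol> apply(List<Protocol> parent, String name, Combine combine,
                                String position, Map<String, String> attrs) {
        List<Protocol> result = new ArrayList<>();
        for (Protocol p : parent) {
            result.add(new Protocol(p.name, p.attrs));
        }
        switch (combine) {
            case COMBINE:
                // Same-named protocol keeps its place; given attributes override or extend it.
                result.get(indexOf(result, name)).attrs.putAll(attrs);
                break;
            case REPLACE:
                // The protocol named by stack.position (default: same name) is swapped out.
                int i = indexOf(result, position != null ? position : name);
                result.set(i, new Protocol(name, attrs));
                break;
            case REMOVE:
                result.remove(indexOf(result, name));
                break;
        }
        return result;
    }

    static List<String> names(List<Protocol> stack) {
        List<String> out = new ArrayList<>();
        for (Protocol p : stack) {
            out.add(p.name);
        }
        return out;
    }
}
```

For example, applying `FD_SOCK` with `REPLACE` at `FD_SOCK2` to a parent of `TCP, dns.DNS_PING, MERGE3, FD_SOCK2, FD_ALL3, pbcast.GMS` yields the same list with `FD_SOCK` in place of `FD_SOCK2`, which is the effect the snippet above is aiming for.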

@ahus1
Contributor

ahus1 commented Jul 21, 2023

I've been testing @pruivo's workaround together with the Keycloak Operator and found that it needed an additional configuration to make it work, as the DNS discovery hostname wasn't recognized as before.

See keycloak/keycloak-benchmark#440 for the PR in the Keycloak Benchmark project.

Changes to the Infinispan cache configuration file:

    <!-- Workaround for https://github.com/keycloak/keycloak/issues/21092 -->
    <jgroups>
        <stack name="kubernetes-with-fdsock" extends="kubernetes">
            <!-- When using an embedded stack, replacement is done by Infinispan, which requires environment variables to be
            prefixed with "env.", while JGroups falls back to environment variables without the prefix.
            The Keycloak Operator passes this information in an environment variable `jgroups.dns.query`.
            See https://github.com/keycloak/keycloak/issues/21830 for a discussion.
            -->
            <dns.DNS_PING dns_query="${env.jgroups.dns.query}" />
            <!-- Workaround for problems with FD_SOCK2, which might fail when the other nodes are not ready yet.
            See https://github.com/keycloak/keycloak/issues/21092 -->
            <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
        </stack>
    </jgroups>

Then add the new stack name to the additional options in the Keycloak CR:

  additionalOptions:
    # Workaround for https://github.com/keycloak/keycloak/issues/21092
    - name: 'cache-stack'
      value: kubernetes-with-fdsock
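The XML comment above describes two different placeholder rules. A minimal Java sketch contrasting them, assuming only what the comment states: in the Infinispan-style resolver an environment variable is read only through an `env.` prefix, while the JGroups-style resolver checks a system property first and then falls back to an environment variable of the same name. Maps are passed in instead of calling `System.getenv()` so the behavior is easy to test; the class and method names are hypothetical, and default values (`${key:default}`) are not modeled.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified resolvers contrasting the two lookup rules in the XML comment.
// Not Infinispan's or JGroups' code; ${key:default} syntax is not modeled.
class PlaceholderSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Infinispan-style (per the comment): only "env.<NAME>" reads an environment variable.
    static String resolveInfinispanStyle(String text, Map<String, String> sysProps, Map<String, String> env) {
        return substitute(text, key -> key.startsWith("env.") ? env.get(key.substring(4)) : sysProps.get(key));
    }

    // JGroups-style (per the comment): system property first, then an env var with the same name.
    static String resolveJGroupsStyle(String text, Map<String, String> sysProps, Map<String, String> env) {
        return substitute(text, key -> sysProps.containsKey(key) ? sysProps.get(key) : env.get(key));
    }

    // Replaces each ${key}; unresolved placeholders are left as-is in this sketch.
    private static String substitute(String text, Function<String, String> lookup) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = lookup.apply(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value != null ? value : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the environment variable `jgroups.dns.query=keycloak-headless`, `${env.jgroups.dns.query}` resolves under the Infinispan-style rule, whereas `${jgroups.dns.query}` is left unresolved by it and only resolves through the JGroups-style fallback. That difference is why the workaround stack spells out the `env.` prefix.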

@ahus1
Contributor

ahus1 commented Jul 21, 2023

One more question towards @kopvortex's comment above:

> To replicate the issue, I initiated a Keycloak deployment and then initiated another deployment before the first one was fully operational. Keycloak container in the second deployment fails with error.

When deploying Keycloak with a StatefulSet, the first Pod should be ready before the second Pod starts. As outlined in #11763 (comment), this is one of several reasons why the Keycloak Operator uses a StatefulSet.

Am I correct to assume you're using a Deployment for Keycloak in the scenario you described above?

@souravs17031999
Contributor

@ahus1, I am using StatefulSets. Our scenario: Keycloak is deployed in Kubernetes with two pods running v21, and we start upgrading to v22. The first pod that comes up with v22 (while one of the older pods is still on v21) starts throwing NPEs and goes into a crash loop, continuously failing the deployment.

@sschu
Contributor

sschu commented Aug 1, 2023

@souravs17031999 This is not supported, as the Infinispan versions in different Keycloak versions are not compatible. Furthermore, the new Keycloak version might contain database migrations. To do a Keycloak version upgrade, you have to scale to zero pods first and then update.

@souravs17031999
Contributor

Ok, thanks @sschu , makes sense.

@mabartos
Contributor

mabartos commented Aug 1, 2023

@ahus1 Thanks for improving and verifying the workaround.

To summarize the plan for this issue:

We've been discussing it with the maintainers of JGroups and Infinispan, and we will be waiting for JGroups 5.2.17 and Infinispan 14.0.x releases. These releases will contain a fix for this issue. It might be included in the next planned micro-release 22.0.2 of Keycloak, or potentially in the next one.

@Jojoooo1

Jojoooo1 commented Aug 5, 2023

@sschu, would you mind pointing to any documentation that talks about the upgrade process and the need to scale down to 0 (especially when using Infinispan)? I tried to find information on this but was not able to find any. Thanks a lot for the help :)

@sschu
Contributor

sschu commented Aug 7, 2023

This is not explicitly mentioned. You can infer it from the upgrading guide (https://www.keycloak.org/docs/latest/upgrading/index.html), because it describes the traditional upgrade process for installed software, which implicitly shuts down the software before upgrading it.

@ahus1
Contributor

ahus1 commented Aug 7, 2023

@Jojoooo1 / @sschu - thank you for pointing this out - the docs could be more descriptive on that. If someone would create a PR with some changes, I'd be happy to review and merge it. Thanks!

@ahus1
Contributor

ahus1 commented Aug 11, 2023

@mabartos - I've started a draft PR targeting the latest 14.0.14-SNAPSHOT release, see #22386

ahus1 added a commit to ahus1/keycloak that referenced this issue Aug 15, 2023
mhajas pushed a commit that referenced this issue Aug 16, 2023
ahus1 added a commit to ahus1/keycloak that referenced this issue Aug 16, 2023
Closes keycloak#21092

(cherry picked from commit dfc8c80)
mhajas pushed a commit that referenced this issue Aug 16, 2023
Closes #21092

(cherry picked from commit dfc8c80)
@noiter

noiter commented Aug 20, 2023

I have also been following up on a similar issue that happened to me. One thing that seems odd to me is this message:

no physical address for e0a9c418-fc6d-4635-89d3-4b74f3a1601f, dropping message

The node name looks odd, as the other node names look like keycloak-* (keycloak-7b987d9f68-rtbgz-2491 in the example above). Does anyone know why?

@pruivo
Contributor

pruivo commented Aug 21, 2023

JGroups identifies nodes by UUID and keeps a cache that maps each UUID to a logical name (keycloak-*) and a physical address (IP and port). This cache is populated on demand, and in your case it seems the UUID is not present in it yet. The warning should go away once the message is retried and the cache has been populated with that destination.

If it does not go away, it may indicate a misconfiguration or a network issue.
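This explanation would also account for the bare UUID in the log: if the logical name comes from the same cache, an unknown member can only be printed by its UUID. Below is a toy Java model of an on-demand UUID-to-(logical name, physical address) cache, where sends to unknown destinations are dropped and succeed on retry once the entry is learned. It is an illustration only, not JGroups' implementation; the class, its methods, and the sample names and addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Toy model (not JGroups code) of an on-demand cache mapping member UUIDs to a logical name
// and a physical address. Sends to unknown UUIDs are dropped and retried later.
class LogicalAddressCacheSketch {

    private static final class Entry {
        final String logicalName;
        final String physicalAddress; // e.g. "ip:port"

        Entry(String logicalName, String physicalAddress) {
            this.logicalName = logicalName;
            this.physicalAddress = physicalAddress;
        }
    }

    private final Map<UUID, Entry> cache = new HashMap<>();
    private final List<UUID> pending = new ArrayList<>();
    final List<String> log = new ArrayList<>();

    // Populated when discovery or a view change reveals a member's details.
    void learn(UUID id, String logicalName, String physicalAddress) {
        cache.put(id, new Entry(logicalName, physicalAddress));
    }

    // Without a cache entry only the raw UUID can be printed, as in the warning above.
    String nameOf(UUID id) {
        Entry e = cache.get(id);
        return e == null ? id.toString() : e.logicalName;
    }

    boolean send(UUID dest) {
        Entry e = cache.get(dest);
        if (e == null) {
            log.add("no physical address for " + nameOf(dest) + ", dropping message");
            if (!pending.contains(dest)) {
                pending.add(dest);
            }
            return false;
        }
        log.add("sent to " + e.logicalName + " at " + e.physicalAddress);
        return true;
    }

    // Retries dropped sends whose destination is now known; returns how many remain pending.
    int retryPending() {
        pending.removeIf(dest -> cache.containsKey(dest) && send(dest));
        return pending.size();
    }
}
```

In this model the warning repeats only while the entry is missing; a warning that never clears would correspond to a member that discovery never resolves, which matches the misconfiguration or network hint above.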

@noiter

noiter commented Aug 21, 2023

Another issue I am facing when trying to override the kubernetes cache stack (following the suggestions from #21092 (comment) and #21092 (comment)) to work around this issue: two Keycloak instances started successfully, but both printed the same log message, "no members discovered after 2003 ms: creating cluster as coordinator". It looks like the cluster nodes failed to discover each other.

My custom cache configuration file looks like this:

<infinispan ...>
    <jgroups>
        <stack name="kubernetes-with-fdsock" extends="kubernetes">
            <dns.DNS_PING dns_query="${env.jgroups.dns.query}"/>
            <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
        </stack>
    </jgroups>

    <cache-container name="keycloak" statistics="true">
        <transport lock-timeout="60000" stack="kubernetes-with-fdsock"/>
        ... ...
    </cache-container>
</infinispan>

I have also tried:

1.
<jgroups>
    <stack name="kubernetes-with-fdsock" extends="kubernetes">
        <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
    </stack>
</jgroups>

2.
<jgroups>
    <stack name="kubernetes-with-fdsock" extends="kubernetes">
        <dns.DNS_PING dns_query="${env.jgroups.dns.query}"/>
        <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
    </stack>
</jgroups>

3.
<jgroups>
    <stack name="kubernetes-with-fdsock" extends="kubernetes">
        <dns.DNS_PING dns_query="keycloak-headless"/>
        <FD_SOCK2 stack.combine="REMOVE"/>
    </stack>
</jgroups>

4.
<jgroups>
    <stack name="kubernetes-with-fdsock" extends="kubernetes">
        <dns.DNS_PING dns_query="keycloak-headless"/>
        <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
    </stack>
</jgroups>

My Keycloak configuration:

- name: KC_CACHE
  value: ispn
- name: KC_CACHE_CONFIG_FILE
  value: cache-ispn-fdsock.xml
- name: JAVA_OPTS_APPEND
  value: -Djgroups.dns.query=keycloak-headless ...

Keycloak version: 21.1.1
Platform: AWS EKS (k8s)
Way of deployment: StatefulSet

Can someone shed some light on this? Thanks in advance.

@pruivo
Contributor

pruivo commented Aug 21, 2023

The first one should work, but you also need to configure the cache-stack option:

     # Workaround for https://github.com/keycloak/keycloak/issues/21092
    - name: 'cache-stack'
      value: kubernetes-with-fdsock

Example: keycloak/keycloak-benchmark@880e4ea

@noiter

noiter commented Aug 21, 2023

@pruivo, thanks for replying. I had tried that before and just tried it again, but unfortunately I got the error below:

2023-08-21T12:26:06.527Z ERROR keycloak-cache-init org.infinispan.CONFIG : ISPN000660: DefaultCacheManager start failed, stopping any running components: org.infinispan.commons.CacheConfigurationException: ISPN000540: No such JGroups stack 'kubernetes-with-fdsock'
	at org.infinispan.configuration.global.JGroupsConfiguration.lambda$configurator$2(JGroupsConfiguration.java:69)
	at java.base/java.util.Optional.orElseThrow(Optional.java:403)
	at org.infinispan.configuration.global.JGroupsConfiguration.configurator(JGroupsConfiguration.java:69)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.buildChannel(JGroupsTransport.java:749)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.initChannel(JGroupsTransport.java:504)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:485)
	at org.infinispan.remoting.transport.jgroups.CorePackageImpl$1.start(CorePackageImpl.java:42)
	at org.infinispan.remoting.transport.jgroups.CorePackageImpl$1.start(CorePackageImpl.java:27)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.invokeStart(BasicComponentRegistryImpl.java:616)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:607)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:576)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:807)
	at org.infinispan.metrics.impl.MetricsCollector.start(MetricsCollector.java:78)
	at org.infinispan.metrics.impl.CorePackageImpl$1.start(CorePackageImpl.java:41)
	at org.infinispan.metrics.impl.CorePackageImpl$1.start(CorePackageImpl.java:34)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.invokeStart(BasicComponentRegistryImpl.java:616)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:607)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:576)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:807)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.startDependencies(BasicComponentRegistryImpl.java:634)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:598)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:576)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:807)
	at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:379)
	at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:252)
	at org.infinispan.manager.DefaultCacheManager.internalStart(DefaultCacheManager.java:778)
	at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:746)
	at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:410)
	at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.startCacheManager(CacheManagerFactory.java:96)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
: org.infinispan.commons.CacheConfigurationException: ISPN000540: No such JGroups stack 'kubernetes-with-fdsock'
	at org.infinispan.configuration.global.JGroupsConfiguration.lambda$configurator$2(JGroupsConfiguration.java:69)
	at java.base/java.util.Optional.orElseThrow(Optional.java:403)
	at org.infinispan.configuration.global.JGroupsConfiguration.configurator(JGroupsConfiguration.java:69)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.buildChannel(JGroupsTransport.java:749)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.initChannel(JGroupsTransport.java:504)

The Keycloak configuration is as follows:

- name: KC_CACHE
  value: ispn
- name: KC_CACHE_STACK
  value: kubernetes-with-fdsock
- name: KC_CACHE_CONFIG_FILE
  value: cache-ispn-fdsock.xml

Any other suggestions?

@pruivo
Contributor

pruivo commented Aug 21, 2023

Did you upload cache-ispn-fdsock.xml as a ConfigMap?

@noiter

noiter commented Aug 21, 2023

I am using https://github.com/codecentric/helm-charts/tree/master/charts/keycloakx for the Keycloak deployment. I don't think it uploads cache-ispn-fdsock.xml as a ConfigMap; it simply specifies the cache config file via the KC_CACHE_CONFIG_FILE variable. Meanwhile, I copy cache-ispn-fdsock.xml into the keycloak/conf folder when building a custom Keycloak image.

I am putting these KC_CACHE variables under extraEnv of the keycloak container in the StatefulSet definition.

@pruivo
Contributor

pruivo commented Aug 21, 2023

I'm not familiar with that helm chart.
Make sure /opt/keycloak/conf/cache-ispn-fdsock.xml file exists and it contains the stack with the name kubernetes-with-fdsock.
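One way to check this mechanically is to parse the file and look for a `<stack>` with the expected `name` inside `<jgroups>`. A minimal sketch using the JDK's built-in DOM parser (Java 11+ for `Files.readString`); `StackDeclarationCheck` is a hypothetical helper, not part of Keycloak, and it only confirms that the stack is declared in the file, not that Keycloak actually loaded this file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical diagnostic helper (not part of Keycloak): does a cache config declare
// <jgroups><stack name="..."> with the given name?
class StackDeclarationCheck {

    static boolean declaresStack(String xml, String stackName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Not namespace-aware, so unprefixed element names match even when the file
            // declares the urn:infinispan:config namespace as its default namespace.
            NodeList stacks = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("stack");
            for (int i = 0; i < stacks.getLength(); i++) {
                Element stack = (Element) stacks.item(i);
                boolean insideJGroups = stack.getParentNode() != null
                        && "jgroups".equals(stack.getParentNode().getNodeName());
                if (insideJGroups && stackName.equals(stack.getAttribute("name"))) {
                    return true;
                }
            }
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse cache config: " + e.getMessage(), e);
        }
    }

    static boolean fileDeclaresStack(Path file, String stackName) {
        try {
            return declaresStack(Files.readString(file), stackName);
        } catch (IOException e) {
            return false; // file missing or unreadable
        }
    }
}
```

For example, `StackDeclarationCheck.fileDeclaresStack(Path.of("/opt/keycloak/conf/cache-ispn-fdsock.xml"), "kubernetes-with-fdsock")` should return `true` for a file shaped like the one in noiter's earlier comment, and `false` if the file is missing from the image.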

@noiter

noiter commented Aug 21, 2023

Yes, I confirmed that as well, but unfortunately it still didn't work.

I've also tried locally with docker-compose, but whenever I set KC_CACHE_STACK to the custom stack name, it throws this error:

ISPN000540: No such JGroups stack 'kubernetes-with-fdsock'

The error is never thrown when I leave KC_CACHE_STACK empty and configure the stack as described in https://www.keycloak.org/server/caching#_custom_transport_stacks. The problem with that approach is that the Infinispan nodes don't discover each other. :(

@MrDWilson

> @souravs17031999 This is not supported as the Infinispan versions in different Keycloak versions are not compatible. Furthermore, the new Keycloak version might contain database migrations. To do a Keycloak version upgrade, you have to scale to zero pods first and then update.

I'm experiencing the same issue, and this sorted it out for me. However, do we know whether this applies to major, minor, or patch versions? We noticed it when going from 22 to 24, but we would like to know for the future whether even a patch-version upgrade needs a scale-down first (as this would cause downtime).

@ahus1
Contributor

ahus1 commented Apr 18, 2024

> do we know if this is for major, minor or patch versions?

For now, we only support rolling upgrades when you stay on the exact same version (including the patch level). You usually do this to change startup configuration or memory settings.

We'd like to eventually support rolling upgrades for patch releases, and we're currently discussing this. Once we have the right tests in place and are sure we can guarantee it, we'll add this to the release notes and to the Keycloak upgrade guide: https://www.keycloak.org/docs/latest/upgrading/index.html#_upgrading
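The policy described in this thread can be encoded as a small decision helper. This Java sketch is hypothetical (not a Keycloak API) and reflects only what is stated here as of April 2024: a rolling restart only when the running and target versions match exactly, including the patch level; otherwise scale to zero, upgrade, and scale back up. Treating missing trailing components as zero (so `22.0` equals `22.0.0`) is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper encoding the upgrade policy stated in this thread; not a Keycloak API.
class UpgradeStrategySketch {

    enum Strategy { ROLLING_RESTART, SCALE_TO_ZERO_THEN_UPGRADE }

    // Parses "major.minor.patch"; missing trailing parts count as 0 (sketch assumption).
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length > 3) {
            throw new IllegalArgumentException("Unexpected version: " + version);
        }
        int[] out = new int[3];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Rolling restart only for an identical version, including the patch level.
    static Strategy choose(String running, String target) {
        return Arrays.equals(parse(running), parse(target))
                ? Strategy.ROLLING_RESTART
                : Strategy.SCALE_TO_ZERO_THEN_UPGRADE;
    }

    static List<String> steps(Strategy strategy) {
        if (strategy == Strategy.ROLLING_RESTART) {
            return List.of("restart pods one at a time (same version, e.g. new startup options or memory settings)");
        }
        return List.of(
                "scale Keycloak to 0 replicas",
                "wait until all pods have terminated",
                "update to the new Keycloak version",
                "scale back up; the new version may run database migrations on startup");
    }
}
```

Under this policy, `choose("24.0.2", "24.0.3")` still yields `SCALE_TO_ZERO_THEN_UPGRADE`; only the policy change discussed above would relax that for patch releases.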
