Keycloak fails to start due to infinispan state transfer exception #21092

kopvortex · 2023-06-19T22:12:06Z

Before reporting an issue

I have searched existing issues
I have reproduced the issue with the latest release

Area

infinispan

Describe the bug

Running into the following exception when starting a new Keycloak container on k8s.

2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.counter.impl.CounterModuleLifecycle@7e76b944.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.tasks.impl.LifecycleCallbacks@60d67fbd.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.jboss.marshalling.JbossMarshallingModule@2a2df0de.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.server.hotrod.LifecycleCallbacks@5fc79f76.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.multimap.impl.MultimapModuleLifecycle@60ae908e.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.persistence.remote.LifecycleCallbacks@adf8238.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.server.core.LifecycleCallbacks@7c48f27b.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl] (keycloak-cache-init) Invoking listener: org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler@3bbe1c15 passing event EventImpl{type=CACHE_STOPPED, newMembers=null, oldMembers=null, localAddress=null, viewId=0, subgroupsMerged=null, mergeView=false}
2023-06-19 21:42:16,447 TRACE [org.infinispan.factories.impl.BasicComponentRegistryImpl] (keycloak-cache-init) Changed status of org.infinispan.globalstate.GlobalConfigurationManager to FAILED
2023-06-19 21:42:16,447 ERROR [org.infinispan.CONFIG] (keycloak-cache-init) ISPN000660: DefaultCacheManager start failed, stopping any running components: org.infinispan.commons.CacheException: java.lang.InterruptedException
        at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:243)
        at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1013)
        at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:504)
        at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:723)
        at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:669)
        at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:558)
        at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:521)
        at org.infinispan.security.actions.GetCacheAction.run(GetCacheAction.java:26)
        at org.infinispan.security.actions.GetCacheAction.run(GetCacheAction.java:14)
        at org.infinispan.security.Security.doPrivileged(Security.java:56)
        at org.infinispan.globalstate.impl.SecurityActions.doPrivileged(SecurityActions.java:30)
        at org.infinispan.globalstate.impl.SecurityActions.getCache(SecurityActions.java:39)
        at org.infinispan.globalstate.impl.GlobalConfigurationManagerImpl.start(GlobalConfigurationManagerImpl.java:111)
        at org.infinispan.globalstate.impl.CorePackageImpl$2.start(CorePackageImpl.java:60)
        at org.infinispan.globalstate.impl.CorePackageImpl$2.start(CorePackageImpl.java:48)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.invokeStart(BasicComponentRegistryImpl.java:617)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:608)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:577)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:808)
        at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:357)
        at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:250)
        at org.infinispan.manager.DefaultCacheManager.internalStart(DefaultCacheManager.java:775)
        at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:743)
        at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:407)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.startCacheManager(CacheManagerFactory.java:96)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.InterruptedException
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:385)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)
        at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:236)
        ... 28 more

Subsequent startup failure.

2023-06-19 21:42:16,491 DEBUG [org.infinispan.quarkus.hibernate.cache.QuarkusInfinispanRegionFactory] (main) Stop region factory
2023-06-19 21:42:16,492 DEBUG [org.infinispan.quarkus.hibernate.cache.QuarkusInfinispanRegionFactory] (main) Clear region references
2023-06-19 21:42:16,533 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Failed to start server in (production) mode
2023-06-19 21:42:16,533 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) Error details:: java.lang.RuntimeException: Failed to start caches
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.getOrCreate(CacheManagerFactory.java:61)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory_3e2e78b5a5eee8303325d41faca0a80d7da888f7_Synthetic_ClientProxy.getOrCreate(Unknown Source)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.QuarkusCacheManagerProvider.getCacheManager(QuarkusCacheManagerProvider.java:32)
        at org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory.lazyInit(DefaultInfinispanConnectionProviderFactory.java:143)
        at org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory.create(DefaultInfinispanConnectionProviderFactory.java:83)
        at org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory.create(DefaultInfinispanConnectionProviderFactory.java:67)
        at org.keycloak.services.DefaultKeycloakSession.getProvider(DefaultKeycloakSession.java:271)
        at org.keycloak.models.sessions.infinispan.InfinispanSingleUseObjectProviderFactory.getSingleUseObjectCache(InfinispanSingleUseObjectProviderFactory.java:53)
        at org.keycloak.models.sessions.infinispan.InfinispanSingleUseObjectProviderFactory.postInit(InfinispanSingleUseObjectProviderFactory.java:77)
        at org.keycloak.quarkus.runtime.integration.QuarkusKeycloakSessionFactory.init(QuarkusKeycloakSessionFactory.java:105)
        at org.keycloak.quarkus.runtime.integration.jaxrs.QuarkusKeycloakApplication.createSessionFactory(QuarkusKeycloakApplication.java:41)
        at org.keycloak.services.resources.KeycloakApplication.startup(KeycloakApplication.java:125)
        at org.keycloak.quarkus.runtime.integration.QuarkusLifecycleObserver.onStartupEvent(QuarkusLifecycleObserver.java:37)
        at org.keycloak.quarkus.runtime.integration.QuarkusLifecycleObserver_Observer_onStartupEvent_b0e82415b143738dc1f986a5fa4668e83d0a5dea.notify(Unknown Source)
        at io.quarkus.arc.impl.EventImpl$Notifier.notifyObservers(EventImpl.java:326)
        at io.quarkus.arc.impl.EventImpl$Notifier.notify(EventImpl.java:308)
        at io.quarkus.arc.impl.EventImpl.fire(EventImpl.java:76)
        at io.quarkus.arc.runtime.ArcRecorder.fireLifecycleEvent(ArcRecorder.java:131)
        at io.quarkus.arc.runtime.ArcRecorder.handleLifecycleEvents(ArcRecorder.java:100)
        at io.quarkus.deployment.steps.LifecycleEventsBuildStep$startupEvent1144526294.deploy_0(Unknown Source)
        at io.quarkus.deployment.steps.LifecycleEventsBuildStep$startupEvent1144526294.deploy(Unknown Source)
        at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
        at io.quarkus.runtime.Application.start(Application.java:101)
        at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:110)
        at io.quarkus.runtime.Quarkus.run(Quarkus.java:70)
        at org.keycloak.quarkus.runtime.KeycloakMain.start(KeycloakMain.java:98)
        at org.keycloak.quarkus.runtime.cli.command.AbstractStartCommand.run(AbstractStartCommand.java:37)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
        at picocli.CommandLine.execute(CommandLine.java:2078)
        at org.keycloak.quarkus.runtime.cli.Picocli.parseAndRun(Picocli.java:94)
        at org.keycloak.quarkus.runtime.KeycloakMain.main(KeycloakMain.java:88)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:61)
        at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:32)
Caused by: java.util.concurrent.TimeoutException
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:204)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.getOrCreate(CacheManagerFactory.java:59)
        ... 42 more

Configuration:

We use the default cache-ispn.xml, and the configuration works fine on 20.0.5.

Version

21.1.1

Expected behavior

Keycloak should start without any infinispan error.

Actual behavior

Keycloak container fails to start due to infinispan error.

How to Reproduce?

Able to reproduce on version 21.1.1 with the following config:

KC_CACHE_STACK - kubernetes
KC_HEALTH_ENABLED - true
KC_METRICS_ENABLED - true
KC_PROXY - reencrypt

Anything else?

No response

sschu · 2023-06-21T11:38:36Z

Can you provide exact steps to reproduce?

abstractj · 2023-06-21T20:22:14Z

@kopvortex as mentioned by @sschu providing the steps to reproduce is essential for us to proceed. Otherwise, we may close this issue.

kopvortex · 2023-06-21T21:11:27Z

We have set up Keycloak on EKS (Elastic Kubernetes Service) with JGroups and Infinispan integration. To replicate the issue, I initiated a Keycloak deployment and then initiated another deployment before the first one was fully operational. The Keycloak container in the second deployment fails with an error.

I noticed the following in the logs with JGroups trace logging enabled.

2023-06-20 01:40:33,821 WARN  [org.jgroups.protocols.TCP] (TQ-Bundler-7,keycloak-7b987d9f68-rtbgz-2491) JGRP000032: keycloak-7b987d9f68-rtbgz-2491: no physical address for e0a9c418-fc6d-4635-89d3-4b74f3a1601f, dropping message                                      
java.lang.NullPointerException
        at org.jgroups.protocols.FD_SOCK2.getPhysicalAddresses(FD_SOCK2.java:445)
        at org.jgroups.protocols.FD_SOCK2.connectTo(FD_SOCK2.java:395)
        at org.jgroups.protocols.FD_SOCK2.connectToNextPingDest(FD_SOCK2.java:376)
        at org.jgroups.protocols.FD_SOCK2.handle(FD_SOCK2.java:347)
        at org.jgroups.protocols.FD_SOCK2.handle(FD_SOCK2.java:31)
        at org.jgroups.util.ProcessingQueue.process(ProcessingQueue.java:55)
        at org.jgroups.util.ProcessingQueue.add(ProcessingQueue.java:35)
        at org.jgroups.protocols.FD_SOCK2.handleView(FD_SOCK2.java:364)
        at org.jgroups.protocols.FD_SOCK2.down(FD_SOCK2.java:227)
        at org.jgroups.protocols.FailureDetection.down(FailureDetection.java:149)
        at org.jgroups.protocols.VERIFY_SUSPECT2.down(VERIFY_SUSPECT2.java:84)
        at org.jgroups.protocols.pbcast.NAKACK2.down(NAKACK2.java:619)
        at org.jgroups.protocols.UNICAST3.down(UNICAST3.java:611)
        at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:260)
        at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:676)
        at org.jgroups.protocols.pbcast.ServerGmsImpl.handleViewChange(ServerGmsImpl.java:66)
        at org.jgroups.protocols.pbcast.ParticipantGmsImpl.handleViewChange(ParticipantGmsImpl.java:112)
        at org.jgroups.protocols.pbcast.GMS.handle(GMS.java:991)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:855)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
        at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:470)
        at org.jgroups.protocols.pbcast.NAKACK2.deliverBatch(NAKACK2.java:1010)
        at org.jgroups.protocols.pbcast.NAKACK2.removeAndDeliver(NAKACK2.java:943)
        at org.jgroups.protocols.pbcast.NAKACK2.handleMessageBatch(NAKACK2.java:915)
        at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:747)
        at org.jgroups.protocols.VERIFY_SUSPECT2.up(VERIFY_SUSPECT2.java:119)
        at org.jgroups.protocols.FailureDetection.up(FailureDetection.java:193)
        at org.jgroups.protocols.FD_SOCK2.up(FD_SOCK2.java:202)
        at org.jgroups.protocols.MERGE3.up(MERGE3.java:288)
        at org.jgroups.protocols.Discovery.up(Discovery.java:314)
        at org.jgroups.protocols.RED.up(RED.java:119)
        at org.jgroups.protocols.TP.passBatchUp(TP.java:1204)
        at org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.passBatchUp(MaxOneThreadPerSender.java:289)
        at org.jgroups.util.SubmitToThreadPool$BatchHandler.run(SubmitToThreadPool.java:150)
        at org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.run(MaxOneThreadPerSender.java:278)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

This seems related to https://issues.redhat.com/browse/JGRP-2707

sschu · 2023-06-22T08:10:39Z

Then the solution would be to wait until Keycloak picks up the Infinispan version where this is fixed. Also see #21119 (comment)

abstractj · 2023-07-06T11:46:59Z

@pruivo from my limited knowledge the changes proposed here #21064 can solve this issue. Is that correct?

abstractj · 2023-07-06T11:48:39Z

@stianst I added the 22 milestone here because this is important for us to include in the upcoming release. When you are back, please also review #21064. I believe that must be included in 22.

pruivo · 2023-07-06T12:52:57Z

@pruivo from my limited knowledge the changes proposed here #21064 can solve this issue. Is that correct?

It can work around the "problem" (although I don't believe the NPE causes any issue, and the cluster is able to recover).

The user can replace the protocol FD_SOCK2 with FD_SOCK (or remove it). With the change in #21064 the following custom stack can be used:

<jgroups>
   <stack name="my-stack" extends="kubernetes">
      <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
   </stack>
</jgroups>

And set KC_CACHE_STACK="my-stack" (assuming it is running in OpenShift)
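To make the effect of stack.combine and stack.position easier to picture, here is a toy model in Python of how a child stack could override one protocol of the stack it extends. It is only a sketch and not Infinispan's implementation: the function apply_override is hypothetical, the parent protocol list is an abbreviated assumption loosely based on the protocol names visible in the stack trace above, and the sketch assumes that the protocol's own name is the target when no position is given.

```python
# Abbreviated, assumed protocol list for the parent "kubernetes" stack.
PARENT_STACK = [
    "TCP", "RED", "dns.DNS_PING", "MERGE3", "FD_SOCK2",
    "VERIFY_SUSPECT2", "pbcast.NAKACK2", "UNICAST3",
    "pbcast.STABLE", "pbcast.GMS",
]


def apply_override(parent, protocol, combine, position=None):
    """Return a copy of the parent stack with one override applied.

    Hypothetical helper: only REPLACE and REMOVE are modelled here.
    """
    stack = list(parent)
    if combine == "REPLACE":
        # Assumption: without an explicit position, the protocol replaces
        # the entry that carries its own name.
        target = position or protocol
        stack[stack.index(target)] = protocol
    elif combine == "REMOVE":
        stack.remove(protocol)
    else:
        raise ValueError("unsupported combine mode in this sketch: " + combine)
    return stack


# <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
replaced = apply_override(PARENT_STACK, "FD_SOCK", "REPLACE", "FD_SOCK2")
# <FD_SOCK2 stack.combine="REMOVE"/>
removed = apply_override(PARENT_STACK, "FD_SOCK2", "REMOVE")
print(replaced)
print(removed)
```

In this model, the replacement keeps FD_SOCK at the same place in the protocol order that FD_SOCK2 had, while removal simply shortens the stack by one entry.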

ahus1 · 2023-07-21T11:26:01Z

I've been testing @pruivo's workaround together with the Keycloak Operator and found that it needed additional configuration to work, as the DNS discovery hostname wasn't recognized as before.

See keycloak/keycloak-benchmark#440 for the PR in the Keycloak Benchmark project.

Changes to the Infinispan cache configuration file:

    <!-- Workaround for https://github.com/keycloak/keycloak/issues/21092 -->
    <jgroups>
        <stack name="kubernetes-with-fdsock" extends="kubernetes">
            <!-- When using an embedded stack, replacement is done by Infinispan which requires environment variables to be
            prefixed with ".env", while JGroups falls back to environment variables without the prefix.
            The Keycloak Operator passes this information in an environment variable `jgroups.dns.query`.
            See https://github.com/keycloak/keycloak/issues/21830 for a discussion.
            -->
            <dns.DNS_PING dns_query="${env.jgroups.dns.query}" />
            <!-- Workaround for problems with FD_SOCK2, which might fail when the other nodes are not ready yet.
            See https://github.com/keycloak/keycloak/issues/21092 -->
            <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
        </stack>
    </jgroups>

And then add the new stack name to the additional options in the Keycloak CR:

  additionalOptions:
    # Workaround for https://github.com/keycloak/keycloak/issues/21092
    - name: 'cache-stack'
      value: kubernetes-with-fdsock

ahus1 · 2023-07-21T11:30:31Z

One more question regarding @kopvortex's comment above:

To replicate the issue, I initiated a Keycloak deployment and then initiated another deployment before the first one was fully operational. Keycloak container in the second deployment fails with error.

When deploying Keycloak with a StatefulSet, the first Pod should be ready before the second Pod starts. As outlined in #11763 (comment), this is one of the reasons why the Keycloak Operator uses a StatefulSet.

Am I correct to assume you're using a Deployment for Keycloak in the scenario you described above?

souravs17031999 · 2023-07-31T16:22:30Z

@ahus1, I am using a StatefulSet. In my scenario, Keycloak is deployed in k8s with two pods running v21. When we start upgrading to v22, the first pod that comes up with v22 (while one of the older pods is still on v21) starts throwing NPEs and goes into a crash loop, continuously failing the deployment.

sschu · 2023-08-01T07:05:40Z

@souravs17031999 This is not supported, as the Infinispan versions in different Keycloak versions are not compatible. Furthermore, the new Keycloak version might contain database migrations. To do a Keycloak version upgrade, you have to scale to zero pods first and then update.

souravs17031999 · 2023-08-01T09:23:20Z

Ok, thanks @sschu , makes sense.

mabartos · 2023-08-01T09:36:40Z

@ahus1 Thanks for improving and verifying the workaround.

To summarize the future of this issue:

We've been discussing it with the maintainers of JGroups and Infinispan, and we will be waiting for JGroups 5.2.17 and Infinispan 14.0.x releases. These releases will contain a fix for this issue. It might be included in the next planned micro-release 22.0.2 of Keycloak, or potentially in the next one.

Jojoooo1 · 2023-08-05T00:46:59Z

@sschu would you mind pointing to any documentation that talks about the upgrade process and the need to scale down to 0 (especially when using Infinispan)? I tried to find information on this but was not able to find any. Thanks a lot for the help :)

sschu · 2023-08-07T07:07:46Z

This is not explicitly mentioned. You can infer it from the upgrading guide (https://www.keycloak.org/docs/latest/upgrading/index.html), which describes the traditional upgrade process for installed software and implicitly shuts the software down before upgrading it.

ahus1 · 2023-08-07T07:30:07Z

@Jojoooo1 / @sschu - thank you for pointing this out - the docs could be more descriptive on that. If someone would create a PR with some changes, I'd be happy to review and merge it. Thanks!

ahus1 · 2023-08-11T09:13:59Z

@mabartos - I've started a draft PR targeting the latest 14.0.14-SNAPSHOT release, see #22386

noiter · 2023-08-20T21:48:15Z

I have also been following a similar issue that happened to me. One thing that looks odd to me is this message:

no physical address for e0a9c418-fc6d-4635-89d3-4b74f3a1601f, dropping message

The node name looks odd, as the other node names look like keycloak-* (keycloak-7b987d9f68-rtbgz-2491 in the example above). Does anyone know why?

pruivo · 2023-08-21T09:02:20Z

JGroups identifies nodes using UUIDs, and it has a cache that maps each UUID to a logical name (keycloak-*) and a physical address (IP and port). This cache is populated on demand, and in your case it seems the UUID is not present in the cache. The message should go away after a while, once the message is retried and the cache is populated with that destination.

If it does not go away, it may be some misconfiguration or some network issue.
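A toy model in Python may help to picture the lookup behaviour described above. It is a sketch under stated assumptions and not JGroups code: the class AddressCache, its methods, the logical name keycloak-0-12345, and the address 10.0.0.12:7800 are all hypothetical. It shows why a raw UUID can appear in the log until the cache entry for that node has been populated.

```python
import uuid


class AddressCache:
    """Toy on-demand cache: UUID -> (logical name, physical address)."""

    def __init__(self):
        self._entries = {}

    def learn(self, node_id, logical_name, physical_address):
        # Called when discovery information for a node arrives.
        self._entries[node_id] = (logical_name, physical_address)

    def display_name(self, node_id):
        # Without a cached logical name, only the raw UUID can be shown.
        entry = self._entries.get(node_id)
        return entry[0] if entry else str(node_id)

    def send(self, node_id, payload):
        entry = self._entries.get(node_id)
        if entry is None:
            # Cache miss: the message cannot be routed yet and is dropped.
            return ("no physical address for "
                    + self.display_name(node_id) + ", dropping message")
        name, (host, port) = entry
        return "sent {} bytes to {} at {}:{}".format(len(payload), name, host, port)


cache = AddressCache()
peer = uuid.UUID("e0a9c418-fc6d-4635-89d3-4b74f3a1601f")

first_attempt = cache.send(peer, b"hello")   # cache miss, message dropped
cache.learn(peer, "keycloak-0-12345", ("10.0.0.12", 7800))
retry = cache.send(peer, b"hello")           # cache populated, retry succeeds
print(first_attempt)
print(retry)
```

If the second step never happens in a real cluster, that matches the misconfiguration or network case mentioned above.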

noiter · 2023-08-21T11:36:45Z

I am facing another issue when trying to override the kubernetes cache stack (following the suggestions from #21092 (comment) and #21092 (comment)) to work around this one: two Keycloak instances started successfully, but the same log message, "no members discovered after 2003 ms: creating cluster as coordinator", was printed for both instances. It looks like the distributed cache nodes failed to discover each other.

My custom cache stack config file looks like this:

<infinispan ...>
    <jgroups>
        <stack name="kubernetes-with-fdsock" extends="kubernetes">
            <dns.DNS_PING dns_query="${env.jgroups.dns.query}"/>
            <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
        </stack>
    </jgroups>

    <cache-container name="keycloak" statistics="true">
        <transport lock-timeout="60000" stack="kubernetes-with-fdsock"/>
        ... ...
    </cache-container>
</infinispan>

I have also tried the following variants:

1.

<jgroups>
    <stack name="kubernetes-with-fdsock" extends="kubernetes">
        <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
    </stack>
</jgroups>

2.

<jgroups>
    <stack name="kubernetes-with-fdsock" extends="kubernetes">
        <dns.DNS_PING dns_query="${env.jgroups.dns.query}"/>
        <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
    </stack>
</jgroups>

3.

<jgroups>
    <stack name="kubernetes-with-fdsock" extends="kubernetes">
        <dns.DNS_PING dns_query="keycloak-headless"/>
        <FD_SOCK2 stack.combine="REMOVE"/>
    </stack>
</jgroups>

4.

<jgroups>
    <stack name="kubernetes-with-fdsock" extends="kubernetes">
        <dns.DNS_PING dns_query="keycloak-headless"/>
        <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
    </stack>
</jgroups>

My Keycloak config looks like this:

- name: KC_CACHE
  value: ispn
- name: KC_CACHE_CONFIG_FILE
  value: cache-ispn-fdsock.xml
- name: JAVA_OPTS_APPEND
  value: -Djgroups.dns.query=keycloak-headless ...

Keycloak version: 21.1.1
Platform: AWS MKS (k8s)
Way of deployment: StatefulSet

Can someone shed some light on this? Thanks in advance.

pruivo · 2023-08-21T12:01:36Z

The first one should work but you need to configure the cache-stack.

    # Workaround for https://github.com/keycloak/keycloak/issues/21092
    - name: 'cache-stack'
      value: kubernetes-with-fdsock

Example: keycloak/keycloak-benchmark@880e4ea

noiter · 2023-08-21T12:28:12Z

@pruivo thanks for replying. I had tried that before and just tried it again, but unfortunately I got the error below:

2023-08-21T12:26:06.527Z ERROR keycloak-cache-init org.infinispan.CONFIG : ISPN000660: DefaultCacheManager start failed, stopping any running components: org.infinispan.commons.CacheConfigurationException: ISPN000540: No such JGroups stack 'kubernetes-with-fdsock'
	at org.infinispan.configuration.global.JGroupsConfiguration.lambda$configurator$2(JGroupsConfiguration.java:69)
	at java.base/java.util.Optional.orElseThrow(Optional.java:403)
	at org.infinispan.configuration.global.JGroupsConfiguration.configurator(JGroupsConfiguration.java:69)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.buildChannel(JGroupsTransport.java:749)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.initChannel(JGroupsTransport.java:504)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:485)
	at org.infinispan.remoting.transport.jgroups.CorePackageImpl$1.start(CorePackageImpl.java:42)
	at org.infinispan.remoting.transport.jgroups.CorePackageImpl$1.start(CorePackageImpl.java:27)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.invokeStart(BasicComponentRegistryImpl.java:616)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:607)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:576)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:807)
	at org.infinispan.metrics.impl.MetricsCollector.start(MetricsCollector.java:78)
	at org.infinispan.metrics.impl.CorePackageImpl$1.start(CorePackageImpl.java:41)
	at org.infinispan.metrics.impl.CorePackageImpl$1.start(CorePackageImpl.java:34)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.invokeStart(BasicComponentRegistryImpl.java:616)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:607)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:576)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:807)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.startDependencies(BasicComponentRegistryImpl.java:634)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:598)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:576)
	at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:807)
	at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:379)
	at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:252)
	at org.infinispan.manager.DefaultCacheManager.internalStart(DefaultCacheManager.java:778)
	at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:746)
	at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:410)
	at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.startCacheManager(CacheManagerFactory.java:96)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
: org.infinispan.commons.CacheConfigurationException: ISPN000540: No such JGroups stack 'kubernetes-with-fdsock'
	at org.infinispan.configuration.global.JGroupsConfiguration.lambda$configurator$2(JGroupsConfiguration.java:69)
	at java.base/java.util.Optional.orElseThrow(Optional.java:403)
	at org.infinispan.configuration.global.JGroupsConfiguration.configurator(JGroupsConfiguration.java:69)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.buildChannel(JGroupsTransport.java:749)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.initChannel(JGroupsTransport.java:504)

Keycloak config is as below:

- name: KC_CACHE
  value: ispn
- name: KC_CACHE_STACK
  value: kubernetes-with-fdsock
- name: KC_CACHE_CONFIG_FILE
  value: cache-ispn-fdsock.xml

Any other suggestions?

pruivo · 2023-08-21T12:33:12Z

Did you upload cache-ispn-fdsock.xml as a ConfigMap?

noiter · 2023-08-21T12:50:59Z

I am using https://github.com/codecentric/helm-charts/tree/master/charts/keycloakx to deploy Keycloak. I don't think it uploads cache-ispn-fdsock.xml as a ConfigMap; it simply specifies the cache config file via the KC_CACHE_CONFIG_FILE variable. In addition, I copy cache-ispn-fdsock.xml into the keycloak/conf folder when building a new custom Keycloak image.

I am putting these KC_CACHE variables under extraEnv of the keycloak container in the StatefulSet definition.

pruivo · 2023-08-21T13:12:30Z

I'm not familiar with that Helm chart.
Make sure the file /opt/keycloak/conf/cache-ispn-fdsock.xml exists and that it contains a stack named kubernetes-with-fdsock.
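One way to run that check mechanically is a small Python script that lists the stack names declared in the config file. This is a minimal sketch: the helper names are hypothetical, the sample document and its urn:infinispan:config:14.0 namespace are assumptions for illustration, and it only inspects the file content, so it says nothing about whether Keycloak actually loads that file.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # Strip a "{namespace}" prefix, if present, from an element tag.
    return tag.rsplit("}", 1)[-1]


def jgroups_stack_names(xml_text):
    """Return the name attribute of every <stack> element in the document."""
    root = ET.fromstring(xml_text)
    return [
        element.get("name")
        for element in root.iter()
        if local_name(element.tag) == "stack" and element.get("name")
    ]


# Assumed sample content; a real check would read the file from disk.
SAMPLE = """\
<infinispan xmlns="urn:infinispan:config:14.0">
    <jgroups>
        <stack name="kubernetes-with-fdsock" extends="kubernetes">
            <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
        </stack>
    </jgroups>
    <cache-container name="keycloak">
        <transport lock-timeout="60000" stack="kubernetes-with-fdsock"/>
    </cache-container>
</infinispan>
"""

names = jgroups_stack_names(SAMPLE)
print(names)
```

Inside the container, the same function could be fed the text of /opt/keycloak/conf/cache-ispn-fdsock.xml instead of the sample string.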

noiter · 2023-08-21T13:51:59Z

Yes, I confirmed that as well, but unfortunately it still didn't work.

I've also tried locally with docker-compose, but whenever I set KC_CACHE_STACK to the custom stack name, it throws this error:

ISPN000540: No such JGroups stack 'kubernetes-with-fdsock'

The error is never thrown when I leave KC_CACHE_STACK empty and configure the stack as described at https://www.keycloak.org/server/caching#_custom_transport_stacks. The problem with that approach is that the Infinispan nodes don't discover each other. :(

MrDWilson · 2024-04-18T15:00:00Z

@souravs17031999 This is not supported, as the Infinispan versions in different Keycloak versions are not compatible. Furthermore, the new Keycloak version might contain database migrations. To do a Keycloak version upgrade, you have to scale to zero pods first and then update.

I'm experiencing the same issue, and this sorted it for me. However, do we know if this applies to major, minor, or patch versions? We were going from 22 to 24 when we noticed it, but we would like to know for the future whether even a patch version upgrade would need a scale-down first (as this would cause downtime).

ahus1 · 2024-04-18T15:18:12Z

do we know if this is for major, minor or patch versions?

For now, we only support rolling upgrades when you stay on the exact same version (including the patch level). You usually do this to change startup configurations or memory settings.

We would eventually like to support rolling upgrades for patch releases. We're currently discussing this. Once we have the right tests in place and are sure we can guarantee it, we'll add this to the release notes and also to the Keycloak upgrade guide - https://www.keycloak.org/docs/latest/upgrading/index.html#_upgrading
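The rule as stated in this thread can be written down as a small decision helper. This is only a sketch of the policy described here at the time of writing, not an official Keycloak tool; the function names and the wording of the steps are hypothetical.

```python
def rolling_upgrade_supported(current_version, target_version):
    """Per this thread: only identical versions, patch level included, qualify."""
    return current_version.strip() == target_version.strip()


def upgrade_plan(current_version, target_version):
    """Return the ordered steps suggested in this thread for a version change."""
    if rolling_upgrade_supported(current_version, target_version):
        # Same version: typically a change of startup configuration or memory settings.
        return ["roll pods one at a time"]
    return [
        "scale to zero pods",
        "update to " + target_version,
        "scale back up",
    ]


print(upgrade_plan("21.1.1", "21.1.1"))
print(upgrade_plan("21.1.1", "21.1.2"))
```

Under this rule, even a patch-level change such as 21.1.1 to 21.1.2 falls into the scale-to-zero path.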

kopvortex added kind/bug status/triage labels Jun 19, 2023

ghost added area/infinispan team/store labels Jun 19, 2023

sschu mentioned this issue Jun 22, 2023

Facing JGroups NPE issues when running Keycloak in EKS #21119

Closed

2 tasks

PavelVlha assigned martin-kanis Jun 22, 2023

trixpan mentioned this issue Jun 28, 2023

[14.x] ISPN-14994 JGroups 5.2.15.Final infinispan/infinispan#11057

Closed

abstractj added this to the 22.0.0 milestone Jul 6, 2023

pedroigor mentioned this issue Jul 4, 2023

Allow options to support any value in addition to a list of pre-defined values. #21439

Closed

stianst modified the milestones: 22.0.0, 22.0.1, 22.0.2 Jul 10, 2023

trixpan mentioned this issue Jul 17, 2023

Infinispan crashes with NPE when upgrading keycloak running on k8s from 21.1.1 to 21.1.2 or 22.0.0 #21754

Closed

2 tasks

martin-kanis removed their assignment Jul 20, 2023

martin-kanis added team/cloud-native and removed status/triage labels Jul 20, 2023

martin-kanis assigned mabartos Jul 20, 2023

ahus1 mentioned this issue Jul 21, 2023

Run tests Infinispan tests with FDSOCK instead of FD_SOCK2 keycloak/keycloak-benchmark#439

Closed

ryanemerson mentioned this issue Jul 24, 2023

Adding FD_SOCK2 -> FD_SOCK workaround keycloak/keycloak-benchmark#440

Merged

ahus1 mentioned this issue Aug 11, 2023

Upgrade to Infinispan 14.0.14 #22386

Merged

ahus1 added a commit to ahus1/keycloak that referenced this issue Aug 15, 2023

Upgrade to Infinispan 14.0.14

d5797a4

Closes keycloak#21092

mhajas closed this as completed in #22386 Aug 16, 2023

mhajas pushed a commit that referenced this issue Aug 16, 2023

Upgrade to Infinispan 14.0.14 (#22386)

dfc8c80

Closes #21092

ahus1 added a commit to ahus1/keycloak that referenced this issue Aug 16, 2023

Upgrade to Infinispan 14.0.14

ff54baa

Closes keycloak#21092 (cherry picked from commit dfc8c80)

ahus1 mentioned this issue Aug 16, 2023

Upgrade to Infinispan 14.0.14 #22485

Merged

mhajas pushed a commit that referenced this issue Aug 16, 2023

Upgrade to Infinispan 14.0.14 (#22485)

0b18054

Closes #21092 (cherry picked from commit dfc8c80)

adam-carruthers mentioned this issue Jan 17, 2024

Keycloak fails to start due to infinispan BootstrapMethodError when using ec2 cache stack #26258

Closed

1 task
