Description
Pre-check
- I am sure that all the content I provide is in English.
Search before asking
- I had searched in the issues and found no similar issues.
Apache Dubbo Component
Java SDK (apache/dubbo)
Dubbo Version
Dubbo Java 3.2.15, OpenJDK 21-2024.03, Debian GNU/Linux 11 (bullseye)
Steps to reproduce this issue
1. In high-concurrency scenarios, the list of service providers is updated frequently.
2. Multiple consumer threads execute routing decisions simultaneously.
3. When the routing chain is updated while a routing decision is executing, a "reject to route" error occurs.
What you expected to happen
During a release, requests should continue to be sent to the newly released provider servers as usual.
Actual behavior
Monitoring shows that the call volume dropped to 0.
Relevant logs:
[DUBBO] Failed to execute router: nacos://xxxx:8848/org.apache.dubbo.registry.RegistryService?REGISTRY_CLUSTER=default&application=adx-web&check=false&dubbo=2.0.2&executor-management-mode=isolation&file-cache=true&password=xxx!&pid=7&qos.enable=false&register-mode=instance&release=3.2.15&timestamp=1766047589451&username=nacos, cause: reject to route, because the invokers has changed., dubbo version: 3.2.15, current host: xxx, error code: 2-1. This may be caused by , go to https://dubbo.apache.org/faq/2/1 to find instructions.
java.lang.IllegalStateException: reject to route, because the invokers has changed.
at org.apache.dubbo.rpc.cluster.SingleRouterChain.route(SingleRouterChain.java:147)
at org.apache.dubbo.registry.integration.DynamicDirectory.doList(DynamicDirectory.java:214)
at org.apache.dubbo.rpc.cluster.directory.AbstractDirectory.list(AbstractDirectory.java:235)
at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.list(AbstractClusterInvoker.java:452)
at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.invoke(AbstractClusterInvoker.java:355)
at org.apache.dubbo.rpc.cluster.router.RouterSnapshotFilter.invoke(RouterSnapshotFilter.java:46)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:349)
at org.apache.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:108)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:349)
at org.apache.dubbo.rpc.cluster.filter.support.MetricsClusterFilter.invoke(MetricsClusterFilter.java:57)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:349)
at org.apache.dubbo.rpc.protocol.dubbo.filter.FutureFilter.invoke(FutureFilter.java:52)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:349)
at org.apache.dubbo.rpc.cluster.filter.support.ObservationSenderFilter.invoke(ObservationSenderFilter.java:62)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:349)
at org.apache.dubbo.spring.security.filter.ContextHolderParametersSelectedTransferFilter.invoke(ContextHolderParametersSelectedTransferFilter.java:40)
at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:349)
[DUBBO] No provider available after connectivity filter for the service cn.geo.api.remoteservice.RemoteIpAreaService All routed invokers' size: 0 from registry ServiceDiscoveryRegistryDirectory(registry: xxxx:8848, subscribed key: [geo-service])-Directory(invokers: 130[xxxx:20880, xxxx:20880, xxxx:20880], validInvokers: 130[xxxx:20880, xxxx:20880, xxxx:20880], invokersToReconnect: 0[]) on the consumer xxxx using the dubbo version 3.2.15., dubbo version: 3.2.15, current host: xxxx, error code: 2-2. This may be caused by provider server or registry center crashed, go to https://dubbo.apache.org/faq/2/2 to find instructions.
Monitoring is as described above. Calls recovered only after the consumer cluster was restarted.
I think there is a concurrency issue with the invokers variable in the org.apache.dubbo.rpc.cluster.SingleRouterChain#route method, which causes the check at line 140 to evaluate to true and the method to throw IllegalStateException("reject to route, because the invokers has changed."):
public List<Invoker<T>> route(URL url, BitList<Invoker<T>> availableInvokers, Invocation invocation) {
    if (invokers.getOriginList() != availableInvokers.getOriginList()) {
        logger.error(
                INTERNAL_ERROR,
                "",
                "Router's invoker size: " + invokers.getOriginList().size() + " Invocation's invoker size: "
                        + availableInvokers.getOriginList().size(),
                "Reject to route, because the invokers has changed.");
        throw new IllegalStateException("reject to route, because the invokers has changed.");
    }
    if (RpcContext.getServiceContext().isNeedPrintRouterSnapshot()) {
        return routeAndPrint(url, availableInvokers, invocation);
    } else {
        return simpleRoute(url, availableInvokers, invocation);
    }
}
Anything else
I believe this issue is unavoidable under high-frequency calls.
Root cause analysis:
- The data consistency of the available invokers is guaranteed by invokerRefreshLock.
- The routerChain lock is mainly used for switching and selecting routing chains.
- However, the two locks protect different scopes, which leads to inconsistent state during concurrent updates.
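To illustrate the mismatch, here is a minimal sketch that replays one interleaving consistent with this analysis on a single thread. `Chain` and `Directory` are hypothetical, simplified stand-ins and not Dubbo's real classes; the exact ordering inside Dubbo may differ. The only behavior borrowed from the source is the reference comparison in the guard.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration; not Dubbo's real classes.
final class StaleChainDemo {

    /** Mimics a router chain that remembers the invoker list it was built for. */
    static final class Chain {
        private final List<String> invokers;

        Chain(List<String> invokers) {
            this.invokers = invokers;
        }

        List<String> route(List<String> availableInvokers) {
            // Reference comparison (not equals), like the guard in SingleRouterChain#route.
            if (invokers != availableInvokers) {
                throw new IllegalStateException("reject to route, because the invokers has changed.");
            }
            return availableInvokers;
        }
    }

    /** Mimics a directory whose invoker list and chain are both replaced on a registry push. */
    static final class Directory {
        List<String> invokers = new ArrayList<>(List.of("10.0.0.1:20880"));
        Chain chain = new Chain(invokers);

        void refresh(List<String> addresses) {
            List<String> fresh = new ArrayList<>(addresses);
            invokers = fresh;
            chain = new Chain(fresh);
        }
    }

    /** Replays the bad interleaving step by step; returns true if routing was rejected. */
    static boolean replayRace() {
        Directory directory = new Directory();
        Chain staleChain = directory.chain;            // consumer picks a chain, then drops its lock
        directory.refresh(List.of("10.0.0.2:20880"));  // a registry push lands in between
        List<String> current = directory.invokers;     // consumer now sees the new list
        try {
            staleChain.route(current);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + replayRace());
    }
}
```

Because the guard compares list references, the rejection happens even if the old and new lists hold the same addresses.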
Current design issue:
- The routerChain read lock is released prematurely, right after singleChain is obtained.
- The routing decision (doList) is made after the routerChain read lock has been released.
- As a result, routerChain can be modified by other threads while a route is executing.
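One possible direction, sketched here under the assumption that routing is cheap enough to run while a read lock is held, is to keep the read lock for the whole "pick chain, read invokers, route" sequence, so that a refresh cannot slip in between. `GuardedDirectory` and its nested `Chain` are hypothetical names for illustration only; this is not Dubbo's actual code or a proposed patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model for illustration; not Dubbo's real classes.
final class GuardedDirectory {

    /** Mimics a router chain that rejects any list other than the one it was built for. */
    static final class Chain {
        private final List<String> invokers;

        Chain(List<String> invokers) {
            this.invokers = invokers;
        }

        List<String> route(List<String> availableInvokers) {
            if (invokers != availableInvokers) {
                throw new IllegalStateException("reject to route, because the invokers has changed.");
            }
            return availableInvokers;
        }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private List<String> invokers = new ArrayList<>();
    private Chain chain = new Chain(invokers);

    /** Replaces the list and the chain together under the write lock. */
    void refresh(List<String> addresses) {
        lock.writeLock().lock();
        try {
            List<String> fresh = new ArrayList<>(addresses);
            invokers = fresh;
            chain = new Chain(fresh);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Holds the read lock until routing has finished, not just until the chain is picked. */
    List<String> list() {
        lock.readLock().lock();
        try {
            return chain.route(invokers);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Calls list() while another thread keeps refreshing; returns the number of rejections. */
    static int stress(int iterations) {
        GuardedDirectory directory = new GuardedDirectory();
        AtomicInteger rejections = new AtomicInteger();
        Thread updater = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                directory.refresh(List.of("10.0.0." + (i % 250 + 1) + ":20880"));
            }
        });
        Thread caller = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                try {
                    directory.list();
                } catch (IllegalStateException e) {
                    rejections.incrementAndGet();
                }
            }
        });
        updater.start();
        caller.start();
        try {
            updater.join();
            caller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return rejections.get();
    }

    public static void main(String[] args) {
        System.out.println("rejections: " + stress(20_000));
    }
}
```

The trade-off is that writers wait for in-flight routing to finish, so a refresh can be delayed under heavy read traffic. An alternative with the same consistency goal would be to publish the invoker list and its chain as a single immutable snapshot that readers load once.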
Are you willing to submit a pull request to fix on your own?
- Yes I am willing to submit a pull request on my own!
Code of Conduct
- I agree to follow this project's Code of Conduct