xds-client acknowledge race condition #4703
Comments
This is not a race in gRPC. This is an issue in the go-control-plane. In gRPC-Go, we use the go-control-plane as our management server in our e2e tests, and we have seen the same issue. The way we work around this issue is to make sure that the management server has all resources before the gRPC client starts querying for a subset of them. We had filed some bugs on the go-control-plane in the past, but there hasn't been a fix yet. Please let us know if you still think the issue is in gRPC. |
@easwars The xDS control plane already knows all resources at that point. The gRPC client requests a single resource (see logs) and almost immediately sends another discovery request for two resources. The xDS server responds to the first request with a single resource and does not respond to the second request, because the gRPC client ACKs the first response. To me it looks like the gRPC client shouldn't ACK the first response, because it contains only one resource while the client knows it has requested discovery for two. Please let me know if I am missing something here. I have the ads flag set to false since my gRPC clients cannot know all services (and they don't need to). |
I still think gRPC is behaving correctly here. From the logs, this is the sequence of events:
See https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#knowing-when-a-requested-resource-does-not-exist, which explains why gRPC cannot NACK the above response: it cannot confidently say that the server does not have
What do you mean by this? gRPC only supports ADS. |
@easwars Just to clarify - does the log below mean that gRPC is requesting the resources? I thought it was an ACK. I see in the xDS logs that the xDS server creates a watch upon receiving the request.
xDS logs
ads flag in NewSnapshotCache |
ACKs are piggybacked. A client is expected to ACK or NACK every response that it receives from the server. When the version number in a request matches the version number of the response sent by the server, the request is considered an ACK. See: https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#ack
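As a rough illustration of the piggybacking rule, the sketch below classifies a request relative to the last response a server sent. The `Request` and `Response` structs are simplified, hypothetical stand-ins for the real DiscoveryRequest and DiscoveryResponse messages; their fields mirror the protocol's version_info, response_nonce, and error_detail, but the types and the `classify` function are assumptions for illustration, not gRPC-Go or go-control-plane code.

```go
package main

import "fmt"

// Request is a simplified, hypothetical stand-in for a DiscoveryRequest.
type Request struct {
	VersionInfo   string   // most recent version the client accepted
	ResponseNonce string   // nonce of the response being ACKed or NACKed
	ErrorDetail   string   // non-empty only when the client rejects a response
	ResourceNames []string // names the client is subscribed to
}

// Response is a simplified, hypothetical stand-in for a DiscoveryResponse.
type Response struct {
	VersionInfo string
	Nonce       string
}

// classify reports how a server could interpret req, given the last
// response it sent on the same stream for the same resource type.
func classify(req Request, last Response) string {
	switch {
	case req.ResponseNonce == "":
		return "initial request"
	case req.ResponseNonce != last.Nonce:
		return "stale"
	case req.ErrorDetail != "":
		return "NACK"
	case req.VersionInfo == last.VersionInfo:
		return "ACK"
	default:
		return "unknown"
	}
}

func main() {
	last := Response{VersionInfo: "1", Nonce: "a"}

	first := Request{ResourceNames: []string{"service-a"}}
	ack := Request{VersionInfo: "1", ResponseNonce: "a",
		ResourceNames: []string{"service-a", "service-b"}}
	nack := Request{ResponseNonce: "a", ErrorDetail: "invalid resource"}

	fmt.Println(classify(first, last))
	fmt.Println(classify(ack, last))
	fmt.Println(classify(nack, last))
}
```

Note that the `ack` request in this sketch also carries a longer `ResourceNames` list than the first request. In the xDS protocol a single request can both acknowledge a response and change the set of subscribed names, so a server that looks only at the version and nonce would miss the new subscription.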
I think the
We set it to |
I see. So in my case the control plane somehow ignores the second DiscoveryRequest containing responseNonce and just creates a watch. I will dig more there. Thank you for the confirmation. You are probably right that it is not a gRPC client issue.
If you set the flag to true, every gRPC client needs to know all available services registered on the xDS server, regardless of whether the client needs them or not. I use xDS for service discovery, and gRPC clients communicate only with the services they need. A requirement to list all services kinda defeats the purpose of discovery. I can also confirm that if I set the ads flag to true and force the gRPC client to send a discovery request for all resources, everything works as expected on my setup. But it doesn't work if a client needs only one or a couple of resources from the full set. The xDS control plane ignores such requests. |
If you end up opening an issue with the go-control-plane, I'd appreciate it if you could tag me there. Just curious to see how this shapes up. Thanks. |
Looks like there is a race condition that leads to partial service discovery by the gRPC client. I see the same issue when discovering RouteConfiguration, Cluster, or ClusterLoadAssignment.
An example of the race during a ClusterLoadAssignment request - grpc.log
What version of gRPC are you using?
v1.40.0
What version of Go are you using (go version)?
1.16.7
What operating system (Linux, Windows, …) and version?
Linux
What did you do?
I use envoy/go-control-plane to build an xDS server. gRPC clients connect to it to discover endpoints. Everything works fine if there is only one gRPC client. When I need more than one gRPC client (with different destinations) within the same application, I see the issue.
What did you expect to see?
gRPC should complete the discovery when multiple clients are created.
What did you see instead?
gRPC doesn't finish the discovery, apparently due to a race (I think it is a race).
Here is what I think is going on