xds: reuse GrpcXdsTransport and channel to the same xDS server by ref-counting #12682

Draft
danielzhaotongliu wants to merge 1 commit into grpc:master from danielzhaotongliu:ref-count-grpc-xds-channel

@danielzhaotongliu (Collaborator) commented Mar 10, 2026

This PR reuses the gRPC xDS transport (and its underlying gRPC channel) to the same xDS server by ref-counting, as is already done in gRPC C++ (link) and gRPC Go (link). This optimization is expected to reduce the memory footprint of both the xDS management server and xDS-enabled clients, because establishing a channel and managing the connection's lifecycle are expensive.

  • Implemented a map of GrpcXdsTransport instances keyed by Bootstrapper.ServerInfo, where each GrpcXdsTransport carries a ref count. Note that the map cannot be keyed by the xDS server address alone: a client may use different channel credentials for the same xDS server, and those must be treated as distinct transport instances.
  • When GrpcXdsTransportFactory.create() is called, a transport that already exists in the map is reused and its ref count is incremented; otherwise a new transport is created, stored in the map, and its ref count is incremented.
  • When GrpcXdsTransport.shutdown() is called, its ref count is decremented and the underlying gRPC channel is shut down when its ref count reaches zero.
  • Note that this ref-counting of GrpcXdsTransport is separate from, and orthogonal to, the ref-counting of the xDS client, which is keyed by the xDS server target name to allow for xDS-based fallback per gRFC A71.
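
To illustrate why the map key has to cover more than the server address, here is a minimal sketch. `ServerKey` is a hypothetical, simplified stand-in invented for this example (a target plus a credential type); it is not the actual shape of Bootstrapper.ServerInfo, and the counting helpers exist only for the demonstration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a server-info key: target address plus credential type. */
final class ServerKey {
  final String target;
  final String credentialType;

  ServerKey(String target, String credentialType) {
    this.target = target;
    this.credentialType = credentialType;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof ServerKey)) {
      return false;
    }
    ServerKey other = (ServerKey) o;
    return target.equals(other.target) && credentialType.equals(other.credentialType);
  }

  @Override
  public int hashCode() {
    return Objects.hash(target, credentialType);
  }
}

final class KeyingDemo {
  /** Number of transports a map keyed by the full ServerKey would hold. */
  static int distinctByServerKey(ServerKey... keys) {
    Map<ServerKey, Integer> refCounts = new HashMap<>();
    for (ServerKey key : keys) {
      refCounts.merge(key, 1, Integer::sum);
    }
    return refCounts.size();
  }

  /** Number of transports a map keyed by address alone would hold. */
  static int distinctByAddress(ServerKey... keys) {
    Map<String, Integer> refCounts = new HashMap<>();
    for (ServerKey key : keys) {
      refCounts.merge(key.target, 1, Integer::sum);
    }
    return refCounts.size();
  }
}
```

With two TLS keys and one insecure key for the same target, keying by `ServerKey` keeps two entries, while keying by address alone would collapse all three into one and share a channel across different credentials.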

Prod risk level: Low

  • Reusing the underlying gRPC channel to the xDS server does not change the gRPC xDS streams themselves, which are multiplexed on the same channel. However, because more streams now share one channel, streams and RPCs may fail if the channel hits the MAX_CONCURRENT_STREAMS limit.

Tested:

  • Verified end-to-end with an xDS-enabled gRPC Java client communicating with a gRPC backend server, using the xDS management server for name resolution and endpoint discovery.

Implementation details / context:

  • Used the java.util.concurrent.ConcurrentHashMap methods compute and computeIfPresent, each of which performs its entire invocation atomically, to get a concurrent, thread-safe solution in line with Java best practices.
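
The ref-counting pattern described here can be sketched roughly as follows. This is not the PR's code: `RefCountedTransport`, `TransportRegistry`, `acquire`, and `release` are hypothetical names, the key is a plain String instead of Bootstrapper.ServerInfo, and a boolean flag stands in for shutting down the real channel.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a transport that owns a channel. */
final class RefCountedTransport {
  final String serverKey;
  // Mutated only inside compute/computeIfPresent for this key, which the map serializes.
  private int refCount;
  private boolean channelShutDown;

  RefCountedTransport(String serverKey) {
    this.serverKey = serverKey;
  }

  int refCount() {
    return refCount;
  }

  boolean isChannelShutDown() {
    return channelShutDown;
  }

  void retain() {
    refCount++;
  }

  /** Decrements the count and returns true when it reaches zero. */
  boolean releaseOne() {
    return --refCount == 0;
  }

  void shutDownChannel() {
    channelShutDown = true; // placeholder for shutting down the real channel
  }
}

final class TransportRegistry {
  private final ConcurrentHashMap<String, RefCountedTransport> transports =
      new ConcurrentHashMap<>();

  /** Reuses the transport for this key if present, otherwise creates one; bumps the count. */
  RefCountedTransport acquire(String serverKey) {
    return transports.compute(serverKey, (key, existing) -> {
      RefCountedTransport transport =
          (existing != null) ? existing : new RefCountedTransport(key);
      transport.retain();
      return transport;
    });
  }

  /** Drops one reference; at zero, shuts the channel down and removes the mapping. */
  void release(RefCountedTransport transport) {
    transports.computeIfPresent(transport.serverKey, (key, existing) -> {
      if (existing != transport) {
        return existing; // stale handle: leave the current mapping untouched
      }
      if (!existing.releaseOne()) {
        return existing; // still referenced elsewhere
      }
      existing.shutDownChannel();
      return null; // returning null removes the entry atomically
    });
  }

  int size() {
    return transports.size();
  }
}
```

Because the lookup, the count update, and the insert or removal all happen inside one remapping function, a concurrent `acquire` cannot observe a transport whose count has already dropped to zero. The accessors here are meant for single-threaded inspection only; a real implementation would need to consider visibility for reads made outside the map's remapping functions.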

Alternatives considered:

  • Writing custom synchronization logic with synchronized blocks and locks. After internal discussion, the existing concurrency libraries were preferred because they are less error-prone and should offer better performance.
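
For comparison, the lock-based alternative might look something like the sketch below. `LockedTransportRegistry` and its nested `Entry` are hypothetical names invented for this illustration, with a String key and a boolean flag standing in for the real key type and channel shutdown.

```java
import java.util.HashMap;
import java.util.Map;

final class LockedTransportRegistry {
  /** Hypothetical per-server entry; shutDown stands in for closing the channel. */
  static final class Entry {
    int refCount;
    boolean shutDown;
  }

  private final Object lock = new Object();
  private final Map<String, Entry> entries = new HashMap<>();

  Entry acquire(String serverKey) {
    synchronized (lock) {
      Entry entry = entries.get(serverKey);
      if (entry == null) {
        entry = new Entry();
        entries.put(serverKey, entry);
      }
      entry.refCount++;
      return entry;
    }
  }

  void release(String serverKey) {
    synchronized (lock) {
      Entry entry = entries.get(serverKey);
      if (entry == null) {
        return; // nothing registered for this key
      }
      if (--entry.refCount == 0) {
        entry.shutDown = true;
        entries.remove(serverKey);
      }
    }
  }

  int size() {
    synchronized (lock) {
      return entries.size();
    }
  }
}
```

This version is correct only as long as every access path remembers to take the lock, and the single lock serializes operations across all keys, which is part of why a library-provided atomic map operation can be the less error-prone choice.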
