Skip to content

Conversation

@eshitachandwani
Copy link
Member

@eshitachandwani eshitachandwani commented Nov 17, 2025

This change is part of A74 implementation.

This PR removes the listener and route watchers from resolver and changes it so that we get the resources from xds dependency manager.

RELEASE NOTES: N/A

@eshitachandwani eshitachandwani added Type: Internal Cleanup Refactors, etc Area: xDS Includes everything xDS related, including LB policies used with xDS. labels Nov 17, 2025
@codecov
Copy link

codecov bot commented Nov 17, 2025

Codecov Report

❌ Patch coverage is 81.39535% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.19%. Comparing base (08d0792) to head (55e666b).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
internal/xds/resolver/xds_resolver.go 81.39% 5 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8711      +/-   ##
==========================================
- Coverage   83.31%   83.19%   -0.12%     
==========================================
  Files         419      418       -1     
  Lines       32429    32352      -77     
==========================================
- Hits        27017    26916     -101     
- Misses       4038     4052      +14     
- Partials     1374     1384      +10     
Files with missing lines Coverage Δ
internal/xds/resolver/xds_resolver.go 88.57% <81.39%> (+6.14%) ⬆️

... and 28 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@eshitachandwani eshitachandwani added this to the 1.78 Release milestone Nov 17, 2025
@eshitachandwani eshitachandwani requested review from arjan-bal and easwars and removed request for easwars November 17, 2025 03:27
@eshitachandwani eshitachandwani changed the title internal:xds change xds_resolver to use dependency manager internal/xds: change xds_resolver to use dependency manager Nov 17, 2025
@easwars
Copy link
Contributor

easwars commented Nov 17, 2025

Is the PR description out of date? I don't see any changes about switching the dependency manager to use the serializer again.

}
}

// Tests the case where a resource, present in cache, returned by the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to continue to test this scenario here? Or is this completely covered by tests in the dependency manager? I guess if there is an ambient error (with the dependency manager in the picture), that error is swallowed by it and the resolver does not see it.

So, maybe we need a test to ensure that the resolver does not see an update in this case? And that RPCs continue to succeed (if they were succeeding earlier)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test in the dependency manager is also mostly similar. Main difference would be the test in dependency managers uses the New function , and the test here will start with the resolver. Either way , adding it does no harm , so re- added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure now what test this was now.

Either way , adding it does no harm , so re- added.

That is not always true. There is no point in having two tests that do exactly the same thing. It's a maintenance burden. So, I'm guessing there was a valid reason to retain that test.

@easwars easwars assigned eshitachandwani and unassigned easwars and arjan-bal Nov 17, 2025
@eshitachandwani
Copy link
Member Author

Is the PR description out of date? I don't see any changes about switching the dependency manager to use the serializer again.

Yeah sorry , I was making that change earlier but Arjan suggested a different way, so that got changed.

Copy link
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo minor nits

Comment on lines 135 to 152
r.serializer = grpcsync.NewCallbackSerializer(ctx)
r.serializerCancel = cancel

// Initialize the xDS client.
newXDSClient := rinternal.NewXDSClient.(func(string, estats.MetricsRecorder) (xdsclient.XDSClient, func(), error))
if b.newXDSClient != nil {
newXDSClient = b.newXDSClient
}
client, closeFn, err := newXDSClient(target.String(), opts.MetricsRecorder)
if err != nil {
return nil, fmt.Errorf("xds: failed to create xds-client: %v", err)
}
r.xdsClient = client
r.xdsClientClose = closeFn

// Determine the listener resource name and start a watcher for it.
template, err := r.sanityChecksOnBootstrapConfig(target, opts, r.xdsClient)
if err != nil {
return nil, err
}
r.dataplaneAuthority = opts.Authority
r.ldsResourceName = bootstrap.PopulateResourceTemplate(template, target.Endpoint())
r.listenerWatcher = newListenerWatcher(r.ldsResourceName, r)
r.dm = xdsdepmgr.New(r.ldsResourceName, opts.Authority, r.xdsClient, r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not make these three fields also be part of the literal struct initialization?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

logger *grpclog.PrefixLogger
ldsResourceName string
dm *xdsdepmgr.DependencyManager
// The underlying xdsClient which performs all xDS requests and responses.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was probably added way back when we were still getting used to what the xDS client was. I think we can remove it now. Doesn't add much value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 215 to 216
// A random number which uniquely identifies the channel which owns this
// resolver.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making this a trailing line comment that looks something like Unique random ID for the owning channel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@easwars easwars assigned eshitachandwani and unassigned easwars Nov 25, 2025
@eshitachandwani eshitachandwani removed their assignment Nov 27, 2025
// Returns the listener resource name template to use. If any of the above
// validations fail, a non-nil error is returned.
func (r *xdsResolver) sanityChecksOnBootstrapConfig(target resolver.Target, _ resolver.BuildOptions, client xdsclient.XDSClient) (string, error) {
func sanityChecksOnBootstrapConfig(target resolver.Target, _ resolver.BuildOptions, client xdsclient.XDSClient) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If the _ resolver.BuildOptions is unused, can it be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines -490 to -492
func (r *xdsResolver) onAmbientError(err error) {
r.cc.ReportError(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier ambient errors would result in a call to cc.ReportError, not they're only logged. Is this change intentional? Should there be a release note for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the change is intentional. I don't think the behavior would change , because even in current scenario, if we have the resources, we will continue using it and just log it. And so will we. So I don't think this warrants a release note.
WGYT @easwars ?

Comment on lines -471 to -477
if matchVh == nil {
// TODO(purnesh42h): Should this be a resource or ambient error? Note
// that its being called only from resource update methods when we have
// finished removing the previous update.
r.onAmbientError(fmt.Errorf("no matching virtual host found for %q", r.dataplaneAuthority))
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier this was considered an Ambient error, now it's a resource error. Is this intended? Should we add a release note for this behaviour change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add a test since this behaviour change didn't cause any existing tests to fail

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is intended behavior as the gRFC states that this should be considered as resource error : https://github.com/grpc/proposal/blob/f8dc9bebb97325086d963d46756094cc1029724e/A74-xds-config-tears.md?plain=1#L201C1-L216C44

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a test in the dependency manager for it , but added in the resolver too.

}},
}

// Expect an error update from the resolver. Since the resource is cached,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment probably needs to be updated since the resolver doesn't report an error for ambient listener and route errors anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Internal Cleanup Refactors, etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants