Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vtgateproxy: sticky random balancer and remote zone backup connections #593

Open
wants to merge 6 commits into
base: vtgateproxy-15
Choose a base branch
from

Conversation

demmer
Copy link
Collaborator

@demmer demmer commented Jan 23, 2025

Description

Implement a couple improvements to the vtgateproxy to support:

  • New sticky_random load balancer which implements round robin over connections while maintaining the mapping from a given webapp mysql connection to a given downstream GRPC.
  • Add support to discovery of a new --num_backup_connections option to include of vtgates in remote zones
  • Respect the zone locality in the picker so that we bias towards local connections if available and only fall back if needed.

Testing

Verified in dev by forcing a webapp host to use a hand-built proxy running on localhost

Running the client and looking at /channelz/ shows that sticky_random only uses the local conns when they're available.

Also verified that it's now picking one backup conn by including that in the debug output:
image

Make it available to the picker layer along with the pool type. Also rework how
the locality is computed to do it as part of building the target host rather
than deferring the comparisons to the sorter.
Using the zone local attribute inhjected by discovery (if it exists), update
sticky_random so that it biases to only use the local zone connections if there
are any available, otherwise fall back to remote.
@demmer demmer requested a review from a team as a code owner January 23, 2025 17:19
@github-actions github-actions bot added this to the v15.0.5 milestone Jan 23, 2025
Copy link

@henryr henryr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me, but just check my thinking on the local / non-local host selection during discovery.

// try to pick numConnections from the front of the list (local zone) and numBackupConnections
// from the tail (remote zone). if that's not possible, just take the whole set
if len(allTargets[poolType]) >= b.numConnections+b.numBackupConns {
remoteOffset := len(allTargets[poolType]) - b.numBackupConns
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's cleaner to have the sorter return two separate slices, one for local, one for not. (Or to call shuffleSort twice on two slices).

If allTargets has more local connections than numConnections, but fewer non-local than numBackupConns, we'll put some local connections in the non-local bucket I think. And vice-versa.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a bit of time thinking about this when writing the PR and then again after your comment. While I agree that splitting into two lists would make some stuff clearer, I actually think the way it's written produces the better outcome in all the ways we'd actually want in practice.

So to elaborate a bit, it's clear that we're fine in either the "happy path" when there are enough conns of each flavor, and also in the very degenerate case when there are fewer conns total than numConnections + numBackup and you have to take all of what you got.

This means we're in a case where there more than enough conns total, but not enough of either local or remote type. However I think the better outcome in that case is to still establish numConnections + numBackupConns total connections, even if you end up with the "wrong" mix of local / remote because there aren't enough to satisfy the ask.

If we all agree, then I think the current code does exactly that.

If not, then we're saying we'd rather make sure to only connect to <= numConnections local and <= numBackupConns remote, then I agree with you sorting into two lists would be a clearer way to achieve that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants