Block replication and query RPC calls until wait for dba grants has completed #14836

Merged
harshit-gangal merged 9 commits into vitessio:main from stall-rpcs-until-wait
Jan 9, 2024

Conversation

GuptaManan100

Description

This PR tries to fix the issue pointed out in #14834.
This is an extension of #14565 and #14680.

The proposed fix in this PR is to block all replication and query RPC calls from running until waitForDBAGrants has succeeded. To this end, we have added a new channel member to the tabletmanager struct, which signals whether that function call has finished.

Since waitForDBAGrants was only being used in the tabletmanager package, we have moved it there and made it a local function. This also fits the change, because the function now needs access to the newly added channel.

This change ensures that we have waited for and verified that the users have the correct grants before we start running replication and query RPC calls.

This change has the added benefit of not re-introducing the problem in #14681.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

… has succeeded

Signed-off-by: Manan Gupta <manan@planetscale.com>

vitess-bot bot commented Dec 20, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the NeedsBackportReason, NeedsDescriptionUpdate, NeedsIssue, and NeedsWebsiteDocsUpdate labels Dec 20, 2023
@github-actions github-actions bot added this to the v19.0.0 milestone Dec 20, 2023
@GuptaManan100 GuptaManan100 removed the NeedsDescriptionUpdate, NeedsWebsiteDocsUpdate, NeedsIssue, and NeedsBackportReason labels Dec 20, 2023
…errors

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>

@mattlord mattlord left a comment

I'm not sure how I feel about this solution. Would it make sense to handle/wait for this initialization work here?

// NewPool creates a new Pool. The name is used
// to publish stats only.
func NewPool(env tabletenv.Env, name string, cfg tabletenv.ConnPoolConfig) *Pool {
	cp := &Pool{
		timeout: cfg.Timeout,
		env:     env,
	}
	config := smartconnpool.Config[*Conn]{
		Capacity:        int64(cfg.Size),
		IdleTimeout:     cfg.IdleTimeout,
		MaxLifetime:     cfg.MaxLifetime,
		RefreshInterval: mysqlctl.PoolDynamicHostnameResolution,
	}
	if name != "" {
		config.LogWait = func(start time.Time) {
			env.Stats().WaitTimings.Record(name+"ResourceWaitTime", start)
		}
		cp.getConnTime = env.Exporter().NewTimings(name+"GetConnTime", "Tracks the amount of time it takes to get a connection", "Settings")
	}
	cp.ConnPool = smartconnpool.NewPool(&config)
	cp.ConnPool.RegisterStats(env.Exporter(), name)
	cp.dbaPool = dbconnpool.NewConnectionPool("", env.Exporter(), 1, config.IdleTimeout, config.MaxLifetime, 0)
	return cp
}

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
@GuptaManan100

The thing with NewPool is that it only initializes the pool's configuration and doesn't actually create a connection, so you can create the pool without needing to wait for anything. The wait is needed when the connections are created, which is why I added it to the RPCs. But I agree with you @mattlord, I am just as apprehensive about this solution. If there is a better alternative, I would love to change this implementation.
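To illustrate the distinction, here is a minimal sketch of a pool whose constructor only records configuration and which dials on the first Get. It is not how smartconnpool is implemented: `lazyPool`, `conn`, and the `dial` callback are hypothetical names, and the sketch is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a hypothetical connection handle.
type conn struct{ id int }

// lazyPool records its settings at construction and opens
// connections only when a caller asks for one.
type lazyPool struct {
	capacity int
	dial     func() (*conn, error)
	idle     []*conn
	open     int
}

// newLazyPool stores the configuration; it never calls dial.
func newLazyPool(capacity int, dial func() (*conn, error)) *lazyPool {
	return &lazyPool{capacity: capacity, dial: dial}
}

// Get reuses an idle connection if one exists, otherwise dials a new
// one, up to capacity. This is the first point that touches the database.
func (p *lazyPool) Get() (*conn, error) {
	if n := len(p.idle); n > 0 {
		c := p.idle[n-1]
		p.idle = p.idle[:n-1]
		return c, nil
	}
	if p.open >= p.capacity {
		return nil, errors.New("pool exhausted")
	}
	c, err := p.dial()
	if err != nil {
		return nil, err
	}
	p.open++
	return c, nil
}

// Put returns a connection to the idle list for reuse.
func (p *lazyPool) Put(c *conn) { p.idle = append(p.idle, c) }

func main() {
	dials := 0
	pool := newLazyPool(2, func() (*conn, error) {
		dials++
		return &conn{id: dials}, nil
	})
	fmt.Println("dials after construction:", dials) // 0

	c, _ := pool.Get()
	fmt.Println("dials after first Get:", dials) // 1

	pool.Put(c)
	_, _ = pool.Get()
	fmt.Println("dials after reuse:", dials) // 1
}
```

With a pool shaped like this, nothing in the constructor depends on the database user's grants; only the callers that trigger a dial would need to wait.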

Signed-off-by: Manan Gupta <manan@planetscale.com>
@GuptaManan100 GuptaManan100 requested a review from ajm188 as a code owner January 3, 2024 11:27
@GuptaManan100 GuptaManan100 force-pushed the stall-rpcs-until-wait branch from 5c3af79 to 48bddab Compare January 5, 2024 09:15
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>

@dbussink dbussink left a comment

As discussed, I think the alternatives considered are worse than this solution. I guess this is the least terrible solution for now.

@harshit-gangal harshit-gangal merged commit ae7d3b3 into vitessio:main Jan 9, 2024
99 checks passed
@harshit-gangal harshit-gangal deleted the stall-rpcs-until-wait branch January 9, 2024 07:48
Development

Successfully merging this pull request may close these issues.

Bug Report: RPCs to a new started VTTablet sometimes fail
5 participants