Bug Report: cutOverVReplMigration should check lock_wait_timeout otherwise it will fail if user set to low value #16591

Closed
jwangace opened this issue Aug 13, 2024 · 2 comments · Fixed by #16601
Assignees
Labels
Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Type: Bug

Comments

@jwangace
Contributor

jwangace commented Aug 13, 2024

Overview of the Issue

The cutOverVReplMigration function performs the following steps:

  • 1: lock the onlineddl table before cut over
  • 2: call RENAME TABLE...
  • 3: check RENAME in the processlist, twice consecutively
  • 4: unlock the onlineddl table
  • 5: let RENAME execute

However, it expects to hold the table lock for vreplicationCutOverThreshold seconds in V16, or migrationCutOverThreshold seconds in the latest code.

But if the user has set lock_wait_timeout to a low value, 1 for example, cutOverVReplMigration is guaranteed to fail, according to our tests.

mysql> SHOW VARIABLES LIKE 'lock_wait_timeout';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| lock_wait_timeout | 1     |
+-------------------+-------+
1 row in set (0.05 sec)
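
To make the timing problem concrete, here is a minimal sketch of a simplified model, not Vitess code. It assumes the blocked RENAME TABLE gives up after lock_wait_timeout seconds (the MySQL timeout for acquiring metadata locks) and that the table lock is held for a fixed number of seconds. The function name and the 10-second hold time are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// renameOutlastsLock is a hypothetical helper modelling the cut-over timing.
// Assumption: the RENAME waits on the table lock and fails once it has
// waited lockWaitTimeoutSec seconds; the lock is released after lockHeldSec
// seconds. The RENAME only succeeds if it is still waiting at release time.
func renameOutlastsLock(lockWaitTimeoutSec, lockHeldSec int) bool {
	return lockWaitTimeoutSec > lockHeldSec
}

func main() {
	// Hypothetical lock hold time, standing in for the cut-over threshold.
	const lockHeldSec = 10

	for _, timeout := range []int{1, 100} {
		fmt.Printf("lock_wait_timeout=%ds, lock held %ds: RENAME survives = %v\n",
			timeout, lockHeldSec, renameOutlastsLock(timeout, lockHeldSec))
	}
}
```

Under this model, any lock_wait_timeout shorter than the time the lock is held makes the RENAME time out before the table is unlocked.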

So in this line, before cutOverVReplMigration tries to lock the table, it should check lock_wait_timeout and temporarily raise the value so that the cut-over operation can finish.

This could be done at the global or session level:

mysql> SET @@SESSION.lock_wait_timeout = 100;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'lock_wait_timeout';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| lock_wait_timeout | 100   |
+-------------------+-------+
1 row in set (0.00 sec)
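
A minimal Go sketch of the session-level variant follows: read the current value, raise it only if it is too low, run the cut-over, then restore the original. This is not the Vitess implementation. The `session` interface, `fakeSession`, and `withLockWaitTimeout` are hypothetical names introduced for illustration, and the fake stands in for a real MySQL connection so the example runs on its own.

```go
package main

import "fmt"

// session is a hypothetical abstraction over a single MySQL connection.
type session interface {
	// A real implementation would run: SELECT @@SESSION.lock_wait_timeout
	LockWaitTimeout() (int, error)
	// A real implementation would run: SET @@SESSION.lock_wait_timeout = <seconds>
	SetLockWaitTimeout(seconds int) error
}

// fakeSession is an in-memory stand-in used only to make the sketch runnable.
type fakeSession struct{ timeout int }

func (f *fakeSession) LockWaitTimeout() (int, error) { return f.timeout, nil }

func (f *fakeSession) SetLockWaitTimeout(seconds int) error {
	f.timeout = seconds
	return nil
}

// withLockWaitTimeout ensures the session's lock_wait_timeout is at least
// minSeconds while fn runs, then restores the original value.
func withLockWaitTimeout(s session, minSeconds int, fn func() error) (err error) {
	original, err := s.LockWaitTimeout()
	if err != nil {
		return err
	}
	if original >= minSeconds {
		// Already high enough; leave the session untouched.
		return fn()
	}
	if err := s.SetLockWaitTimeout(minSeconds); err != nil {
		return err
	}
	defer func() {
		// Restore the user's value; report a restore failure only if fn succeeded.
		if restoreErr := s.SetLockWaitTimeout(original); err == nil {
			err = restoreErr
		}
	}()
	return fn()
}

func main() {
	s := &fakeSession{timeout: 1}
	err := withLockWaitTimeout(s, 100, func() error {
		fmt.Println("during cut-over:", s.timeout)
		return nil
	})
	fmt.Println("after cut-over:", s.timeout, "err:", err)
}
```

Scoping the change to the session and restoring it afterwards keeps the user's configured value in effect for every other connection.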

Reproduction Steps

  1. Set lock_wait_timeout to 1 by passing it as a mysqld flag
  2. Run Online DDL migrations once the cluster is ready
  3. Observe the output

Binary Version

We tested on our custom build branch, but this could affect all versions.

Version: 16.0.3-SNAPSHOT (Git revision 4335eaf8ce3fa328aacd36e66f4776bd5208c7c8 branch 'v16-hc-demonware') built on Tue Dec 12 18:02:03 UTC 2023 by vitess@buildkitsandbox using go1.20.5 linux/amd64

Operating System and Environment details

Running in Kubernetes with vitess-operator v2.9.5

Log Fragments

The output of mysql> show vitess_migrations like '......' will include this message:

message: timeout for rename query: RENAME TABLE ... (more output removed)
@jwangace jwangace added Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels Aug 13, 2024
@GuptaManan100 GuptaManan100 added Component: VReplication and removed Needs Triage This issue needs to be correctly labelled and triaged labels Aug 14, 2024
@deepthi deepthi added Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) and removed Component: VReplication labels Aug 14, 2024
@shlomi-noach shlomi-noach self-assigned this Aug 15, 2024
@shlomi-noach
Contributor

Thank you for submitting this issue. I'm in favor of the suggested approach (increasing the value for all participating sessions, restoring to original at the end). Anecdotally, this is what gh-ost does:

https://github.com/openark/gh-ost/blob/e7d9342f61f4af8e412d2649abe999dc7eb16990/go/logic/applier.go#L980-L986

@shlomi-noach
Contributor

Addressed by #16601
