-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VReplication: Gather source positions once we know all writes are done during traffic switch #16572
Conversation
Signed-off-by: Matt Lord <mattalord@gmail.com>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16572 +/- ##
==========================================
- Coverage 68.84% 68.83% -0.02%
==========================================
Files 1558 1558
Lines 200025 200048 +23
==========================================
- Hits 137714 137694 -20
- Misses 62311 62354 +43 ☔ View full report in Codecov by Sentry. |
0e100c7
to
2f8f73d
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
2f8f73d
to
a76bf9a
Compare
a0a027f
to
dbb1ee5
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
f8675c1
to
8b030db
Compare
8b030db
to
fcdc8bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you amend the last paragraph of the PR description, given that you have included a unit test?
Rest LGTM
I think this issue may have been introduced by #13015, which is why many people have not run into this yet, as it only affects v18 and later. I didn't dig into this in much detail at all (and I could be totally wrong), but if I'm right, is this PR worth backporting? |
@GrahamCampbell no, this aspect is the same in the
Exactly nobody has reported it. It's not that "many people have not". The issue reported was from going through the code for the distributed transactions work, not because somebody encountered a problem. |
Description
This PR addresses a logic bug when switching traffic:
LOCK TABLES
rounds to ensure that there are no in flight writes on the tables we're migrating, then we ensure that the migration targets' replication position is at least where the migration source was.I have not been able to trigger any actual bug/issue (lost writes during a traffic switch) through extensive manual testing and nobody has seen or reported an issue around this, but it's clearly a logical bug.
We address this here in a way that causes no additional work or overhead by simply moving the position gathering work from the
stopSourceWrites
step/function to its own step/function that we call immediately before thewaitForCatchup
step/function.Related Issue(s)
Checklist