-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VReplication: use new topo named locks and TTL override for workflow coordination #16260
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
25e6572
to
f0cd253
Compare
And use that for VReplication workflows when coordination is necessary, such as between the VReplicaiton engine and the VDiff engine. Signed-off-by: Matt Lord <mattalord@gmail.com>
f0cd253
to
e2e58c9
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16260 +/- ##
==========================================
- Coverage 68.71% 68.69% -0.02%
==========================================
Files 1547 1548 +1
Lines 198287 198444 +157
==========================================
+ Hits 136243 136330 +87
- Misses 62044 62114 +70 ☔ View full report in Codecov by Sentry. |
06a766b
to
6576504
Compare
6576504
to
65e806f
Compare
49747a4
to
ee7ff15
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
b3ffc7e
to
0809dad
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
8a63b31
to
321c4f0
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
321c4f0
to
90143d5
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
2338a6d
to
f4de475
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
e288f34
to
c284f6a
Compare
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Signed-off-by: Matt Lord <mattalord@gmail.com>
Description
The VReplication and VDiff engines need to coordinate on workflows as the
TableDiffer
manipulates the associated workflow record (in the_vt.vreplication
table) on its shard in order to initialize the diff (stopping the workflow, syncing up streams and snapshot positions, before restarting the workflow). In order to do so properly — meaning w/o impacting unrelated operations and thus able to use a lock that will not expire/be lost while work is in progress — they need a distributed lock in the topo server. Workflow information is, however, not stored in the topo server so there's no related record to lock. To address this gap, this PR adds support for named locks in the topo server. These are locks on an opaque name rather than a topo record/key. We then leverage this to lock the workflow using the unique name oftargetkeyspace/workflowname
.Here's an example of their usage when running a vdiff with the local examples where the target keyspace is
customer
(which has 2 shards) and the workflow iscommerce2customer
:There remains a general issue that the Keyspace locks taken during traffic switches — where we update various records in the topo related to routing rules, shard records, etc — can be lost after 30 seconds (by default, for etcd2topo). This PR also addresses this by adding a mechanism in the topo service,
LockWithTTL
, to override the default lock TTL for the topo implementation:--topo_etcd_lease_ttl
flag--topo_consul_lock_session_ttl
flagWe also now check to confirm that we are holding the locks between major operations.
Please see the RFC for additional details: #16269
Related Issue(s)
Checklist