Feature Request: Binlog Timestamp Watermarking for VStream #16477
Comments
Let me know if my understanding is correct, based on this issue and the discussion in https://vitess.slack.com/archives/C0PQY0PTK/p1721154609745629:
Does this meet the requirements outlined in this issue or am I missing something?
@rohit-nayak-ps Yes, that approach will completely satisfy the requirements.
@twthorn, thanks, will update here once I have a working PR so that you can test and validate the approach.
Hi @rohit-nayak-ps, wanted to check in on this. Anything I can do to help? Please let me know. Thanks again for your work on this issue!
It is WIP: I will have a draft PR for it in a day or two. I would then ask you to test it locally to make sure it matches expectations before we get it reviewed.
I ran this locally with the following diff:
diff --git a/examples/common/scripts/vttablet-up.sh b/examples/common/scripts/vttablet-up.sh
index daa40aee89..03c54038d0 100755
--- a/examples/common/scripts/vttablet-up.sh
+++ b/examples/common/scripts/vttablet-up.sh
@@ -51,9 +51,10 @@ vttablet \
--restore_from_backup \
--port $port \
--grpc_port $grpc_port \
+ --heartbeat_enable \
+ --heartbeat_interval 1s \
--service_map 'grpc-queryservice,grpc-tabletmanager,grpc-updatestream' \
--pid_file $VTDATAROOT/$tablet_dir/vttablet.pid \
- --heartbeat_on_demand_duration=5s \
> $VTDATAROOT/$tablet_dir/vttablet.out 2>&1 &
# Block waiting for the tablet to be listening
diff --git a/examples/local/vstream_client.go b/examples/local/vstream_client.go
index 98d2129f89..7dd7005807 100644
--- a/examples/local/vstream_client.go
+++ b/examples/local/vstream_client.go
@@ -75,6 +75,7 @@ func main() {
flags := &vtgatepb.VStreamFlags{
//MinimizeSkew: false,
//HeartbeatInterval: 60, //seconds
+ StreamKeyspaceHeartbeats: true,
}
reader, err := conn.VStream(ctx, topodatapb.TabletType_PRIMARY, vgtid, filter, flags)
for {
And taking the following steps
Got the following sample of heartbeat events
The heartbeat events for _vt.heartbeat are received periodically for the table. They have the expected BEGIN/FIELD/ROW/COMMIT format. The ROW events contain both the timestamp of the row and the time that the change was applied in the database. Events are received for all shards of the keyspace. With this, any VStream client can establish a generic binlog watermarking strategy for all shards of a keyspace.
@rohit-nayak-ps Confirming that this meets our requirements. Thank you again for the help on this.
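For readers who want to see what such a watermarking strategy could look like on the client side, here is a minimal sketch. It does not use the Vitess client types: `rowEvent` and `watermarkTracker` are hypothetical stand-ins invented for illustration, and the timestamp is assumed to be already converted to Unix seconds. Only the table name `_vt.heartbeat` comes from the test above.

```go
package main

import "fmt"

// rowEvent is a hypothetical, simplified stand-in for a VStream ROW event.
// A real client would extract these fields from the events it receives.
type rowEvent struct {
	Shard     string
	Table     string
	Timestamp int64 // assumed: Unix seconds
}

// heartbeatTable is the table name reported in the test above.
const heartbeatTable = "_vt.heartbeat"

// watermarkTracker keeps the latest heartbeat timestamp seen per shard.
type watermarkTracker struct {
	latest map[string]int64
}

func newWatermarkTracker() *watermarkTracker {
	return &watermarkTracker{latest: make(map[string]int64)}
}

// observe records heartbeat rows and reports whether ev was a heartbeat.
// The watermark only moves forward; older timestamps are ignored.
func (w *watermarkTracker) observe(ev rowEvent) bool {
	if ev.Table != heartbeatTable {
		return false
	}
	if ev.Timestamp > w.latest[ev.Shard] {
		w.latest[ev.Shard] = ev.Timestamp
	}
	return true
}

func main() {
	t := newWatermarkTracker()
	events := []rowEvent{
		{Shard: "-80", Table: "customer", Timestamp: 100},
		{Shard: "-80", Table: heartbeatTable, Timestamp: 101},
		{Shard: "80-", Table: heartbeatTable, Timestamp: 99},
		{Shard: "-80", Table: heartbeatTable, Timestamp: 100}, // older, ignored
	}
	for _, ev := range events {
		t.observe(ev)
	}
	fmt.Println(t.latest["-80"], t.latest["80-"]) // prints "101 99"
}
```

Because heartbeat rows arrive through the same stream as ordinary table changes, a client can route them to the tracker and pass everything else to its normal change handling.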
OK, great, thanks for testing this out. I will work further on the PR tomorrow: I probably need to add more tests and clean it up a bit before marking it ready for review.
Enabling heartbeats on vttablets will mean that a binlog entry is generated for each interval. The interval defaults to 1s, and that is also what the proposed PR was tested with.
@deepthi If we had the option for vstream clients, the desired interval we would set is 1 minute. But 1 second is also acceptable (it would just mean more data than is necessary for accurate watermarking).
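If the extra volume from 1-second heartbeats matters downstream, one option is to thin them out in the client before forwarding. The sketch below is an assumption about how a client might do that, not a Vitess feature: `downsampler` and `shouldForward` are hypothetical names, and timestamps are assumed to be Unix seconds.

```go
package main

import "fmt"

// downsampler forwards at most one heartbeat per shard per interval.
// It is a hypothetical client-side helper, not part of Vitess.
type downsampler struct {
	intervalSec int64
	lastSent    map[string]int64
}

func newDownsampler(intervalSec int64) *downsampler {
	return &downsampler{intervalSec: intervalSec, lastSent: make(map[string]int64)}
}

// shouldForward reports whether a heartbeat for shard with timestamp ts
// should be passed downstream. The first heartbeat of a shard is always
// forwarded; later ones only once the interval has elapsed.
func (d *downsampler) shouldForward(shard string, ts int64) bool {
	last, seen := d.lastSent[shard]
	if seen && ts-last < d.intervalSec {
		return false
	}
	d.lastSent[shard] = ts
	return true
}

func main() {
	d := newDownsampler(60)
	forwarded := 0
	// Three minutes of 1-second heartbeats for a single shard.
	for ts := int64(0); ts < 180; ts++ {
		if d.shouldForward("-80", ts) {
			forwarded++
		}
	}
	fmt.Println(forwarded) // prints 3
}
```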
Thank you all for the help on this, really appreciate it!
Feature Description
Watermarking
Provide an option for specifying a time interval that enables sending periodic per-shard binlog events that indicate that all binlogs for that shard up to timestamp t have been received by the VStream client. Send this event regularly based on the interval and regardless of lag or throttling.
Note
The ideal case is that these watermark events are generated by inserts into a table, the same as the other tables the client is tracking changes for. This provides a stronger guarantee of correctness, since it uses the same code path that we are trying to verify has processed up to a timestamp t. It is acceptable if they are received simply as table change events for a specially named table (the client can handle these events in a custom way).
Use Case(s)
There are two common parts of a CDC pipeline using VStreams:
For processing the write to the data store, typically there is an async system for merging together the change log records to create a consistent snapshot of the database. This can be done on a best effort basis (whatever data is available, merge it in). However, other use cases require that the data be complete for a time window, from t0 (lower bound) to t1 (upper bound) (e.g., one day). In this case, the downstream system creating this database snapshot for a complete time window needs a way to know that the upstream system has processed up to timestamp t1 (upper bound of the time window).
With this feature, the following would be done:
The downstream system can then check the latest timestamp of each shard's watermark event in the queue. If all timestamps for all shards are >= t1, then we know that the VStream client has processed all binlogs up to timestamp t1, and the snapshot job for the time window can be started.
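The completeness check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not an existing API: `shardWatermarks` and `windowComplete` are hypothetical names, timestamps are assumed to be Unix seconds, and the list of expected shards is assumed to be known to the downstream system.

```go
package main

import "fmt"

// shardWatermarks maps a shard name to the latest watermark timestamp
// (Unix seconds) seen for that shard. Hypothetical type for illustration.
type shardWatermarks map[string]int64

// windowComplete reports whether every expected shard has a watermark at or
// beyond t1. A shard with no watermark yet counts as incomplete.
func windowComplete(latest shardWatermarks, shards []string, t1 int64) bool {
	for _, shard := range shards {
		ts, ok := latest[shard]
		if !ok || ts < t1 {
			return false
		}
	}
	return true
}

func main() {
	shards := []string{"-80", "80-"}
	t1 := int64(1700000100)
	latest := shardWatermarks{"-80": 1700000120, "80-": 1700000050}

	// Shard 80- is still behind t1, so the window is not complete yet.
	fmt.Println(windowComplete(latest, shards, t1)) // prints false

	latest["80-"] = 1700000100
	fmt.Println(windowComplete(latest, shards, t1)) // prints true
}
```

Checking against an explicit list of expected shards, rather than only the shards that have reported so far, avoids starting the snapshot job while a shard has not yet produced any watermark.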