Add baseline resync read part in homeobject. #204

Merged
merged 1 commit into eBay:main from baseline_resync_read on Sep 9, 2024

Conversation

Contributor

@sanebay sanebay commented Aug 26, 2024

Add read_snapshot_data to go over all shards and blobs of a PG. If obj_id is zero, send all shards. obj_id is the concatenation of the blob sequence number and the batch number. For all other values of obj_id, we send a batch of blobs for a shard. Once all blobs of a shard are finished, we move to the next shard_id and batch_num is reset to 0. Add the LSN to the shard metadata so we can ignore reads of shards that were created later than the snapshot LSN.

Added temporary code to write blobs and metadata; tested with the SM long-running test to create a baseline resync with a follower.
Tested with UT.
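
For illustration, a minimal sketch of how the obj_id described above could be packed and unpacked. The 16-bit batch field, the helper names, and the exact bit layout are assumptions for this sketch, not the actual homeobject encoding.

#include <cstdint>
#include <utility>

// Assumed layout: high bits carry the blob sequence number, low 16 bits carry the
// batch number within the current shard. obj_id == 0 then encodes the initial
// "send all shard metadata" request, and batch_num restarts at 0 for each new shard.
constexpr uint64_t kBatchBits = 16;
constexpr uint64_t kBatchMask = (1ULL << kBatchBits) - 1;

inline uint64_t make_obj_id(uint64_t blob_seq_num, uint64_t batch_num) {
    return (blob_seq_num << kBatchBits) | (batch_num & kBatchMask);
}

inline std::pair< uint64_t, uint64_t > split_obj_id(uint64_t obj_id) {
    return {obj_id >> kBatchBits, obj_id & kBatchMask};
}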

@codecov-commenter

codecov-commenter commented Aug 27, 2024


Codecov Report

Attention: Patch coverage is 41.71123% with 109 lines in your changes missing coverage. Please review.

Project coverage is 66.76%. Comparing base (acb04e8) to head (7fe21e8).
Report is 19 commits behind head on main.

Files with missing lines Patch % Lines
...ib/homestore_backend/replication_state_machine.cpp 3.75% 77 Missing ⚠️
src/lib/homestore_backend/pg_blob_iterator.cpp 68.49% 19 Missing and 4 partials ⚠️
src/lib/homestore_backend/index_kv.cpp 79.16% 4 Missing and 1 partial ⚠️
src/lib/homestore_backend/hs_shard_manager.cpp 50.00% 2 Missing ⚠️
src/lib/homestore_backend/hs_blob_manager.cpp 75.00% 1 Missing ⚠️
src/lib/pg_manager.cpp 0.00% 0 Missing and 1 partial ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main     #204      +/-   ##
==========================================
- Coverage   68.69%   66.76%   -1.93%     
==========================================
  Files          30       32       +2     
  Lines        1581     1724     +143     
  Branches      163      185      +22     
==========================================
+ Hits         1086     1151      +65     
- Misses        408      480      +72     
- Partials       87       93       +6     


@szmyd szmyd linked an issue Aug 27, 2024 that may be closed by this pull request
@szmyd szmyd added this to the MileStone4.3 milestone Aug 27, 2024
Contributor

@xiaoxichen xiaoxichen left a comment


LGTM except for a bug on the reader side; the writer-side code is ignored (not reviewed here).

src/lib/homestore_backend/pg_blob_iterator.cpp — outdated review thread (resolved)
@sanebay sanebay force-pushed the baseline_resync_read branch 2 times, most recently from 88b6b7c to da98f06, on September 6, 2024 03:52
xiaoxichen previously approved these changes Sep 6, 2024
Collaborator

@JacksonYao287 JacksonYao287 left a comment


Sorry for the late review @sanebay, please take a look at my comments and correct me if I am wrong.

src/lib/homestore_backend/replication_state_machine.cpp — outdated review thread (resolved)
auto& index_results_vec = r.value();
for (auto& info : index_results_vec) {
    if (info.pbas == HSHomeObject::tombstone_pbas) {
        // Skip deleted blobs
Collaborator

I am not sure we can do it like this.

Suppose the leader has {1,1} to {5,5} and {5,7} to {10,10} ({shard_id, blob_id}), the last LSN is 120, {5,6} was deleted at LSN 100, and the log at the leader has been compacted up to 110.
The follower has {1,1} to {6,0} and its last LSN is 80. If a baseline resync occurs, the follower will never know that {5,6} has been deleted, since it is not aware of LSN 100.

So I think here we should set some special data for the tombstone_pbas in the blob_info_vec that is sent to the follower, so that the follower can identify that this blob has been deleted.

Another question: if GC happens and the tombstone is also removed, how can the leader let the follower know this when a baseline resync happens?

Please correct me if I misunderstand anything.

Contributor

We haven't put the LSN into each blob index, so at the moment it is a full resync --- i.e. all existing data can be discarded. So there is no issue with deleted blobs and no need to transfer tombstones.

Extending the discussion further: assuming we had the LSN in each blob index, we could let the follower report its current LSN and the leader would only send the range [follower_lsn, snapshot_lsn] to the follower. In that case, as you said, we do care about blob deletion. The trivial approach is for the leader to send the active blob list in <shard_id = S, batch = 0>, and the follower marks all blobs not in the active list as deleted.

I don't think we have solid thinking yet regarding the "incremental snapshot", especially with a good amount of reserved log entries. Though personally I am loving it.
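
A rough sketch of the reconciliation idea above (the follower deletes any blob that is absent from the leader's active blob list for a shard). Every name here (reconcile_shard, blob_id_t, delete_blob) is hypothetical, since this flow is only being discussed and is not part of this PR.

#include <cstdint>
#include <set>

using blob_id_t = uint64_t;

// Given the active blob list the leader would send in <shard_id = S, batch = 0>,
// the follower drops every local blob of that shard that the leader no longer
// lists as active (i.e. it was deleted after the follower's last applied LSN).
void reconcile_shard(const std::set< blob_id_t >& leader_active_blobs,
                     const std::set< blob_id_t >& follower_local_blobs,
                     void (*delete_blob)(blob_id_t)) {
    for (blob_id_t b : follower_local_blobs) {
        if (leader_active_blobs.count(b) == 0) { delete_blob(b); }
    }
}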

Contributor Author

Yes, this is a valid scenario. We can handle it in two ways:

  1. The follower sees a gap in the blob sequence and assumes it is a deletion.
  2. A safer approach is to use the scrubber: the leader sends the valid blob list and its CRC to the followers, and a follower can use this to delete stale blobs (see the sketch after this list).
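
A minimal sketch of the scrubber-style check from point 2, assuming a zlib CRC over the sorted list of valid blob ids; the function name and the choice of checksum are illustrative assumptions, not anything in this PR.

#include <algorithm>
#include <cstdint>
#include <vector>
#include <zlib.h>  // crc32(); any stable checksum would do

using blob_id_t = uint64_t;

// Hypothetical scrubber helper: leader and follower each compute a checksum over
// the sorted valid blob ids of a shard; if the values differ, the follower fetches
// the full list and deletes whatever the leader no longer considers valid.
inline uint32_t valid_blob_list_crc(std::vector< blob_id_t > valid_blobs) {
    std::sort(valid_blobs.begin(), valid_blobs.end());
    return static_cast< uint32_t >(
        crc32(0L, reinterpret_cast< const Bytef* >(valid_blobs.data()),
              static_cast< uInt >(valid_blobs.size() * sizeof(blob_id_t))));
}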

Contributor

I think until we have the incremental snapshot feature, i.e. transferring only the diff between two snapshots, we had better erase everything on the receiver side, since we start from scratch anyway.

Collaborator
@JacksonYao287 JacksonYao287 Sep 9, 2024

I think what we will do in the baseline resync write part is an incremental snapshot, no?
As I mentioned here #204 (comment), after the follower receives the PG and shard metadata in the first obj, it will ask only for shards and blobs that do not exist on this follower.

> The follower sees a gap in the blob sequence and assumes it is a deletion.

This does not work. For example, the leader has {1,10} to {3,10} and the follower has {1,10} to {2,5}. If {1,5} is deleted at the leader, the follower cannot see this blob-sequence gap since it will start syncing shards and blobs from shard 2.

> A safer approach is to use the scrubber: the leader sends the valid blob list and its CRC to the followers, and a follower can use this to delete stale blobs.

This seems to work. We should also send the open shard list, since some seal-shard log entries might also have been compacted.

@JacksonYao287
Collaborator

The code here LGTM.

There are two remaining questions:
1. If a blob is deleted and the delete-blob log entry is compacted, how do we make the follower aware of that blob's deletion when a baseline resync occurs?
2. If a shard is sealed and the seal-shard log entry is compacted, how do we make the follower aware that the shard is sealed when a baseline resync occurs?

They are essentially the same question. Let's think a bit more and discuss it in the homestore meeting if necessary.

@sanebay
Contributor Author

sanebay commented Sep 9, 2024

Shard seal should work, because we either get the latest metadata with the shard sealed or we see a log entry for the shard seal. Only blobs that have been GC-ed become orphans on the followers. The scrubber is the right way to solve this.

Contributor

@raakella1 raakella1 left a comment


LG!

@sanebay sanebay merged commit d87b8e4 into eBay:main Sep 9, 2024
25 checks passed
@sanebay sanebay deleted the baseline_resync_read branch September 9, 2024 18:10
Successfully merging this pull request may close these issues.

Baseline resync: Read side changes