[ntuple] Fix serialization when a header extension is already present #17597
Test Results: 18 files, 18 suites, 4d 4h 11m 36s ⏱️ (results for commit 95b11a3).
Looks good from my side, but I think this requires a review from @jblomer before merging.
`CloneSchema` is used to create a new descriptor that contains the same schema information as the given one, but no information about clustering, etc.
It currently does the wrong thing by merging the header extension into the regular header, which produces a descriptor that cannot be written back to disk (because it contains column ranges with PageZero locators). In addition, the model it creates must be kept alive for the result to be used correctly (e.g. during late model extension).
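To illustrate the intended semantics, here is a minimal C++ sketch of a schema-only clone. It assumes hypothetical, simplified `Descriptor`, `FieldInfo`, `ColumnInfo` and `ClusterInfo` types, which are not ROOT's actual classes. The only point it shows is that regular-header and header-extension entries are each copied into their own section rather than merged, while clustering information is left out.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the contents of a descriptor.
// These are NOT ROOT's actual RNTupleDescriptor classes.
struct FieldInfo {
    std::uint64_t id;
    std::string name;
};

struct ColumnInfo {
    std::uint64_t id;
    std::uint64_t fieldId;
};

struct ClusterInfo {
    std::uint64_t firstEntry;
    std::uint64_t nEntries;
};

struct Descriptor {
    std::vector<FieldInfo> fields;       // regular header
    std::vector<ColumnInfo> columns;     // regular header
    std::vector<FieldInfo> extFields;    // header extension
    std::vector<ColumnInfo> extColumns;  // header extension
    std::vector<ClusterInfo> clusters;   // clustering information
};

// Copies only the schema part of `src`. Fields and columns stay in the
// section they came from (regular header vs. header extension) instead of
// being merged into the regular header; clustering information is dropped.
Descriptor CloneSchema(const Descriptor &src)
{
    Descriptor clone;
    clone.fields = src.fields;
    clone.columns = src.columns;
    clone.extFields = src.extFields;
    clone.extColumns = src.extColumns;
    // clone.clusters is intentionally left empty.
    return clone;
}
```

Keeping the two sections apart is what lets the clone be written back later without promoting extension columns into the regular header.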
When serializing the header and the header extension, we currently assume that there is no header extension yet the first time the header is serialized. While this holds in most cases, it does not hold when incrementally merging into an existing RNTuple. This fix handles that case correctly by skipping all columns that belong to the header extension when serializing the regular header.
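The filtering rule can be pictured with the following toy sketch. It assumes a hypothetical `SchemaModel` that holds all column IDs in descriptor order plus the set of IDs belonging to the header extension; `SerializeSection` is likewise hypothetical and is not the `RNTupleSerializer` API. Extension columns are skipped when writing the regular header, even if the extension already exists before the first serialization, and are written only with the extension.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified model (not ROOT's API): every column known to
// the descriptor, plus the IDs of the columns that live in the header
// extension.
struct SchemaModel {
    std::vector<std::uint64_t> allColumnIds;     // in descriptor order
    std::set<std::uint64_t> extensionColumnIds;  // subset owned by the extension
};

// Toy "serialization" of one section as the list of column IDs it contains.
// With extension == false this is the regular header: columns that belong
// to the header extension are skipped, so they are never moved into the
// regular header. With extension == true only those columns are emitted.
std::vector<std::uint64_t> SerializeSection(const SchemaModel &m, bool extension)
{
    std::vector<std::uint64_t> out;
    for (auto id : m.allColumnIds) {
        const bool inExtension = m.extensionColumnIds.count(id) != 0;
        if (inExtension == extension)
            out.push_back(id);
    }
    return out;
}
```

For a model whose columns 2 and 3 are already in the extension, the regular header would contain only columns 0 and 1.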
- Verify that we can correctly serialize/deserialize a header that already contains a header extension before the first serialization.
- Verify that we can correctly serialize/deserialize a header that contains deferred columns in the non-extension header.
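Both test scenarios above can be sketched as a round-trip check against a toy format. The `Column` record, `Serialize()` and `Deserialize()` below are hypothetical and do not reflect the real RNTuple on-disk layout; a column with a non-zero `firstElementIndex` merely stands in for a deferred column.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical column record (not ROOT's format): the column ID, the index
// of its first element (non-zero for a "deferred" column), and whether it
// belongs to the header extension.
struct Column {
    std::uint64_t id;
    std::uint64_t firstElementIndex;
    bool inExtension;

    bool operator==(const Column &other) const
    {
        return id == other.id && firstElementIndex == other.firstElementIndex &&
               inExtension == other.inExtension;
    }
};

// Writes one section: a count followed by (id, firstElementIndex) pairs.
static void WriteSection(const std::vector<Column> &cols, bool extension,
                         std::vector<std::uint64_t> &out)
{
    std::vector<const Column *> selected;
    for (const auto &c : cols)
        if (c.inExtension == extension)
            selected.push_back(&c);
    out.push_back(selected.size());
    for (const auto *c : selected) {
        out.push_back(c->id);
        out.push_back(c->firstElementIndex);
    }
}

// Toy serialization: the regular header section first, then the extension.
std::vector<std::uint64_t> Serialize(const std::vector<Column> &cols)
{
    std::vector<std::uint64_t> out;
    WriteSection(cols, /*extension=*/false, out);
    WriteSection(cols, /*extension=*/true, out);
    return out;
}

// Inverse of Serialize(): the section a column is read from decides
// whether it is marked as part of the header extension.
std::vector<Column> Deserialize(const std::vector<std::uint64_t> &in)
{
    std::vector<Column> cols;
    std::size_t pos = 0;
    for (bool extension : {false, true}) {
        const std::uint64_t n = in.at(pos++);
        for (std::uint64_t i = 0; i < n; ++i) {
            const std::uint64_t id = in.at(pos++);
            const std::uint64_t first = in.at(pos++);
            cols.push_back({id, first, extension});
        }
    }
    return cols;
}
```

With regular-header columns listed before extension columns, a round trip should return the input unchanged, with every column (deferred or not) still in its original section.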
Based on #17596, derived from #17563.
Introduces a new method in `RNTupleDescriptor`/`RNTupleDescriptorBuilder` that clones only the "schema" part of a descriptor, then uses it to fix `RPagePersistentSink::InitFromDescriptor` in the general case where the given descriptor already contains a header extension. The `RNTupleSerializer` also needs a couple of changes to cope with this case, because we don't want to move the fields and columns from the header extension into the regular header (which would hinder the Merger's ability to do incremental merging).