Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ntuple] Fix serialization when a header extension is already present #17597

Merged
merged 4 commits into from
Feb 6, 2025

Conversation

silverweed
Copy link
Contributor

@silverweed silverweed commented Feb 3, 2025

Based on #17596, derived from #17563.
Introduces a new method in RNTupleDescriptor/Builder to allow cloning only the "schema" part of a descriptor, then uses this new method to fix RPagePersistentSink::InitFromDescriptor in the general case where the given descriptor already contains a header extension.
The RNTupleSerializer also needs a couple of changes to cope with this case, because we don't want to move the fields and columns from the header extension to the regular header (which would hinder the Merger's capability of doing incremental merging).

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

Copy link

github-actions bot commented Feb 3, 2025

Test Results

    18 files      18 suites   4d 4h 11m 36s ⏱️
 2 690 tests  2 689 ✅ 0 💤 1 ❌
46 724 runs  46 723 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit 95b11a3.

♻️ This comment has been updated with latest results.

tree/ntuple/v7/src/RNTupleDescriptor.cxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RNTupleDescriptor.cxx Show resolved Hide resolved
tree/ntuple/v7/inc/ROOT/RPageStorage.hxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RPageStorage.cxx Show resolved Hide resolved
tree/ntuple/v7/inc/ROOT/RNTupleDescriptor.hxx Show resolved Hide resolved
tree/ntuple/v7/test/ntuple_test.hxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RNTupleDescriptor.cxx Show resolved Hide resolved
@silverweed silverweed force-pushed the ntuple_desc_schema branch 3 times, most recently from c641310 to 95b11a3 Compare February 4, 2025 08:49
@silverweed silverweed requested a review from hahnjo February 4, 2025 08:51
Copy link
Member

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from my side, but I think this requires a review from @jblomer before merging

tree/ntuple/v7/src/RNTupleDescriptor.cxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RNTupleDescriptor.cxx Show resolved Hide resolved
tree/ntuple/v7/src/RPageStorage.cxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RPageStorage.cxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RNTupleSerialize.cxx Show resolved Hide resolved
tree/ntuple/v7/src/RNTupleSerialize.cxx Show resolved Hide resolved
tree/ntuple/v7/inc/ROOT/RNTupleDescriptor.hxx Outdated Show resolved Hide resolved
silverweed and others added 4 commits February 6, 2025 16:36
CloneSchema is used to create a new descriptor that contains the same
schema information as the given one but no information about clustering
etc.
It currently does the wrong thing by merging the header extension with
the regular header, which results in a descriptor that cannot be
written back to disk (as it contains column ranges with PageZero
locators).
Also, the model it creates must be kept alive in order for it to be
used correctly (e.g. when late model extending)
When serializing the header and header extension we're currently assuming
that the first time we serialize it we don't have a header extension.
While this is true in most cases, it's not true when we are incrementally
merging an existing RNTuple.
With this fix we correctly handle this case by skipping all columns
belonging to the extension header when serializing the regular header.
- verify that we can correctly serialize/deserialize a header that
  already contains a header extension before the first serialization
- verify that we can correctly serialize/deserialize a header that
  contains deferred columns in the non-extension header
@silverweed silverweed merged commit 9ddcf89 into root-project:master Feb 6, 2025
18 of 20 checks passed
@silverweed silverweed deleted the ntuple_desc_schema branch February 6, 2025 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants