[ntuple] Make RNTupleChainProcessor composable #17393

Open
wants to merge 6 commits into
base: master
from

Conversation

enirolf

@enirolf enirolf commented Jan 9, 2025

This PR makes it possible to create RNTupleChainProcessors from other processor objects. In turn, this allows one to, for example, create a chain of joined ntuples.

This PR is part of a bigger set of (foreseen) changes, collected and tracked in #17132.

@enirolf enirolf self-assigned this Jan 9, 2025

github-actions bot commented Jan 9, 2025

Test Results

    18 files      18 suites   4d 14h 8m 40s ⏱️
 2 683 tests  2 682 ✅ 0 💤 1 ❌
46 598 runs  46 597 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit b538fcc.

♻️ This comment has been updated with latest results.

@enirolf enirolf force-pushed the ntuple-processor-chain-composition branch 2 times, most recently from 8d818f5 to 75745f8 Compare January 14, 2025 16:14
@enirolf enirolf force-pushed the ntuple-processor-chain-composition branch from 75745f8 to bf4b538 Compare January 22, 2025 08:51
Prevents the page sources corresponding to the processor from being opened
upon creation. Instead, opening them is deferred until the first `Advance`
call.
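
The deferred opening described in this commit message can be illustrated with a minimal, standalone sketch. It does not use the ROOT classes: `ToySource`, `LazyProcessor`, `IsOpen` and `kInvalid` are hypothetical names invented for illustration, and the actual `RNTupleProcessor` implementation may differ.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a page source; this is NOT the ROOT class.
struct ToySource {
   std::string fName;
   std::uint64_t fNEntries;
};

// Toy processor that only "opens" its source on the first Advance() call.
class LazyProcessor {
   std::string fName;
   std::uint64_t fNEntries;
   std::unique_ptr<ToySource> fSource; // stays null until the first Advance()
   std::uint64_t fNextEntry = 0;

public:
   static constexpr std::uint64_t kInvalid = static_cast<std::uint64_t>(-1);

   LazyProcessor(std::string name, std::uint64_t nEntries)
      : fName(std::move(name)), fNEntries(nEntries)
   {
      // Deliberately no I/O here: construction only records what to open.
   }

   bool IsOpen() const { return fSource != nullptr; }

   // Returns the entry number that was loaded, or kInvalid past the end.
   std::uint64_t Advance()
   {
      if (!fSource) // open on first use
         fSource = std::make_unique<ToySource>(ToySource{fName, fNEntries});
      if (fNextEntry >= fSource->fNEntries)
         return kInvalid;
      return fNextEntry++;
   }
};
```

In this sketch, constructing many processors is cheap, and the cost of opening a source is only paid for links that are actually iterated.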
@enirolf enirolf force-pushed the ntuple-processor-chain-composition branch from bf4b538 to 4b69b26 Compare January 22, 2025 10:30
@enirolf enirolf marked this pull request as ready for review January 22, 2025 10:33
@enirolf enirolf requested review from jblomer and couet as code owners January 22, 2025 10:33

@silverweed silverweed left a comment


I left a couple of, mostly minor, comments.
In general lgtm but there are some changes introduced in one commit that are changed by later commits, so it's a bit hard to review commit-wise.

tree/ntuple/v7/src/RNTupleProcessor.cxx (outdated, resolved)
tree/ntuple/v7/src/RNTupleProcessor.cxx (outdated, resolved)
tree/ntuple/v7/inc/ROOT/RNTupleProcessor.hxx (outdated, resolved)
tree/ntuple/v7/inc/ROOT/RNTupleProcessor.hxx (resolved)
tree/ntuple/v7/src/RNTupleProcessor.cxx (outdated, resolved)
In anticipation of the composition of the different processors (in
particular, the join processor), entry loading should offer the
possibility of random access. This will not change anything about the
way users can interact with the processor, which is exclusively through
the (linear) iterator.
With this change, the `RNTupleChainProcessor` iterates over other
`RNTupleProcessor` objects instead of individual ntuples. This allows us
to chain, for example, ntuples that have previously been joined using
the `RNTupleJoinProcessor`.
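
The composition idea in this commit message can be sketched as a chain that holds other processors behind a common interface, so that a chain link may itself be a chain (or, in the real code, a join). The types below (`ToyProcessor`, `ToyLeaf`, `ToyChain`) and their methods are hypothetical illustrations, not the ROOT API, and the `int` payload stands in for a real entry.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical common interface; NOT the ROOT RNTupleProcessor class.
struct ToyProcessor {
   virtual ~ToyProcessor() = default;
   virtual std::uint64_t GetNEntries() const = 0;
   // Random access: returns the toy payload stored at `entry`.
   virtual int LoadEntry(std::uint64_t entry) const = 0;
};

// A leaf stands in for a processor over a single ntuple.
class ToyLeaf final : public ToyProcessor {
   std::vector<int> fValues;

public:
   explicit ToyLeaf(std::vector<int> values) : fValues(std::move(values)) {}
   std::uint64_t GetNEntries() const override { return fValues.size(); }
   int LoadEntry(std::uint64_t entry) const override { return fValues.at(entry); }
};

// A chain iterates over other processors rather than over individual ntuples.
class ToyChain final : public ToyProcessor {
   std::vector<std::unique_ptr<ToyProcessor>> fInner;

public:
   void Add(std::unique_ptr<ToyProcessor> p) { fInner.push_back(std::move(p)); }

   std::uint64_t GetNEntries() const override
   {
      std::uint64_t n = 0;
      for (const auto &p : fInner)
         n += p->GetNEntries();
      return n;
   }

   int LoadEntry(std::uint64_t entry) const override
   {
      // Walk the links from the start; empty links are skipped naturally.
      for (const auto &p : fInner) {
         const std::uint64_t n = p->GetNEntries();
         if (entry < n)
            return p->LoadEntry(entry);
         entry -= n;
      }
      throw std::out_of_range("entry beyond end of chain");
   }
};
```

Because `ToyChain` is itself a `ToyProcessor`, a chain of chains needs no special handling, which is the property the commit message relies on for chaining previously joined ntuples.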
@enirolf enirolf force-pushed the ntuple-processor-chain-composition branch from 4b69b26 to b538fcc Compare January 28, 2025 16:05
@enirolf enirolf requested a review from silverweed January 28, 2025 16:05
@silverweed

lgtm but better wait for other people's approval as well


@jblomer jblomer left a comment


That's great! I have some questions on details.

if (fCurrentEntryNumber != kInvalidNTupleIndex) {
fProcessor.SetLocalEntryNumber(fCurrentEntryNumber);
fCurrentEntryNumber = fProcessor.Advance();
if (processor.GetCurrentEntryNumber() != kInvalidNTupleIndex) {

Shouldn't this still check fCurrentEntryNumber?

Comment on lines 247 to 251
NTupleSize_t localEntryNumber = entryNumber;
size_t currentNTuple = fCurrentNTupleNumber;

if (fLocalEntryNumber >= fPageSource->GetNEntries()) {
do {
if (++fCurrentNTupleNumber >= fNTuples.size()) {
return kInvalidNTupleIndex;
}
// Skip over empty ntuples we might encounter.
} while (ConnectNTuple(fNTuples.at(fCurrentNTupleNumber)) == 0);
while (localEntryNumber >= fInnerNEntries[currentNTuple]) {
localEntryNumber -= fInnerNEntries[currentNTuple];

Assuming that entryNumber is the index in the context of the chain, and fInnerNEntries[currentNTuple] is the number of entries in a particular chain link, it seems to me that the arithmetic is wrong. Shouldn't we sum up all the previous fInnerNEntries[i], up to but not including currentNTuple, and subtract that sum from entryNumber to get localEntryNumber (the entry number within the chain link)? (Taking into account that the result may be negative if we need to jump back.)
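
To make the arithmetic in question concrete, here is a minimal standalone sketch of mapping a chain-global entry number to a link index and a local entry number. `LocateEntry` and `nEntriesPerLink` are hypothetical names, not part of the PR. The sketch sidesteps the "jump back" case by always walking from the first link, which is one possible approach and not necessarily what the PR ends up doing.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Maps a chain-global entry number to {link index, entry number within link}.
// `nEntriesPerLink` is a hypothetical stand-in for the per-link entry counts.
// Returns std::nullopt if the global entry number lies beyond the chain.
std::optional<std::pair<std::size_t, std::uint64_t>>
LocateEntry(const std::vector<std::uint64_t> &nEntriesPerLink, std::uint64_t globalEntry)
{
   // Sum of the entry counts of all links before link i.
   std::uint64_t firstEntryOfLink = 0;
   for (std::size_t i = 0; i < nEntriesPerLink.size(); ++i) {
      if (globalEntry < firstEntryOfLink + nEntriesPerLink[i]) {
         // local = global - (sum of all previous links); never negative here.
         return std::make_pair(i, globalEntry - firstEntryOfLink);
      }
      firstEntryOfLink += nEntriesPerLink[i];
   }
   return std::nullopt;
}
```

For link sizes {3, 0, 2}, global entry 3 maps to link 2, local entry 0: the empty middle link is skipped because its half-open range contains no entries.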

// know there is nothing to advance to.
if (processor.GetCurrentEntryNumber() != kInvalidNTupleIndex) {
// know there is nothing to load.
if (fCurrentEntryNumber != kInvalidNTupleIndex) {

Ah, here it is. Perhaps move to the previous commit.

virtual void SetEntryPointers(const REntry &entry) = 0;

/////////////////////////////////////////////////////////////////////////////
/// \brief Get the total number of entries in this processor

Perhaps add a comment that this is costly for a chain processor (all underlying files need to be opened). I guess this expensive operation is only necessary if a chain is built from other chains..?
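
One way to contain that cost, sketched here under assumptions and not taken from the PR, is to compute the total lazily and cache it. `CountingChain`, `GetNOpens` and the members below are hypothetical; the counter merely simulates how many underlying files would have to be opened.

```cpp
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Toy chain whose total entry count is computed on first request and cached.
class CountingChain {
   // Stand-in for what opening each underlying file would report.
   std::vector<std::uint64_t> fLinkSizes;
   mutable std::optional<std::uint64_t> fNEntries; // cache, empty until needed
   mutable int fNOpens = 0;                         // simulated file opens

public:
   explicit CountingChain(std::vector<std::uint64_t> linkSizes)
      : fLinkSizes(std::move(linkSizes))
   {
   }

   /// Costly on first call: every link has to be "opened" once.
   std::uint64_t GetNEntries() const
   {
      if (!fNEntries) {
         std::uint64_t total = 0;
         for (std::uint64_t n : fLinkSizes) {
            ++fNOpens; // in a real chain this would be the expensive part
            total += n;
         }
         fNEntries = total;
      }
      return *fNEntries;
   }

   int GetNOpens() const { return fNOpens; }
};
```

With this shape, a chain that is only iterated linearly never pays for the full count, and a chain nested inside another chain pays for it at most once.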

Comment on lines +193 to +195
for (const auto &value : *fEntry) {
auto &field = value.GetField();
auto valuePtr = entry.GetPtr<void>(field.GetQualifiedFieldName());

Not for this PR, but we should implement or find a way to iterate over all the pointers of an entry without all the field lookups.

while (localEntryNumber >= fInnerNEntries[currentNTuple]) {
localEntryNumber -= fInnerNEntries[currentNTuple];
// Determine to which inner processor and local entry number the provided global entry number belongs.
while (localEntryNumber >= fInnerNEntries[currProcessor]) {

Maybe assert that fInnerNEntries[0] != kInvalidNTupleIndex

3 participants