-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ntuple] Make RNTupleChainProcessor
composable
#17393
base: master
Are you sure you want to change the base?
[ntuple] Make RNTupleChainProcessor
composable
#17393
Conversation
Test Results 18 files 18 suites 4d 14h 8m 40s ⏱️ For more details on these failures, see this check. Results for commit b538fcc. ♻️ This comment has been updated with latest results. |
8d818f5
to
75745f8
Compare
75745f8
to
bf4b538
Compare
Prevents the page sources corresponding to the processor to be openened upon creation. Instead, defer opening them until the first `Advance` call.
bf4b538
to
4b69b26
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of, mostly minor, comments.
In general lgtm but there are some changes introduced in one commit that are changed by later commits, so it's a bit hard to review commit-wise.
In anticipation of the composition of the different processors (in particular, the join processor), entries loading should offer the possibility for random access. This will not change anything about the way users can interact with the processor, which is exclusively through the (linear) iterator.
With this change, the `RNTupleChainProcessor` iterates over other `RNTupleProcessor` objects instead of individual ntuples. This allows us to chain, for example, ntuples that have previously been joined using the `RNTupleJoinProcessor`.
4b69b26
to
b538fcc
Compare
lgtm but better wait for other people's approval as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great! I have some questions on details.
if (fCurrentEntryNumber != kInvalidNTupleIndex) { | ||
fProcessor.SetLocalEntryNumber(fCurrentEntryNumber); | ||
fCurrentEntryNumber = fProcessor.Advance(); | ||
if (processor.GetCurrentEntryNumber() != kInvalidNTupleIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this still check fCurrentEntryNumber
?
NTupleSize_t localEntryNumber = entryNumber; | ||
size_t currentNTuple = fCurrentNTupleNumber; | ||
|
||
if (fLocalEntryNumber >= fPageSource->GetNEntries()) { | ||
do { | ||
if (++fCurrentNTupleNumber >= fNTuples.size()) { | ||
return kInvalidNTupleIndex; | ||
} | ||
// Skip over empty ntuples we might encounter. | ||
} while (ConnectNTuple(fNTuples.at(fCurrentNTupleNumber)) == 0); | ||
while (localEntryNumber >= fInnerNEntries[currentNTuple]) { | ||
localEntryNumber -= fInnerNEntries[currentNTuple]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that entryNumber
is the index in the context of the chain, and fInnerNEntries[currentNTuple]
is the number of entries in a particular chain link, it seems to me that the arithmetic is wrong. Shouldn't we sum up all the previous fInnerNEntries[i]
up to but not including currentNTuple
and subtract from entryNumber
to get localEntryNumber
(entry number within chain link). (Taking into account that the result may be negative if we need to jump back.)
// know there is nothing to advance to. | ||
if (processor.GetCurrentEntryNumber() != kInvalidNTupleIndex) { | ||
// know there is nothing to load. | ||
if (fCurrentEntryNumber != kInvalidNTupleIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, here it is. Perhaps move to the previous commit.
virtual void SetEntryPointers(const REntry &entry) = 0; | ||
|
||
///////////////////////////////////////////////////////////////////////////// | ||
/// \brief Get the total number of entries in this processor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add a comment that this is costly for a chain processor (all underlying files need to be opened). I guess this expensive operation is only necessary if a chain is built from other chains..?
for (const auto &value : *fEntry) { | ||
auto &field = value.GetField(); | ||
auto valuePtr = entry.GetPtr<void>(field.GetQualifiedFieldName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR: but we should implement/find a way to iterate over all the pointers of an entry without all the field lookups.
while (localEntryNumber >= fInnerNEntries[currentNTuple]) { | ||
localEntryNumber -= fInnerNEntries[currentNTuple]; | ||
// Determine to which inner processer and local entry number the provided global entry number belongs. | ||
while (localEntryNumber >= fInnerNEntries[currProcessor]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe assert that fInnerNEntries[0] != kInvalidNTupleIndex
This PR introduces the possibility to create
RNTupleChainProcessor
s from other processor objects. In turn, this makes it possible to, for example, create a chain of joined ntuples.This PR is part of a bigger set of (foreseen) changes, collected and tracked in #17132.