Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[io] Extract tkey walk logic from TFile::Map() #17575

Merged
merged 2 commits into from
Feb 5, 2025

Conversation

silverweed
Copy link
Contributor

This Pull request:

refactors TFile::Map into 2 methods: Map and WalkTKeys. The latter contains the logic of traversing the TKeys in the file and returns an array with information about keys, gaps and errors. Map now simply calls that method and prints out the relevant information, in the same format as before.

The main advantage of splitting WalkTKeys is that it can be used by other places (like unit tests or client code) that are interested in the internal TKey structure.

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

Copy link
Contributor

@jblomer jblomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea! In this approach, all keys are loaded in memory before printing the information. Do we need a piecewise / cursor based API?

io/io/inc/TFile.h Outdated Show resolved Hide resolved
@silverweed
Copy link
Contributor Author

silverweed commented Jan 30, 2025

In this approach, all keys are loaded in memory before printing the information. Do we need a piecewise / cursor based API?

This is possible, although a file containing 1 million keys would only occupy 68 MiB of memory for the TKeyMapNodes (for reference, the 3.8 GB ttjet_13tev benchmark dataset has 278470 keys - about 38 MiB of memory). This is not counting the classname/keyname/key title strings; with them the figure is likely doubled or so.
Generating the array seems to be also quite fast on my machine (<1s in debug mode)

@silverweed silverweed force-pushed the tfilemap_refactor branch 2 times, most recently from 0491d72 to 45adcda Compare January 30, 2025 14:31
@pcanal
Copy link
Member

pcanal commented Jan 30, 2025

The current observed maximum number of baskets in TTree is 50 millions baskets ... and only because it reaches the 1Gb limit for the TTree object. It will/can grow larger once we lift the 1Gb limit and can already reach larger size with RNTuple (probably not quite as easily due to page size being larger than basket sizes).

Nonetheless that is 3.1 GiB of memory for the TKeyMapNodes .... so indeed I would recommend some sort of iterators mechanism (other-wise the code simply 'crash/out-of-memory' for large files.

Copy link

github-actions bot commented Jan 30, 2025

Test Results

    18 files      18 suites   4d 7h 5m 39s ⏱️
 2 687 tests  2 687 ✅ 0 💤 0 ❌
46 670 runs  46 670 ✅ 0 💤 0 ❌

Results for commit 2bcccda.

♻️ This comment has been updated with latest results.

@silverweed
Copy link
Contributor Author

@pcanal is there anything else blocking this PR?

Copy link
Member

@pcanal pcanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@silverweed silverweed merged commit 1d63179 into root-project:master Feb 5, 2025
21 checks passed
@silverweed silverweed deleted the tfilemap_refactor branch February 5, 2025 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants