Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RDF] Add option to change default basket size in RDataFrame Snapshot #17579

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Aditya-138-12
Copy link

This Pull request:

Introduces a new option in RSnapshotOptions.hxx to allow users to configure the default basket size of new branches when using RDataFrame::Snapshot. Modify SetBranchesHelper in ActionHelpers.hxx to honor this option.

Changes or fixes:

  1. Add a fBasketSize field to RSnapshotOptions to store the user-defined basket size.
  2. Update SetBranchesHelper to apply the specified basket size when creating output.
  3. Ensure backward compatibility by maintaining the current default behavior if no custom size is set.
    :)

Checklist:

  • [✅] tested changes locally
  • [❌] updated the docs (if necessary)

This PR fixes #17418

Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @Aditya-138-12 for this contribution! I feel like we're already going in the right direction. I have a few introductory inline comments. Together with those, I would highly apppreciate if you could add a test for the new feature, checking that the output TTree after a Snapshot has the requested basket size. You can take inspiration from some of the tests already written in e.g. dataframe_snapshot.cxx.

tree/dataframe/inc/ROOT/RSnapshotOptions.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RSnapshotOptions.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx Outdated Show resolved Hide resolved
…f int, 2. Passing only basketSize in the SetBranchesHelper function instead of Complete options object, 3. Added some more inline comments.
tree/dataframe/inc/ROOT/RSnapshotOptions.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RSnapshotOptions.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx Outdated Show resolved Hide resolved
Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the prompt interaction! We are still missing:

  • Adapting the other overload of SetBranchesHelper
  • Some tests

@Aditya-138-12
Copy link
Author

Thanks for the prompt interaction! We are still missing:

  • Adapting the other overload of SetBranchesHelper
  • Some tests

Thank you so much for your guidance! I really appreciate your time and feedback. I’m currently working on the tests, but I wanted to make sure I fully understand your point about 'adapting the other overload of SetBranchesHelper.' Could you kindly clarify what exactly needs to be adapted? I’d be grateful for any additional details!

@vepadulano
Copy link
Member

Could you kindly clarify what exactly needs to be adapted?

There is another function template of SetBranchesHelper starting at line 1336 of ActionHelpers.hxx. Currently in this PR you are only modifying the overload responsible for writing out branches storing collections instead, which is the one starting at 1399.

… added a < in a inline comment in RsnapshotOptions.hxx, made some changes in the SetBranchesHelper function template.
@dpiparo
Copy link
Member

dpiparo commented Feb 1, 2025

Let's test this!

Copy link

github-actions bot commented Feb 1, 2025

Test Results

     6 files       6 suites   1d 1h 7m 32s ⏱️
 2 685 tests  2 676 ✅ 0 💤  9 ❌
15 251 runs  15 224 ✅ 0 💤 27 ❌

For more details on these failures, see this check.

Results for commit bac9082.

@@ -1335,7 +1336,7 @@ void *GetData(T & /*v*/)
template <typename T>
void SetBranchesHelper(TTree *inputTree, TTree &outputTree, const std::string &inName, const std::string &name,
TBranch *&branch, void *&branchAddress, T *address, RBranchSet &outputBranches,
bool /*isDefine*/)
bool /*isDefine*/, const std::optional<int> &basketSize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have just added the function parameter, but not using it anywhere in the function. Also here, you should make use of this parameter to ensure that new branches are created with the correct basket size

{
// Create a simple ROOT file with a TTree
auto input_file = TFile::Open("input_file.root", "RECREATE");
auto tree = new TTree("tree", "Test Tree");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a new keyword without the corresponding delete. This is wrong and in general new should be avoided at all costs, prefer stack-based variables or constructs that automatically manage memory such as std::unique_ptr

Comment on lines +1462 to +1463
auto input_file = TFile::Open("input_file.root", "RECREATE");
auto tree = new TTree("tree", "Test Tree");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not create the dataset like this. Prefer RAII struct such as the ones you can find in many RDF tests. An example is

struct DataRAII {

Comment on lines +1469 to +1475
for (int i = 0; i < 1000; ++i) {
value = i;
tree->Fill();
}

input_file->Write();
input_file->Close();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should end up in the RAII constructor

Comment on lines +1515 to +1518
// Clean up
gSystem->Unlink("input_file.root");
gSystem->Unlink(output_file_custom.c_str());
gSystem->Unlink(output_file_default.c_str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should end up in the RAII destructor

Comment on lines +1484 to +1489
ROOT::RDF::RSnapshotOptions options;
options.fBasketSize = 2048; // Set custom basket size (2KB)

// Snapshot with custom basket size
std::string output_file_custom = "output_file_custom_basket.root";
df_with_new_column.Snapshot("tree", output_file_custom, {"branch_x", "branch_x_new"}, options);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will for sure not work as you expect. All the branches involved in this computation graph are of type float. You are not modifying anything in ActionHelpers.hxx to act upon branches of this type. This is the other function template I was referring to. Currently you are just passing the basketSize as parameter, you need to actually make use of it in the function.

Comment on lines +1495 to +1496
for (auto branch : *output_tree_custom->GetListOfBranches()) {
EXPECT_EQ(branch->GetBasketSize(), 2048); // Verify the custom basket size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetListOfBranches is not returning a collection of TBranch objects, you need to cast it. See various places using TRangeDynCast for the right thing to do here. For example

for (auto b : TRangeDynCast<TBranch>(*fSourceTree->GetListOfBranches())) {

Comment on lines +1458 to +1459
// Test for custom basket size in Snapshot, ROOT - #17418
TEST(RDFSnapshotMore, CustomBasketSize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is only testing one branch of type float, i.e. a scalar. We need at least another test for collection types

…rting at line 1336 for using the basketSize Parameter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to change default basket size in RDataFrame Snapshot
4 participants