-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RDF] Add option to change default basket size in RDataFrame Snapshot #17579
base: master
Are you sure you want to change the base?
Conversation
…e ActionHelpers.hxx file for custom basket size knob
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @Aditya-138-12 for this contribution! I feel like we're already going in the right direction. I have a few introductory inline comments. Together with those, I would highly apppreciate if you could add a test for the new feature, checking that the output TTree after a Snapshot has the requested basket size. You can take inspiration from some of the tests already written in e.g. dataframe_snapshot.cxx
.
…f int, 2. Passing only basketSize in the SetBranchesHelper function instead of Complete options object, 3. Added some more inline comments.
…d::optional<int> fBasketSize{}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the prompt interaction! We are still missing:
- Adapting the other overload of
SetBranchesHelper
- Some tests
Thank you so much for your guidance! I really appreciate your time and feedback. I’m currently working on the tests, but I wanted to make sure I fully understand your point about 'adapting the other overload of SetBranchesHelper.' Could you kindly clarify what exactly needs to be adapted? I’d be grateful for any additional details! |
There is another function template of |
… added a < in a inline comment in RsnapshotOptions.hxx, made some changes in the SetBranchesHelper function template.
Let's test this! |
Test Results 6 files 6 suites 1d 1h 7m 32s ⏱️ For more details on these failures, see this check. Results for commit bac9082. |
@@ -1335,7 +1336,7 @@ void *GetData(T & /*v*/) | |||
template <typename T> | |||
void SetBranchesHelper(TTree *inputTree, TTree &outputTree, const std::string &inName, const std::string &name, | |||
TBranch *&branch, void *&branchAddress, T *address, RBranchSet &outputBranches, | |||
bool /*isDefine*/) | |||
bool /*isDefine*/, const std::optional<int> &basketSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have just added the function parameter, but not using it anywhere in the function. Also here, you should make use of this parameter to ensure that new branches are created with the correct basket size
{ | ||
// Create a simple ROOT file with a TTree | ||
auto input_file = TFile::Open("input_file.root", "RECREATE"); | ||
auto tree = new TTree("tree", "Test Tree"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a new
keyword without the corresponding delete
. This is wrong and in general new
should be avoided at all costs, prefer stack-based variables or constructs that automatically manage memory such as std::unique_ptr
auto input_file = TFile::Open("input_file.root", "RECREATE"); | ||
auto tree = new TTree("tree", "Test Tree"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not create the dataset like this. Prefer RAII struct such as the ones you can find in many RDF tests. An example is
root/tree/dataframe/test/dataframe_vary.cxx
Line 1626 in 8b3840f
struct DataRAII { |
for (int i = 0; i < 1000; ++i) { | ||
value = i; | ||
tree->Fill(); | ||
} | ||
|
||
input_file->Write(); | ||
input_file->Close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should end up in the RAII constructor
// Clean up | ||
gSystem->Unlink("input_file.root"); | ||
gSystem->Unlink(output_file_custom.c_str()); | ||
gSystem->Unlink(output_file_default.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should end up in the RAII destructor
ROOT::RDF::RSnapshotOptions options; | ||
options.fBasketSize = 2048; // Set custom basket size (2KB) | ||
|
||
// Snapshot with custom basket size | ||
std::string output_file_custom = "output_file_custom_basket.root"; | ||
df_with_new_column.Snapshot("tree", output_file_custom, {"branch_x", "branch_x_new"}, options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will for sure not work as you expect. All the branches involved in this computation graph are of type float
. You are not modifying anything in ActionHelpers.hxx
to act upon branches of this type. This is the other function template I was referring to. Currently you are just passing the basketSize
as parameter, you need to actually make use of it in the function.
for (auto branch : *output_tree_custom->GetListOfBranches()) { | ||
EXPECT_EQ(branch->GetBasketSize(), 2048); // Verify the custom basket size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetListOfBranches
is not returning a collection of TBranch
objects, you need to cast it. See various places using TRangeDynCast
for the right thing to do here. For example
for (auto b : TRangeDynCast<TBranch>(*fSourceTree->GetListOfBranches())) { |
// Test for custom basket size in Snapshot, ROOT - #17418 | ||
TEST(RDFSnapshotMore, CustomBasketSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is only testing one branch of type float
, i.e. a scalar. We need at least another test for collection types
…rting at line 1336 for using the basketSize Parameter.
This Pull request:
Introduces a new option in RSnapshotOptions.hxx to allow users to configure the default basket size of new branches when using RDataFrame::Snapshot. Modify SetBranchesHelper in ActionHelpers.hxx to honor this option.
Changes or fixes:
:)
Checklist:
This PR fixes #17418