-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Add support for THnSparseD histogram in RDataFrame #19975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thanks a lot!
|
Thanks @ferdymercury , I will try to follow your instructions. I noticed in my local tests that the THnSparseD and THnD too do not work with multithreading. Both seems to work fine with a single thread. |
See https://root-forum.cern.ch/t/filling-histograms-in-parallel/35460 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot! Feel free to click on the button "Ready to Review" so that core developers are notified.
I see, thank you for advice. |
I implemented all required changes. I have few comments/questions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerning the link to the html file, leave it empty for the moment.
Thanks.
One further comment: commit messages usually start with the involved component: [math] fix sth |
765b955
to
689ef1d
Compare
Ok, I added [RDF] or [NFC] to commits |
Thanks again! In
nitpick wrt commit message: [hist] is usually small compared to [RDF] RDataFrame |
689ef1d
to
8b05de0
Compare
Hi @jackapet, thank you very much for this contribution! It looks good to me at the first glance, so I will let the CI run and once we see those results I will also give you a more detailed review. Thanks to @ferdymercury for giving the first reviews and hints as well! |
Test Results 21 files 21 suites 3d 22h 58m 50s ⏱️ For more details on these failures, see this check. Results for commit d8a9b0b. ♻️ This comment has been updated with latest results. |
Hi @jackapet, so unfortunately there is an issue with your changes at the ROOT building stage. Have you tried building ROOT locally with all these changes? If you go through the logs of the CI workers, for example, you can see the following:
This will need to be fixed first. |
Hello @martamaja10 , I am able to compile code without tests. I was not able to run or compile tests because I am not able to install required system packages. I am using windows with wsl2. ROOT compilation fails with RVec compilation:
I guess wsl2 does not have this variable defined. Is there some workaround for this? When running on cluster, I am able to compile the ROOT but I can't install libraries. So, I was able to check rdataframe in my local script but not in required tests. Could you provide instructions how to compile tests? Do you have a singularity/docker image for developers with all required libraries installed? I can access cvmfs if it helps. Otherwise, I will try to build my own image. |
when calling cmake, did you enable flag -Dtesting=ON -Droottest=ON ? (Building is enough, no need to install) |
The failure is likely because there is something equivalent to Line 189 in 50551c3
that needs to be implemented in THnSparse.h and .cxx |
Ok, thank you. I am trying to recompile it. I made an apptainer image with settings below. I hope all libraries are now installed.
|
I see, good point. There is an inconsistency between THnD and THnSparseD constructors. I suggest make it consistent in this PR. I can add the missing constructor to THnSparseD and I can also add a constructor using std::vector into THnD which is already in THnSparseD. |
|
Thanks, this will help. I managed to fix missing constructor and I can compile it with test. However, I will still need to install few missing packages to enable running tests in python. |
Thank you! Let's see what CI says now :) |
Hi @martamaja10 , @ferdymercury , I was able to prepare apptainer image containing the same packages as described in root-ci-images (#19975 (comment)). However, I am still not able to run test locally because I do not know how to execute them. Should I run some setup script for test? Should I execute them using |
From the build directory, you can run ctest -R root-io-filemerger (specify the name or subset of test names you want to run e.g. root-io-*) |
I run all tests beginning on I also added one additional test into THn.cxx. This test checks all constructors. This test passes too. About the failed test in the check_reducer_merger. This is related with multithreading. THnSparse cannot run in MT mode. The test fails with:
Should we remove this test? |
@ferdymercury , @martamaja10 , I was checking THnSparse implementation. There is lot of manipulation with memory using new, malloc, memcpy, delete, and other operators. This is because the implementation is based on basic arrays like Int_t*. I would suggest to rewrite it using std::unordered_map and std::vector classes. With this we can avoid any direct manipulation with memory. We can also implement it without using chunks, which are causing troubles, and leave optimization on std::unordered_map. Error messages indicate that some internal object is deleted twice - in test dataframe cloning. It seems that some object was shared after cloning and two deletions happened when original and cloned objects were deleted. |
TExMap fBinsContinued; ///<! Filled bins for non-unique hashes, containing pairs of (bin index 0, bin index 1) | ||
THnSparseCompactBinCoord *fCompactCoord; ///<! Compact coordinate | ||
|
||
THnSparse(const THnSparse&) = delete; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question, what is the reason for undeleting this constructor?
Could it be that this leads to the failures seen in the CI ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constructor is undeleted because it is required by RResultPtr
. In other words, object without this constructor cannot be used by RDataFrame.
Perhaps, this constructor require explicit implementation.
I guess that since it's old code, the trend is to not touch it too much and apply minimal fixes to solve encountered issues/bugs. (Unless there is a significant performance improvement by switching to unordered_map). Question: does the crash happen also before the additions of this PR? So to say, if you run |
The crash is in new tests added in this request. All failures are related with THnSparse usage in RDataFrame. There are two things which does not work: running with multithreading and lazy cloning via ROOT::Internal::RDF::CloneResultAndAction (leads to double deletion). Multithreading leads to failure in I will try to run with debugger to identify where exactly the code crash. Perhaps it will give some hints how to fix it. |
I think I understand the issue now. THnSparse has a raw pointer as a member variable. When we copy or clone the object we just copy the pointer. The pointer then points to the same location in the memory. This leads to crash when deleting original and cloned object. Bin contents can be also wrong. Issues in multithreading can be also caused by that, maybe. So, we need an explicit implementation of the copy constructor (The one which is undeleted). |
Good catch! |
Explicit implementation fixed a problem with cloning and varying THnSparse histograms. The problem with multithreading is still there. The implementation is very thread unsafe. There is a dynamic allocation of the memory and two threads may attempt to allocate memory at the same time. I suggest to remove this merger test. After this we can check pipeline again. |
Thanks for the fix! |
'That' in itself is not usually an issue ( |
This Pull request:
Add support for THnSparseD histograms in RDF
Changes or fixes:
THnSparse(const THnSparse&) = delete;
Checklist:
Fixes #19969