Reduce number of trees in RF regressors #1294

Merged
moralejo merged 2 commits into main from fewer_trees
Oct 1, 2024

Conversation

moralejo
Collaborator

Reduced from 150 to 50 trees. Physics performance is not impacted, while memory needs will decrease (and speed will increase).
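
For illustration only, here is a minimal sketch of the kind of change described above: lowering `n_estimators` for the regressors while leaving the classifier untouched. The key names, the `max_depth` values, and the classifier's tree count in `EXAMPLE_CONFIG` are placeholders assumed for this example, not copied from the PR diff or from `lstchain_standard_config.json`, and `reduce_regressor_trees` is a hypothetical helper.

```python
import json

# Hypothetical config excerpt. Key names and all values except the
# 150 -> 50 regressor change are assumptions made for this sketch.
EXAMPLE_CONFIG = {
    "random_forest_energy_regressor_args": {"n_estimators": 150, "max_depth": 30},
    "random_forest_disp_regressor_args": {"n_estimators": 150, "max_depth": 30},
    "random_forest_particle_classifier_args": {"n_estimators": 100, "max_depth": 100},
}


def reduce_regressor_trees(config, n_trees=50):
    """Return a copy of `config` with n_estimators set to `n_trees` for regressors only."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    for key, args in updated.items():
        is_regressor = key.endswith("_regressor_args")
        if is_regressor and isinstance(args, dict) and "n_estimators" in args:
            args["n_estimators"] = n_trees
    return updated


new_config = reduce_regressor_trees(EXAMPLE_CONFIG)
print(json.dumps(new_config, indent=2))
```

The classifier entry is deliberately left unchanged, matching the scope of this PR.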

Member

@morcuended morcuended left a comment

I think it would be good to upload the comparison to the wiki for bookkeeping.

lstchain/data/lstchain_standard_config.json

codecov bot commented Sep 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.50%. Comparing base (e4a5b7a) to head (f154d2e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1294      +/-   ##
==========================================
+ Coverage   73.49%   73.50%   +0.01%     
==========================================
  Files         134      134              
  Lines       14210    14210              
==========================================
+ Hits        10443    10445       +2     
+ Misses       3767     3765       -2     


@moralejo
Collaborator Author

Some comparisons here: RF_tree_reduction.pdf

Member

@vuillaut vuillaut left a comment

Why not change the classifiers' parameters as well?

@moralejo
Collaborator Author

moralejo commented Sep 19, 2024

Why not change the classifiers' parameters as well?

The tests for that are more complicated, since they involve background, ideally from real data.
The main goal here is to reduce the memory footprint of RF training and application, and the biggest RFs are the regression ones. Currently the split is 67% regression, 33% classification (according to the models' .sav files). With this change the total RF size will become ~55% of what it is now, with a share of 60% for classification and 40% for regression. If we aim for a further reduction we should indeed target the classifiers.
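
The figures quoted above can be reproduced with a quick back-of-the-envelope calculation. This sketch assumes that RF size scales linearly with the number of trees, all other parameters being equal; the function name is hypothetical, and the 67%/33% shares are the ones quoted in this comment.

```python
def rf_size_after_reduction(regression_share, classification_share,
                            trees_before, trees_after):
    """Estimate relative RF size after cutting trees in the regressors only.

    Assumes model size is proportional to the number of trees.
    Returns (new total relative to current, classification share, regression share).
    """
    scale = trees_after / trees_before
    new_regression = regression_share * scale
    total = new_regression + classification_share
    return total, classification_share / total, new_regression / total


total, cls_share, reg_share = rf_size_after_reduction(0.67, 0.33, 150, 50)
print(f"total size relative to now: {total:.0%}")
print(f"classification share: {cls_share:.0%}")
print(f"regression share: {reg_share:.0%}")
```

Under that assumption the regressors shrink to one third of their current size, which gives a total of roughly 55% and a split of about 60% classification to 40% regression.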

@moralejo moralejo requested a review from gabemery October 1, 2024 13:04
@morcuended
Member

There is something wrong with the mamba setup step in the CI (it is taking > 50 min).

@moralejo
Collaborator Author

moralejo commented Oct 1, 2024

There is something wrong with the mamba setup step in the CI (it is taking > 50 min).

Can we ignore it?

@morcuended
Member

Tests will not run until either we merge #4563 or the problem with mamba is solved. In any case, I think the changes here are not tested. I'd say it can be merged.

@moralejo moralejo merged commit d1cd930 into main Oct 1, 2024
4 of 7 checks passed
@moralejo moralejo deleted the fewer_trees branch October 1, 2024 14:31
5 participants