[ENH] Add "extratrees" (i.e. randomized trees) for oblique trees #46

Closed
adam2392 opened this issue Mar 6, 2023 · 7 comments
Labels
Cython (Related to Cython code), enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@adam2392
Collaborator

adam2392 commented Mar 6, 2023

Add a random splitter for oblique trees to the codebase to enable an "ExtraObliqueTreesClassifier".

This will improve runtime and give oblique trees an improvement analogous to how ExtraTreesClassifier generalizes DecisionTreeClassifier in scikit-learn.

xref: https://link.springer.com/article/10.1007/s10994-006-6226-1

@adam2392
Collaborator Author

adam2392 commented Mar 6, 2023

This is complementary to binning #25

@adam2392 adam2392 added enhancement New feature or request good first issue Good for newcomers Cython Related to Cython code labels Mar 6, 2023
@adam2392
Collaborator Author

adam2392 commented Mar 6, 2023

Someone would basically have to look at the oblique splitter code currently in scikit-tree (which implements the "best oblique splitter"), then at the random splitter code in scikit-learn, and add a corresponding set of Cython code that implements an "oblique random splitter".

@adam2392 adam2392 changed the title [enh] Add "extratrees" (i.e. randomized trees) for oblique trees [ENH] Add "extratrees" (i.e. randomized trees) for oblique trees Mar 6, 2023
@SUKI-O
Member

SUKI-O commented May 14, 2023

Am I totally off-base in thinking I need to implement an extra version of class BaseObliqueSplitter(Splitter)? Or of the higher-level class BestObliqueSplitter(ObliqueSplitter)? We want to pick the partition threshold randomly for all features (weighted by the weights), unlike the class RandomSplitter in scikit-learn, which picks a feature randomly AND picks the threshold within that feature randomly. Is it that latter part we want implemented when picking the split in ObliqueSplitter?

@adam2392
Collaborator Author

> Am I totally off-base in thinking I need to implement an extra version of class BaseObliqueSplitter(Splitter)? Or of the higher-level class BestObliqueSplitter(ObliqueSplitter)? We want to pick the partition threshold randomly for all features (weighted by the weights), unlike the class RandomSplitter in scikit-learn, which picks a feature randomly AND picks the threshold within that feature randomly. Is it that latter part we want implemented when picking the split in ObliqueSplitter?

My initial guess would be to implement a subclass of BaseObliqueSplitter called RandomObliqueSplitter, which picks both the projection matrix and the threshold randomly.
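
To make the suggestion above concrete, here is a minimal pure-Python sketch of what a single "random oblique split" could look like. It is not the scikit-tree Cython implementation: the function name random_oblique_split, the n_nonzeros parameter, and the choice of +1/-1 projection weights are all assumptions made for illustration. The idea shown is only the one named in the thread: draw a random sparse projection, project every sample onto it, then draw the threshold uniformly between the smallest and largest projected value instead of searching for the best one.

```python
import random


def random_oblique_split(X, n_nonzeros, rng):
    """Illustrative random oblique split (hypothetical helper, not scikit-tree API).

    X is a list of equal-length feature rows. Returns None when the
    projected values are constant, otherwise a tuple
    (weights, threshold, left_indices, right_indices).
    """
    n_features = len(X[0])

    # Random sparse projection: choose a few features and give each a
    # weight of +1 or -1 (an assumed weighting scheme for this sketch).
    chosen = rng.sample(range(n_features), n_nonzeros)
    weights = {j: rng.choice((-1.0, 1.0)) for j in chosen}

    # Project every sample onto the random direction.
    projected = [sum(w * row[j] for j, w in weights.items()) for row in X]

    lo, hi = min(projected), max(projected)
    if lo == hi:
        return None  # constant projection, nothing to split on

    # Draw the threshold at random rather than evaluating every candidate.
    threshold = rng.uniform(lo, hi)
    if threshold == hi:
        # With "value <= threshold goes left", a threshold equal to the
        # maximum would leave the right side empty, so fall back to lo.
        threshold = lo

    left = [i for i, v in enumerate(projected) if v <= threshold]
    right = [i for i, v in enumerate(projected) if v > threshold]
    return weights, threshold, left, right
```

Because no candidate thresholds are evaluated, each split costs one pass over the projected values, which is where the runtime gain over a "best" splitter would come from.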

@SUKI-O
Member

SUKI-O commented May 15, 2023

From BaseObliqueSplitter, lines 264-274:

if current_proxy_improvement > best_proxy_improvement:
    best_proxy_improvement = current_proxy_improvement
    # sum of halves is used to avoid infinite value
    current_split.threshold = feature_values[p - 1] / 2.0 + feature_values[p] / 2.0

    if (
        (current_split.threshold == feature_values[p]) or
        (current_split.threshold == INFINITY) or
        (current_split.threshold == -INFINITY)
    ):
        current_split.threshold = feature_values[p - 1]

Can you please explain what this is doing? In particular, why is current_split.threshold = feature_values[p - 1] / 2.0 + feature_values[p] / 2.0 needed, and why is current_split.threshold == feature_values[p] not good?

@adam2392
Collaborator Author

The current_split.threshold = feature_values[p - 1] / 2.0 + feature_values[p] / 2.0 is just a fancy way of computing the midpoint between two points. The feature_values at p might be infinity, so the if statement that follows prevents that from occurring.
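
The following small Python sketch illustrates the two guards in the quoted snippet. It uses Python doubles rather than the Cython types of the original, and the helper name midpoint_threshold is hypothetical. It also assumes the scikit-learn convention that samples with value <= threshold go to the left child; under that convention, a midpoint that rounds up to feature_values[p] would send the sample at p to the wrong side, which is one plausible reason for the equality check.

```python
import math


def midpoint_threshold(lo, hi):
    """Midpoint of two adjacent sorted values, with the same guards as the snippet.

    lo plays the role of feature_values[p - 1], hi of feature_values[p].
    """
    # Halving each term first avoids overflow when lo + hi exceeds the
    # largest finite float.
    threshold = lo / 2.0 + hi / 2.0

    # Fall back to lo if rounding pushed the midpoint up to hi, or if the
    # result is +inf or -inf.
    if threshold == hi or math.isinf(threshold):
        threshold = lo
    return threshold


# Ordinary case: the midpoint is exact.
ordinary = midpoint_threshold(1.0, 3.0)

# Overflow case: the naive sum is infinite, the sum of halves is not.
naive = (1e308 + 1.7e308) / 2.0
safe = midpoint_threshold(1e308, 1.7e308)

# Rounding case: two neighbouring doubles whose exact midpoint is not
# representable and rounds to the upper value.
eps = 2.0 ** -52
a = 1.0 + eps
b = 1.0 + 2 * eps
rounded = a / 2.0 + b / 2.0
guarded = midpoint_threshold(a, b)
```

In the rounding case the unguarded midpoint equals b, so the guard returns a instead and the two samples still end up on opposite sides of the split.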

adam2392 added a commit that referenced this issue Sep 10, 2023
* Add extra-trees version for oblique trees (i.e. the random oblique splitter)

---------

Signed-off-by: ITSUKI OGIHARA <Ogihara_Itsuki@bah.com>
Co-authored-by: Adam Li <adam2392@gmail.com>
@adam2392
Collaborator Author

Closed by #75
