[ENH] Add "extratrees" (i.e. randomized trees) for oblique trees #46

adam2392 · 2023-03-06T19:49:08Z

Add a random splitter for oblique trees to the codebase to allow an "ExtraObliqueTreesClassifier".

This should improve runtime. It would also extend oblique trees in the same way that `ExtraTreesClassifier` relates to `DecisionTreeClassifier` in scikit-learn.

xref: https://link.springer.com/article/10.1007/s10994-006-6226-1


adam2392 · 2023-03-06T19:49:34Z

This is complementary to binning #25

adam2392 · 2023-03-06T19:54:06Z

Someone would basically have to look at the oblique splitter code currently in scikit-tree (which implements the "best oblique splitter") and at the random splitter code in scikit-learn, and then add a corresponding set of Cython code that implements an "oblique random splitter".
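To make the difference between the two splitters concrete, here is a minimal NumPy sketch (not scikit-tree or scikit-learn code) of the two threshold rules applied to a single, already-projected feature. The "best" rule scores every midpoint between sorted distinct values and keeps the one with the lowest size-weighted Gini impurity. The "random" (extra-trees) rule draws one threshold uniformly between the node's minimum and maximum. The helper names `gini`, `split_score`, `best_threshold` and `random_threshold` are hypothetical, and `best_threshold` assumes the feature has at least two distinct values.

```python
import numpy as np


def gini(y):
    """Gini impurity of a non-negative integer label array (0.0 if empty)."""
    if y.size == 0:
        return 0.0
    p = np.bincount(y) / y.size
    return 1.0 - np.sum(p ** 2)


def split_score(x, y, threshold):
    """Size-weighted Gini impurity of the partition x <= threshold."""
    left = x <= threshold
    return (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / y.size


def best_threshold(x, y):
    """'Best' rule: score every midpoint between sorted distinct values."""
    xs = np.unique(x)  # sorted and deduplicated
    candidates = xs[:-1] / 2.0 + xs[1:] / 2.0
    scores = [split_score(x, y, t) for t in candidates]
    return candidates[int(np.argmin(scores))]


def random_threshold(x, rng):
    """'Random' (extra-trees) rule: one uniform draw between min and max."""
    lo, hi = x.min(), x.max()
    t = rng.uniform(lo, hi)
    # rng.uniform may return `hi` through rounding; falling back to `lo`
    # keeps at least one sample (the maximum) on the right side.
    return lo if t == hi else t


x = np.array([0.1, 0.4, 0.5, 0.9, 1.2, 1.5])
y = np.array([0, 0, 0, 1, 1, 1])
rng = np.random.default_rng(0)
t_best = best_threshold(x, y)      # midpoint of 0.5 and 0.9, i.e. about 0.7
t_rand = random_threshold(x, rng)  # some value in [0.1, 1.5)
```

The best rule needs the values sorted and one impurity evaluation per candidate, while the random rule needs only the minimum and maximum, which is where an extra-trees splitter saves time.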

SUKI-O · 2023-05-14T22:23:39Z

Am I totally off-base in thinking I need to implement another version of `class BaseObliqueSplitter(Splitter)`? Or of the higher-level `class BestObliqueSplitter(ObliqueSplitter)`? We want to pick the partition threshold randomly for all features (weighted by the weights), unlike `class RandomSplitter` in scikit-learn, which picks a feature randomly AND picks a threshold within that feature randomly. Do we want that latter part implemented when picking the split in `ObliqueSplitter`?

adam2392 · 2023-05-15T01:17:47Z


My initial guess would be to implement a subclass of `BaseObliqueSplitter` called `RandomObliqueSplitter`, which picks the projection matrix and the threshold randomly.
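As a rough illustration of that suggestion, here is a NumPy sketch of what one node of such a `RandomObliqueSplitter` might do. It is not scikit-tree's Cython API: the function `random_oblique_split` and its parameters `n_projections` and `n_nonzero` are hypothetical, and the sparse projections with +1/-1 weights are an assumption. Following the extra-trees recipe from the paper linked in the issue (draw several random candidate splits, keep the best-scoring one), it draws random projections, draws a single random threshold per projection, and keeps the candidate with the lowest Gini impurity.

```python
import numpy as np


def gini(y):
    """Gini impurity of a non-negative integer label array (0.0 if empty)."""
    if y.size == 0:
        return 0.0
    p = np.bincount(y) / y.size
    return 1.0 - np.sum(p ** 2)


def random_oblique_split(X, y, n_projections, n_nonzero, rng):
    """Hypothetical extra-trees-style oblique split at a single node.

    Draws `n_projections` sparse projection vectors (each with `n_nonzero`
    entries of +1 or -1), draws ONE uniform random threshold per projection,
    and returns (weights, threshold, score) for the candidate with the lowest
    size-weighted Gini impurity, or None if every projection is constant.
    """
    n_samples, n_features = X.shape
    best = None
    for _ in range(n_projections):
        # Random sparse projection: random feature subset with random signs.
        w = np.zeros(n_features)
        idx = rng.choice(n_features, size=n_nonzero, replace=False)
        w[idx] = rng.choice([-1.0, 1.0], size=n_nonzero)
        z = X @ w  # projected feature values for every sample in the node

        lo, hi = z.min(), z.max()
        if lo == hi:
            continue  # a constant projection cannot be split
        # Random threshold instead of sorting z and scanning every position.
        t = rng.uniform(lo, hi)
        if t == hi:
            t = lo  # keep the right child non-empty if rounding hits `hi`

        left = z <= t
        score = (left.sum() * gini(y[left])
                 + (~left).sum() * gini(y[~left])) / n_samples
        if best is None or score < best[2]:
            best = (w, t, score)
    return best


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels depend on an oblique direction
w, t, score = random_oblique_split(X, y, n_projections=10, n_nonzero=2, rng=rng)
```

Compared with a best oblique splitter, only the threshold step changes: the projection sampling stays the same, but there is no sort and no scan over candidate positions.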

SUKI-O · 2023-05-15T13:56:08Z

From `BaseObliqueSplitter`, lines 264-274:

    if current_proxy_improvement > best_proxy_improvement:
        best_proxy_improvement = current_proxy_improvement
        # sum of halves is used to avoid infinite value
        current_split.threshold = feature_values[p - 1] / 2.0 + feature_values[p] / 2.0

        if (
            (current_split.threshold == feature_values[p]) or
            (current_split.threshold == INFINITY) or
            (current_split.threshold == -INFINITY)
        ):
            current_split.threshold = feature_values[p - 1]

Can you please explain what this is doing? In particular, why is `current_split.threshold = feature_values[p - 1] / 2.0 + feature_values[p] / 2.0` needed, and why is `current_split.threshold == feature_values[p]` not good?

adam2392 · 2023-05-16T14:21:28Z

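`current_split.threshold = feature_values[p - 1] / 2.0 + feature_values[p] / 2.0` is just a way of computing the midpoint between two adjacent points; halving each value before adding keeps the sum from overflowing to infinity. The value at `feature_values[p]` might itself be infinite, so the `if` statement falls back to `feature_values[p - 1]` in that case. It also falls back when rounding makes the midpoint equal to `feature_values[p]`: samples go left when their value is `<= threshold`, so that threshold would send the sample at position `p` to the left, and the split would no longer fall between `p - 1` and `p`.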
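The following pure-Python sketch exercises the quoted threshold logic through a hypothetical helper, `midpoint_threshold`. A naive `(a + b) / 2` overflows where the sum of halves does not, and the fallback to the lower value fires both for an infinite upper value and for two adjacent floats whose midpoint rounds up to the upper one. It assumes Python 3.9+ for `math.nextafter`.

```python
import math
import sys


def midpoint_threshold(lo, hi):
    """Threshold between adjacent sorted values lo < hi (hypothetical helper
    mirroring the quoted splitter code)."""
    # Halve first: lo + hi can overflow to inf even if both are finite.
    t = lo / 2.0 + hi / 2.0
    # Samples go left when value <= threshold, so t == hi would put the
    # upper sample on the left too; an infinite t is also unusable.
    if t == hi or t == math.inf or t == -math.inf:
        t = lo
    return t


big = sys.float_info.max
print((big / 2 + big) / 2.0)              # inf: the naive midpoint overflows
print(midpoint_threshold(big / 2, big))   # finite, roughly 0.75 * big

below_one = math.nextafter(1.0, 0.0)      # largest double below 1.0
print(midpoint_threshold(below_one, 1.0)) # midpoint rounds to 1.0, falls back to below_one
print(midpoint_threshold(1.0, math.inf))  # infinite upper value, falls back to 1.0
```

The rounding case happens because the exact midpoint of two neighbouring doubles is not representable, so round-half-to-even can land on the upper value.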

* Add extra-trees version for oblique trees (i.e. the random oblique splitter)

Signed-off-by: ITSUKI OGIHARA <Ogihara_Itsuki@bah.com>
Co-authored-by: Adam Li <adam2392@gmail.com>

adam2392 · 2023-09-10T18:13:38Z

Closed by #75

adam2392 added the enhancement, good first issue, and Cython labels on Mar 6, 2023

adam2392 changed the title ~~[enh] Add "extratrees" (i.e. randomized trees) for oblique trees~~ [ENH] Add "extratrees" (i.e. randomized trees) for oblique trees Mar 6, 2023

SUKI-O mentioned this issue May 20, 2023

[ENH] Add "extratrees" for oblique trees #46 #75 (merged)

adam2392 closed this as completed Sep 10, 2023
