Merge branch 'main' into cmicont
adam2392 committed Jul 5, 2023
2 parents 58a2717 + d03603f commit a780b04
Showing 11 changed files with 145 additions and 579 deletions.
11 changes: 5 additions & 6 deletions README.md
@@ -9,16 +9,18 @@ scikit-tree

scikit-tree is a scikit-learn compatible API for building state-of-the-art decision trees. These include unsupervised trees, oblique trees, uncertainty trees, quantile trees and causal trees.

Tree-models have withstood the test of time, and are consistently used for modern-day data science and machine learning applications. They especially perform well when there are limited samples for a problem and are flexible learners that can be applied to a wide variety of different settings, such as tabular, images, time-series, genomics, EEG data and more.

We welcome contributions for modern tree-based algorithms. We use Cython to achieve fast C/C++ speeds, while abiding by a scikit-learn compatible (tested) API. Moreover, our Cython internals are easily extensible because they follow the internal Cython API of scikit-learn as well.

**Dependency on a fork of scikit-learn**
**Submodule dependency on a fork of scikit-learn**
Due to the current state of scikit-learn's internal Cython code for trees, we have to instead leverage a maintained fork of scikit-learn at https://github.com/neurodata/scikit-learn, where specifically, the `fork` branch is used to build and install this repo. We keep that fork well-maintained and up-to-date with respect to the main sklearn repo. The only difference is the refactoring of the `tree/` submodule. This fork is used internally under the namespace ``sktree._lib.sklearn``. It is necessary to use this fork for anything related to:

- `RandomForest*`
- `ExtraTrees*`
- or any importable items from the `tree/` submodule, whether it is a Cython or Python object

If you are developing for scikit-tree, we will always depend on the most up-to-date commit of `https://github.com/neurodata/scikit-learn/submodule` as a submodule within scikit-tee. This branch is consistently maintained for changes upstream that occur in the scikit-learn tree submodule. This ensures that our fork maintains consistency and robustness due to bug fixes and improvements upstream.
If you are developing for scikit-tree, we will always depend on the most up-to-date commit of `https://github.com/neurodata/scikit-learn/submodulev2` as a submodule within scikit-tee. This branch is consistently maintained for changes upstream that occur in the scikit-learn tree submodule. This ensures that our fork maintains consistency and robustness due to bug fixes and improvements upstream.

Documentation
=============
@@ -43,7 +45,7 @@ We minimally require:
* Python (>=3.8)
* numpy
* scipy
* scikit-learn
* scikit-learn >= 1.3

Building locally with Meson (RECOMMENDED)
-----------------------------------------
@@ -55,9 +57,6 @@ Make sure you have the necessary packages installed
# you may need these optional dependencies to build scikit-learn locally
conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

# (caution only if you know what you're doing) or if you're a developer and need some latest changes on the fork:
pip install scikit-learn-tree@git+https://git@github.com/neurodata/scikit-learn.git@fork

We use the ``spin`` CLI to abstract away build details:

# run the build using Meson/Ninja
2 changes: 1 addition & 1 deletion build_requirements.txt
@@ -3,7 +3,7 @@ meson-python
cython
ninja
numpy
scikit-learn@git+https://git@github.com/scikit-learn/scikit-learn
scikit-learn>=1.3
click
rich-click
doit
3 changes: 2 additions & 1 deletion doc_requirements.txt
@@ -11,4 +11,5 @@ ipython
nbsphinx
memory_profiler
pandas
seaborn
seaborn
joblib
4 changes: 3 additions & 1 deletion docs/conf.py
@@ -168,6 +168,7 @@
"pipeline.Pipeline": "sklearn.pipeline.Pipeline",
# "sklearn_fork.inspection.permutation_importance": "sklearn.inspection.permutation_importance",
}

numpydoc_xref_ignore = {
"of",
"or",
@@ -218,6 +219,7 @@
"n_features_z",
"n_neighbors",
"one",
"joblib.parallel_backend",
}

# validation
@@ -254,7 +256,7 @@

# -- intersphinx -------------------------------------------------------------
intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
"python": ("https://docs.python.org/{.major}".format(sys.version_info), None),
"numpy": ("https://numpy.org/devdocs", None),
"scipy": ("https://scipy.github.io/devdocs", None),
"sklearn": ("https://scikit-learn.org/dev", None),
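The new `"python"` entry in `intersphinx_mapping` builds the documentation URL from the running interpreter's major version instead of hard-coding `3`. As a minimal sketch of what that expression evaluates to (it assumes `sys` is imported at the top of `docs/conf.py`, which this hunk does not show; the variable name `python_docs_url` is only for illustration):

```python
import sys

# Same expression as in the intersphinx_mapping entry. "{.major}" is a format
# field with an implicit positional index followed by an attribute lookup, so
# it reads `sys.version_info.major`.
python_docs_url = "https://docs.python.org/{.major}".format(sys.version_info)

# Equivalent, more explicit spelling of the same lookup.
explicit_url = f"https://docs.python.org/{sys.version_info.major}"

print(python_docs_url)
```

On any Python 3 interpreter the two spellings produce the same URL, so the change only matters if the major version ever differs from 3.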
681 changes: 122 additions & 559 deletions poetry.lock

Large diffs are not rendered by default.

5 changes: 3 additions & 2 deletions pyproject.toml
@@ -17,6 +17,7 @@ classifiers = [
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
'Programming Language :: Python :: 3.10',
'Programming Language :: Python :: 3.11',
]
keywords = ['tree', 'oblique trees', 'manifold-learning', 'scikit-learn']
include = [
@@ -68,11 +69,11 @@ vcs = "git"
files = ["sktree/__init__.py"]

[tool.poetry.dependencies]
python = ">=3.8,<3.11"
python = ">=3.8,<3.12"
numpy = "^1.23.0"
scipy = "^1.9.0"
scikit-learn = "^1.2.2"
importlib-resources = { version = "*", python = "<3.9" }
importlib-resources = { version = "*", python = "<3.10" }

[tool.poetry.group.test]
optional = true
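The `importlib-resources` entry above is a backport that is only installed on older interpreters (after this change, on Python below 3.10). The diff does not show how `sktree` imports it; the following is a hedged sketch of the usual pattern for pairing such a version marker with a conditional import. The helper `uses_backport` is hypothetical and exists only to make the selected module visible.

```python
import sys

# Pick the standard-library module where it is considered new enough, and fall
# back to the PyPI backport otherwise. The (3, 10) cutoff mirrors the
# `python = "<3.10"` marker in pyproject.toml; it is an assumption that the
# package's own import logic uses the same cutoff.
if sys.version_info >= (3, 10):
    from importlib import resources
else:
    import importlib_resources as resources  # backport, installed only on < 3.10


def uses_backport() -> bool:
    """Return True when the PyPI backport was selected (hypothetical helper)."""
    return resources.__name__ == "importlib_resources"


print("backport in use:", uses_backport())
```

Keeping the marker in `pyproject.toml` and the version check in the import aligned avoids importing a backport that was never installed.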
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,3 +1,3 @@
numpy
scipy
scikit-learn@git+https://git@github.com/scikit-learn/scikit-learn
scikit-learn>=1.3
2 changes: 1 addition & 1 deletion sktree/_lib/sklearn_fork
Submodule sklearn_fork updated 827 files
2 changes: 1 addition & 1 deletion sktree/ensemble/_honest_forest.py
@@ -113,7 +113,7 @@ class HonestForestClassifier(ForestClassifier):
n_jobs : int, default=None
The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`,
:meth:`decision_path` and :meth:`apply` are all parallelized over the
trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend`
trees. ``None`` means 1 unless in a `joblib.parallel_backend`
context. ``-1`` means using all processors. See :term:`Glossary
<n_jobs>` for more details.
8 changes: 4 additions & 4 deletions sktree/ensemble/_supervised_forest.py
@@ -114,7 +114,7 @@ class ObliqueRandomForestClassifier(SimMatrixMixin, ForestClassifier):
n_jobs : int, default=None
The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`,
:meth:`decision_path` and :meth:`apply` are all parallelized over the
trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend`
trees. ``None`` means 1 unless in a `joblib.parallel_backend`
context. ``-1`` means using all processors. See :term:`Glossary
<n_jobs>` for more details.
@@ -454,7 +454,7 @@ class ObliqueRandomForestRegressor(SimMatrixMixin, ForestRegressor):
n_jobs : int, default=None
The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`,
:meth:`decision_path` and :meth:`apply` are all parallelized over the
trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend`
trees. ``None`` means 1 unless in a `joblib.parallel_backend`
context. ``-1`` means using all processors. See :term:`Glossary
<n_jobs>` for more details.
@@ -743,7 +743,7 @@ class PatchObliqueRandomForestClassifier(SimMatrixMixin, ForestClassifier):
n_jobs : int, default=None
The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`,
:meth:`decision_path` and :meth:`apply` are all parallelized over the
trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend`
trees. ``None`` means 1 unless in a `joblib.parallel_backend`
context. ``-1`` means using all processors. See :term:`Glossary
<n_jobs>` for more details.
@@ -1095,7 +1095,7 @@ class PatchObliqueRandomForestRegressor(SimMatrixMixin, ForestRegressor):
n_jobs : int, default=None
The number of jobs to run in parallel. :meth:`fit`, :meth:`predict`,
:meth:`decision_path` and :meth:`apply` are all parallelized over the
trees. ``None`` means 1 unless in a :obj:`joblib.parallel_backend`
trees. ``None`` means 1 unless in a `joblib.parallel_backend`
context. ``-1`` means using all processors. See :term:`Glossary
<n_jobs>` for more details.
4 changes: 2 additions & 2 deletions sktree/tree/_marginalize.py
@@ -165,7 +165,7 @@ def _apply_marginal_tree(


def compute_marginal(self: BaseForest, X, S, n_repeats=10):
"""Compute marginal distribution of P(S = s) for each s in X.
r"""Compute marginal distribution of P(S = s) for each s in X.
Parameters
----------
@@ -194,7 +194,7 @@ def compute_marginal(self: BaseForest, X, S, n_repeats=10):


def compute_conditional(self, X, S, y=None, n_repeats=10):
"""Compute conditional P(Y | X, Z = z) for each X and Z.
r"""Compute conditional P(Y | X, Z = z) for each X and Z.
Parameters
----------
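Both docstrings in this file switch from `"""` to `r"""`. The diff does not say why; a common reason is that the docstring body contains backslashes (for example LaTeX in `:math:` roles) that a normal string literal would interpret as escape sequences. A small sketch of the difference, using two hypothetical functions that are not part of `sktree`:

```python
def with_raw_docstring():
    r"""Estimate :math:`\theta` from the data."""


def with_plain_docstring():
    """Estimate :math:`\theta` from the data."""


# In the raw docstring the backslash survives, so Sphinx sees `\theta`.
print(repr(with_raw_docstring.__doc__))

# In the plain docstring `\t` is interpreted as a tab character, so the
# rendered math would read "<TAB>heta" instead of `\theta`.
print(repr(with_plain_docstring.__doc__))
```

Sequences that are not valid escapes (such as `\m` in `\mathbb`) keep their backslash in a plain string but trigger a warning on recent Python versions, which is another reason to prefer the raw form.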
