[ENH] Continuing CMI work #94

Closed
wants to merge 42 commits into from
42 commits
b1880dc
WIP
adam2392 Jun 21, 2023
c24413b
Adding new simulations
adam2392 Jun 23, 2023
b9daf5d
Merge branch 'main' into cmicont
adam2392 Jun 23, 2023
e8fb197
Adding conditional resampling
adam2392 Jun 26, 2023
0fecbf6
Merge branch 'cmicont' of https://github.com/neurodata/scikit-tree in…
adam2392 Jun 26, 2023
6dccbfa
Adding api docs
adam2392 Jun 26, 2023
58a2717
CMI analysis update
adam2392 Jun 30, 2023
a780b04
Merge branch 'main' into cmicont
adam2392 Jul 5, 2023
9647791
New fix style
adam2392 Jul 5, 2023
6d9ed28
Merge branch 'cmicont' of https://github.com/neurodata/scikit-tree int…
adam2392 Jul 5, 2023
98123d0
Fix pyproject
adam2392 Jul 5, 2023
184657b
Merge branch 'main' into cmicont
adam2392 Jul 6, 2023
b2ad419
Remove extra png file
adam2392 Jul 6, 2023
00ebf5b
Add changelog
adam2392 Jul 6, 2023
5f5d970
Upgrade to cython 3.0
adam2392 Jul 19, 2023
502bb2d
Get working with sklearn main
adam2392 Jul 19, 2023
a49eb90
Updated wrt monotonic cst
adam2392 Jul 19, 2023
ae64831
Merge branch 'main' into cmicont
adam2392 Jul 19, 2023
c4e2df0
Update to new cython API
adam2392 Jul 20, 2023
28ede9e
Add categorical support
adam2392 Jul 20, 2023
f133467
Categorical support
adam2392 Jul 20, 2023
5d3bd20
Merge branch 'cmicont' of https://github.com/neurodata/scikit-tree in…
adam2392 Jul 20, 2023
5661778
WIP because categorical support made us lose accuracy
adam2392 Jul 20, 2023
7e8452a
Try again
adam2392 Jul 20, 2023
9f0e048
Update submodule
adam2392 Jul 20, 2023
3ad54f8
Updated now wrt monotonic cst
adam2392 Jul 20, 2023
0154e9c
Remove fluff
adam2392 Jul 20, 2023
578fdc1
Update submodule
adam2392 Jul 20, 2023
0b077cf
Remove generated
adam2392 Jul 20, 2023
28df968
Update and fix docs
adam2392 Jul 20, 2023
3c1acaf
Fix changelog checker
adam2392 Jul 20, 2023
f55cb49
Fix doc location
adam2392 Jul 20, 2023
4a3cda5
Try redirect
adam2392 Jul 22, 2023
b1f0371
Try again
adam2392 Jul 24, 2023
029f28e
Try again
adam2392 Jul 24, 2023
f8fd478
Try again
adam2392 Jul 24, 2023
6ebd015
Rename
adam2392 Jul 24, 2023
6acc714
Rename
adam2392 Jul 24, 2023
362c367
Fix merge
adam2392 Aug 7, 2023
b4c029b
Fixed ci docs
adam2392 Aug 11, 2023
a10a549
Update submodule
adam2392 Aug 11, 2023
5e50021
Try to get partial fit working
adam2392 Aug 14, 2023
36 changes: 16 additions & 20 deletions .circleci/config.yml
@@ -1,13 +1,6 @@
# See: https://circleci.com/blog/deploying-documentation-to-github-pages-with-continuous-integration/
version: 2.1

# Aliases to reuse
_defaults: &defaults
docker:
# CircleCI maintains a library of pre-built images
# documented at https://circleci.com/docs/2.0/circleci-images/
- image: cimg/python:3.9

# document commands used by downstream jobs
commands:
check-skip:
@@ -70,7 +63,10 @@ commands:
jobs:
# Build scikit-tree from source
build_scikit_tree:
<<: *defaults
docker:
# CircleCI maintains a library of pre-built images
# documented at https://circleci.com/docs/2.0/circleci-images/
- image: cimg/python:3.9
steps:
- checkout
- check-skip
@@ -110,7 +106,10 @@ jobs:
- .

build_docs:
<<: *defaults
docker:
# CircleCI maintains a library of pre-built images
# documented at https://circleci.com/docs/2.0/circleci-images/
- image: cimg/python:3.9
steps:
- attach_workspace:
at: ~/
@@ -131,15 +130,15 @@ jobs:
python ./spin docs

- store_artifacts:
path: docs/_build/html
path: doc/_build/html
destination: dev

- store_artifacts:
path: docs/_build/html_stable/
path: doc/_build/html_stable/
destination: stable

- persist_to_workspace:
root: docs/_build
root: doc/_build
paths:
- html
- html_stable
@@ -151,7 +150,7 @@ jobs:
- checkout

- attach_workspace:
at: docs/_build
at: doc/_build

- restore_cache:
keys:
@@ -174,10 +173,10 @@ jobs:
command: |
if [ "${CIRCLE_BRANCH}" == "main" ]; then
echo "Deploying dev docs for ${CIRCLE_BRANCH}.";
gh-pages --dotfiles --message "docs updates [skip ci] (${CIRCLE_BUILD_NUM})" --dist docs/_build/html --dest ./dev
gh-pages --dotfiles --message "docs updates [skip ci] (${CIRCLE_BUILD_NUM})" --dist doc/_build/html --dest ./dev
else
echo "Deploying stable docs for ${CIRCLE_BRANCH}.";
gh-pages --dotfiles --message "docs updates [skip ci] (${CIRCLE_BUILD_NUM})" --dist docs/_build/html --dest ./stable
gh-pages --dotfiles --message "docs updates [skip ci] (${CIRCLE_BUILD_NUM})" --dist doc/_build/html --dest ./stable
fi;

- save_cache:
@@ -186,16 +185,13 @@ jobs:
- ~/sktree

workflows:
default:
build-docs:
jobs:
- build_scikit_tree:
name: build_scikit_tree
- build_scikit_tree
- build_docs:
name: build_docs
requires:
- build_scikit_tree
- docs-deploy:
name: docs-deploy
requires:
- build_docs
filters:
3 changes: 2 additions & 1 deletion .codespellignore
@@ -1,3 +1,4 @@
raison
nd
parth
parth
ot
6 changes: 3 additions & 3 deletions .flake8
@@ -18,9 +18,9 @@ exclude =
.pytest_cache
.circleci
paper
docs/_build
docs/generated
docs/auto_examples
doc/_build
doc/generated
doc/auto_examples
validation
build
build-install
18 changes: 10 additions & 8 deletions .github/workflows/circle_artifacts.yml
@@ -1,25 +1,27 @@
name: CircleCI artifacts redirector
on: [status]

permissions: read-all

# Restrict the permissions granted to the use of secrets.GITHUB_TOKEN in this
# github actions workflow:
# https://docs.github.com/en/actions/security-guides/automatic-token-authentication
permissions:
statuses: write

jobs:
circleci_artifacts_redirector_job:
runs-on: ubuntu-latest
if: "${{ github.event.context == 'ci/circleci: build_docs' }}"
permissions:
statuses: write
runs-on: ubuntu-20.04
if: "github.repository == 'neurodata/scikit-tree' && github.event.context == 'ci/circleci: build_docs'"
name: Run CircleCI artifacts redirector
steps:
- name: GitHub Action step
id: step1
uses: larsoner/circleci-artifacts-redirector-action@master
with:
api-token: ${{ secrets.CIRCLECI_TOKEN }}
repo-token: ${{ secrets.GITHUB_TOKEN }}
api-token: ${{ secrets.CIRCLE_TOKEN }}
artifact-path: 0/dev/index.html
circleci-jobs: build_docs
job-title: Check the rendered docs here!

- name: Check the URL
if: github.event.status != 'pending'
run: |
8 changes: 4 additions & 4 deletions .github/workflows/pr_checks.yml
@@ -37,14 +37,14 @@ jobs:
then
exit 0
fi
all_changelogs=$(cat ./docs/whats_new/v*.rst)
all_changelogs=$(cat ./doc/whats_new/v*.rst)
if [[ "$all_changelogs" =~ :pr:\`$PR_NUMBER\` ]]
then
echo "Changelog has been updated."
# If the pull request is milestoned check the correspondent changelog
if exist -f ./docs/whats_new/v${TAGGED_MILESTONE:0:4}.rst
if exist -f ./doc/whats_new/v${TAGGED_MILESTONE:0:4}.rst
then
expected_changelog=$(cat ./docs/whats_new/v${TAGGED_MILESTONE:0:4}.rst)
expected_changelog=$(cat ./doc/whats_new/v${TAGGED_MILESTONE:0:4}.rst)
if [[ "$expected_changelog" =~ :pr:\`$PR_NUMBER\` ]]
then
echo "Changelog and milestone correspond."
@@ -58,7 +58,7 @@ jobs:
else
echo "A Changelog entry is missing."
echo ""
echo "Please add an entry to the changelog at 'docs/whats_new/v*.rst'"
echo "Please add an entry to the changelog at 'doc/whats_new/v*.rst'"
echo "to document your change assuming that the PR will be merged"
echo "in time for the next release of scikit-tree."
echo ""
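The changelog check in this hunk passes when any `whats_new/v*.rst` file contains a `:pr:` role carrying the pull request number. As a rough illustration only, the same match can be sketched in Python; the function name `changelog_mentions_pr` and the sample changelog line are hypothetical and are not part of this PR or of the workflow.

```python
import re


def changelog_mentions_pr(changelog_text: str, pr_number: int) -> bool:
    """Return True if the text contains a :pr:`<number>` role for this PR.

    Hypothetical helper mirroring the bash test
    [[ "$all_changelogs" =~ :pr:\\`$PR_NUMBER\\` ]] from the workflow.
    """
    # Both backticks are part of the pattern, so 94 does not match :pr:`941`.
    pattern = ":pr:`" + re.escape(str(pr_number)) + "`"
    return re.search(pattern, changelog_text) is not None


# Made-up changelog line, used only to exercise the helper.
sample = "- Some change description (:pr:`94`)\n"
print(changelog_mentions_pr(sample, 94))
print(changelog_mentions_pr(sample, 9))
```

Requiring the closing backtick is what keeps a short PR number from matching as a prefix of a longer one, which is the same behavior the bash pattern has.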
16 changes: 8 additions & 8 deletions .gitignore
@@ -14,17 +14,17 @@ sktree/_lib/sklearn/
*.png

# Sphinx documentation
docs/_build/
docs/generated/
docs/auto_examples/
docs/auto_tutorials/
docs/modules/generated/
docs/sphinxext/cachedir
doc/_build/
doc/generated/
doc/auto_examples/
doc/auto_tutorials/
doc/modules/generated/
doc/sphinxext/cachedir
pip-log.txt
.coverage
tags
docs/coverages
docs/samples
doc/coverages
doc/samples
cover
examples/*.jpg

2 changes: 1 addition & 1 deletion .gitmodules
@@ -1,4 +1,4 @@
[submodule "sktree/_lib/sklearn"]
path = sktree/_lib/sklearn_fork
url = https://github.com/neurodata/scikit-learn
branch = v1.3
branch = submodulev3
18 changes: 11 additions & 7 deletions .spin/cmds.py
@@ -16,7 +16,8 @@ def get_git_revision_hash(submodule) -> str:
@click.option("--build-dir", default="build", help="Build directory; default is `$PWD/build`")
@click.option("--clean", is_flag=True, help="Clean previously built docs before building")
@click.option("--noplot", is_flag=True, help="Build docs without plots")
def docs(build_dir, clean=False, noplot=False):
@click.pass_context
def docs(ctx, build_dir, clean=False, noplot=False):
"""📖 Build documentation"""
if clean:
doc_dir = "./docs/_build"
@@ -31,12 +32,13 @@ def docs(build_dir, clean=False, noplot=False):

util.run(["pip", "install", "-q", "-r", "doc_requirements.txt"])

os.environ["SPHINXOPTS"] = "-W"
os.environ["PYTHONPATH"] = f'{site_path}{os.sep}:{os.environ.get("PYTHONPATH", "")}'
if noplot:
util.run(["make", "-C", "docs", "clean", "html-noplot"], replace=True)
else:
util.run(["make", "-C", "docs", "clean", "html"], replace=True)
ctx.invoke(meson.docs)
# os.environ["SPHINXOPTS"] = "-W"
# os.environ["PYTHONPATH"] = f'{site_path}{os.sep}:{os.environ.get("PYTHONPATH", "")}'
# if noplot:
# util.run(["make", "-C", "docs", "clean", "html-noplot"], replace=True)
# else:
# util.run(["make", "-C", "docs", "clean", "html"], replace=True)


@click.command()
@@ -52,6 +54,8 @@ def coverage(ctx):
def setup_submodule(forcesubmodule=False):
"""Build scikit-tree using submodules.

git submodule set-branch -b submodulev2 sktree/_lib/sklearn

git submodule update --recursive --remote

To update submodule wrt latest commits:
2 changes: 2 additions & 0 deletions README.md
@@ -3,6 +3,8 @@
[![Main](https://github.com/neurodata/scikit-tree/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/neurodata/scikit-tree/actions/workflows/main.yml)
[![Checked with mypy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)
[![codecov](https://codecov.io/gh/neurodata/scikit-tree/branch/main/graph/badge.svg?token=H1reh7Qwf4)](https://codecov.io/gh/neurodata/scikit-tree)
[![PyPI Download count](https://pepy.tech/badge/scikit-tree)](https://pepy.tech/project/scikit-tree)
[![Latest PyPI release](https://img.shields.io/pypi/v/scikit-tree.svg)](https://pypi.org/project/scikit-tree/)

scikit-tree
===========
2 changes: 1 addition & 1 deletion build_requirements.txt
@@ -1,6 +1,6 @@
meson
meson-python
cython
cython>=3.0
ninja
numpy
scikit-learn>=1.3
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
32 changes: 31 additions & 1 deletion docs/api.rst → doc/api.rst
@@ -10,6 +10,30 @@ API Documentation
:no-members:
:no-inherited-members:

Scikit-learn Tree Estimators
----------------------------
We provide drop-in replacements for the scikit-learn tree estimators
that add **experimental** features we have developed. These estimators
remain compatible with the scikit-learn API. They all support
feature binning, which in theory should improve runtime
significantly for high-dimensional, large-sample data.

Use at your own risk! We have not tested these estimators as extensively
as the scikit-learn estimators.

.. automodule:: sktree._lib.sklearn.ensemble
:members:
:show-inheritance:

.. currentmodule:: sktree
.. autosummary::
:toctree: generated/

RandomForestClassifier
RandomForestRegressor
ExtraTreesClassifier
ExtraTreesRegressor

Supervised
----------
Decision-tree models are traditionally implemented with axis-aligned splits and
@@ -84,7 +108,7 @@ provide a natural way to compute neighbors based on the splits. We provide
an API for extracting the nearest neighbors from a tree-model. This provides
an API-like interface similar to :class:`~sklearn.neighbors.NearestNeighbors`.

.. currentmodule:: sktree.neighbors
.. currentmodule:: sktree
.. autosummary::
:toctree: generated/

@@ -121,3 +145,9 @@ for the entropy, MI and CMI of the Gaussian distributions.
mi_gaussian
cmi_gaussian
entropy_gaussian

.. currentmodule:: sktree.experimental.monte_carlo
.. autosummary::
:toctree: generated/

conditional_resample
8 changes: 7 additions & 1 deletion docs/conf.py → doc/conf.py
@@ -18,8 +18,13 @@
)
sys.path.insert(0, os.path.abspath("sphinxext"))
import sktree
from sktree._lib.sklearn.ensemble._forest import ExtraTreesClassifier # noqa
from sktree._lib.sklearn.ensemble._forest import ExtraTreesRegressor # noqa
from sktree._lib.sklearn.ensemble._forest import RandomForestClassifier # noqa
from sktree._lib.sklearn.ensemble._forest import RandomForestRegressor # noqa

sys.path.append(os.path.abspath(os.path.join(curdir, "..", "sktree")))
sys.path.append(os.path.abspath(os.path.join(curdir, "..", "sktree/_lib")))

# -- project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
@@ -45,7 +50,7 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

# If your documentation needs a minimal Sphinx version, state it here.
needs_sphinx = "5.0"
needs_sphinx = "6.0"

# The document name of the “root” document, that is, the document that contains
# the root toctree directive.
@@ -166,6 +171,7 @@
"UnsupervisedObliqueRandomForest": "sktree.ensemble.UnsupervisedObliqueRandomForest",
"DecisionTreeClassifier": "sklearn.tree.DecisionTreeClassifier",
"DecisionTreeRegressor": "sklearn.tree.DecisionTreeRegressor",
"ExtraTreeRegressor": "sklearn.tree.ExtraTreeRegressor",
"pipeline.Pipeline": "sklearn.pipeline.Pipeline",
# "sklearn_fork.inspection.permutation_importance": "sklearn.inspection.permutation_importance",
}
File renamed without changes.
File renamed without changes.
File renamed without changes.
20 changes: 6 additions & 14 deletions docs/modules/ensemble.rst → doc/modules/ensemble.rst
@@ -14,7 +14,7 @@ sometimes at the cost of a slight increase in bias, oblique random forests aim t
They are motivated to construct even more diverse trees, thereby improving model generalization.
In practice the variance reduction is often significant hence yielding an overall better model.

In contrast to the original publication [B2001]_, the scikit-learn
In contrast to the original publication :footcite:`breiman2001random`, the scikit-learn
implementation allows the user to control the number of features to combine in computing
candidate splits. This is done via the ``feature_combinations`` parameter. For
more information and intuition, see
@@ -27,10 +27,7 @@ more information and intuition, see

.. topic:: References

.. [B2001] Breiman, L. "Random Forests", Machine Learning, 45(1), 5-32, 2001.

.. [G2006] Geurts, P. and Ernst., D. and Wehenkel, L. "Extremely randomized
trees", Machine Learning, 63(1), 3-42, 2006.
.. footbibliography::

.. _oblique_forest_feature_importance:

@@ -52,8 +49,8 @@ By **averaging** the estimates of predictive ability over several randomized
trees one can **reduce the variance** of such an estimate and use it
for feature selection. This is known as the mean decrease in impurity, or MDI.
Refer to [L2014]_ for more information on MDI and feature importance
evaluation with Random Forests. We implement the approach taken in [Li2023]_
and [Tomita2020]_.
evaluation with Random Forests. We implement the approach taken in :footcite:`Li2023manifold`
and :footcite:`TomitaSPORF2020`.
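The variance-reduction claim in this paragraph can be illustrated with a toy simulation. This is a sketch under a strong simplifying assumption: each "tree" returns an independent noisy estimate of the same quantity, which real forest trees do not (their estimates are correlated, so the reduction is smaller in practice). The function names, the true value of 1.0, and the noise level are all made up for illustration; this is not the MDI computation used in scikit-tree.

```python
import random
import statistics


def noisy_estimate(rng, true_value=1.0, noise=1.0):
    """One hypothetical per-tree estimate: the true value plus Gaussian noise."""
    return true_value + rng.gauss(0.0, noise)


def averaged_estimate(rng, n_trees):
    """Average the estimates of n_trees independent toy 'trees'."""
    return statistics.fmean([noisy_estimate(rng) for _ in range(n_trees)])


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers

# Repeat each experiment many times to measure how much the estimate varies.
single = [noisy_estimate(rng) for _ in range(2000)]
averaged = [averaged_estimate(rng, n_trees=50) for _ in range(2000)]

var_single = statistics.variance(single)
var_averaged = statistics.variance(averaged)
print(f"variance of a single estimate:      {var_single:.3f}")
print(f"variance of the 50-tree average:    {var_averaged:.3f}")
```

Under the independence assumption the variance of the average shrinks roughly in proportion to the number of trees, which is why the averaged importance is the more stable quantity to use for feature selection.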

.. warning::

@@ -76,14 +73,9 @@ to the prediction function.

.. topic:: References

.. footbibliography::

.. [L2014] Louppe, G. :arxiv:`"Understanding Random Forests: From Theory to
Practice" <1407.7502>`,
PhD Thesis, U. of Liege, 2014.

.. [Li2023] Li, Adam, et al. :doi:`"Manifold Oblique Random Forests: Towards
Closing the Gap on Convolutional Deep Networks" <10.1137/21M1449117>`,
SIAM Journal on Mathematics of Data Science, 5(1), 77-96, 2023.

.. [Tomita2020] Tomita, Tyler M., et al. "Sparse Projection Oblique
Randomer Forests", The Journal of Machine Learning Research, 21(104),
1-39, 2020.
File renamed without changes.