DOC Update documentation #299

Merged
merged 6 commits into from
Jul 19, 2024
138 changes: 131 additions & 7 deletions DEVELOPING.md
Collaborator Author

@SUKI-O this is where it would be great to hear your thoughts on this workflow.

Is it confusing? Could it be improved?

@@ -3,19 +3,22 @@
- [Requirements](#requirements)
- [Setting up your development environment](#setting-up-your-development-environment)
- [Building the project from source](#building-the-project-from-source)
- [Summary: Building locally with Meson For developers](#summary-building-locally-with-meson-for-developers)
- [Development Tasks](#development-tasks)
- [Advanced Updating submodules](#advanced-updating-submodules)
- [Cython and C++](#cython-and-c)
- [Making a Release](#making-a-release)
- [Releasing on PyPi for pip installs](#releasing-on-pypi-for-pip-installs)
- [Releasing documentation](#releasing-documentation)

<!-- /TOC -->

# Requirements

- Python 3.9+
- numpy>=1.25
- scipy>=1.11
- scikit-learn>=1.3.1
- numpy>=1.25.0
- scipy>=1.5.0
- scikit-learn>=1.4.1

For the other requirements, inspect the ``pyproject.toml`` file.

@@ -70,21 +73,82 @@ For other commands, see

Note that at this stage, you cannot run Python commands directly. For example, ``pytest ./treeple`` will not work.

However, after installing and building the project from source using meson, you can leverage editable installs to make testing code changes much faster. For more information on meson-python's progress supporting editable installs in a better fashion, see <https://meson-python.readthedocs.io/en/latest/how-to-guides/editable-installs.html>.
However, after installing and building the project from source using meson, you can leverage editable installs to make testing code changes much faster.

pip install --no-build-isolation --editable .
spin install

**Note: editable installs for treeple REQUIRE you to have built the project using meson already.** This will now link the meson build to your Python runtime. Now if you run
This will now link the meson build to your Python runtime. Now if you run

pytest ./treeple

the unit-tests should run.

Summary: Building locally with Meson (For developers)
-----------------------------------------------------
Make sure you have the necessary packages installed.

# install build dependencies
pip install -r build_requirements.txt

# you may need these optional dependencies to build scikit-learn locally
conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

``YOUR_PYTHON_VERSION`` below should be any of the acceptable versions of Python for treeple. We use the ``spin`` CLI to abstract away build details:

# run the build using Meson/Ninja
./spin build

# you can run the following command to see what other options there are
./spin --help
./spin build --help

# For example, you might want to start from a clean build
./spin build --clean

# or build in parallel for faster builds
./spin build -j 2

# you will need to double check the build-install has the proper path
# this might be different from machine to machine
export PYTHONPATH=${PWD}/build-install/usr/lib/python<YOUR_PYTHON_VERSION>/site-packages

# run specific unit tests
./spin test -- treeple/tree/tests/test_tree.py

# you can bring up the CLI menu
./spin --help
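
The ``<YOUR_PYTHON_VERSION>`` placeholder in the ``PYTHONPATH`` export above can be derived from the interpreter you are running. The following is a minimal sketch, not part of the project: the helper name ``build_install_site_packages`` is hypothetical, and the ``build-install/usr`` prefix is simply the layout shown above, which may differ from machine to machine.

```python
import sys
from pathlib import Path


def build_install_site_packages(repo_root, prefix="build-install/usr"):
    """Return the site-packages path that a spin build is assumed to install into.

    The directory layout is an assumption taken from the export line above.
    """
    # e.g. "python3.11" for a CPython 3.11 interpreter
    version_dir = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return Path(repo_root) / prefix / "lib" / version_dir / "site-packages"


if __name__ == "__main__":
    # Print a value suitable for: export PYTHONPATH=<printed path>
    print(build_install_site_packages(Path.cwd()))
```

If the printed directory does not exist after ``./spin build``, inspect the ``build-install`` folder to find the actual layout on your machine.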

You can also do the same thing using Meson/Ninja itself. Run the following to build the local files:

# generate ninja make files
meson build --prefix=$PWD/build

# compile
ninja -C build

# install treeple package
meson install -C build

export PYTHONPATH=${PWD}/build/lib/python<YOUR_PYTHON_VERSION>/site-packages

# to check installation, you need to be in a different directory
cd docs;
python -c "from treeple import tree"
python -c "import sklearn; print(sklearn.__version__);"
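
The check above is run from a different directory presumably because, inside the repository root, the ``treeple/`` source folder can shadow the installed build. A small sketch for confirming which copy Python resolves is shown here; the helper name ``locate_package`` is hypothetical, and it only reports the file the import system would load.

```python
import importlib.util


def locate_package(name):
    """Return the file a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For regular packages this is the path to the package's __init__.py
    return spec.origin


if __name__ == "__main__":
    for package in ("treeple", "sklearn"):
        print(package, "->", locate_package(package))
```

If the reported path for ``treeple`` points into the source checkout rather than the build or install directory, the compiled extensions may not be found at import time.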

After building locally, you can use editable installs (warning: this only registers Python changes locally)

pip install --no-build-isolation --editable .

Or if you have spin v0.8+ installed, you can just run directly

spin install

# Development Tasks

There are a series of top-level tasks available.

make run-checks
make pre-commit

This leverages pre-commit to run a series of pre-commit checks.

@@ -115,6 +179,8 @@ In order to develop new tree models, generally Cython and C++ code will need to

treeple is in-line with scikit-learn and thus relies on each new version released there. Moreover, treeple relies on compiled code, so releases are a bit more complex than the typical Python package.

## Releasing on PyPi (for pip installs)

1. Download wheels from GH Actions and put all wheels into a ``dist/`` folder

<https://github.com/neurodata/treeple/actions/workflows/build_wheels.yml> will have all the wheels for common OSes built for each Python version.
@@ -140,3 +206,61 @@ or if you have two-factor authentication enabled: <https://pypi.org/help/#apitok
4. Update version number on ``meson.build`` and ``pyproject.toml`` to the relevant version.

See https://github.com/neurodata/treeple/pull/160 as an example.

## Releasing documentation

1. Build the documentation locally

```
spin docs
```

2. Make a copy of the documentation in the ``docs/_build/html`` folder somewhere outside of the git folder.

3. Push the documentation to the ``gh-pages`` branch

```
git checkout gh-pages
```

Rename the current ``stable`` folder to the version number of the previous release. For example, if we are releasing ``0.8.0``, rename the ``stable`` folder to ``0.7.0``.

Copy the contents of the ``docs/_build/html`` folder into the ``stable`` folder at the root of the ``gh-pages`` branch, since this new release is the "stable" version.

4. Update the versions pointer file on main, `doc/_static/versions.json`, to point to the new version.

For example, if we are releasing ``0.8.0``, the file will currently contain:

```
{
"name": "0.7",
"version": "stable",
"url": "https://docs.neurodata.io/treeple/stable/"
},
```

which should get renamed to its corresponding version number:

```
{
"name": "0.7",
"version": "0.7",
"url": "https://docs.neurodata.io/treeple/v0.7/"
},
```

Similarly, we will add pointers to the development version and new stable v0.8 version:

```
{
"name": "0.9 (devel)",
"version": "dev",
"url": "https://docs.neurodata.io/treeple/dev/"
},
{
"name": "0.8",
"version": "stable",
"url": "https://docs.neurodata.io/treeple/stable/"
},
```
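
The ``versions.json`` edits described above could be scripted. The following is a rough sketch under stated assumptions: the function name ``promote_release`` is hypothetical, the file is assumed to hold a JSON list of entries with the ``name``, ``version`` and ``url`` keys shown above, and the ``v<version>`` URL pattern is copied from the example rather than confirmed for every release.

```python
import json

BASE_URL = "https://docs.neurodata.io/treeple"


def promote_release(entries, new_stable, next_dev, base_url=BASE_URL):
    """Pin the old stable entry to its version and add new dev and stable entries."""
    pinned = []
    for entry in entries:
        if entry["version"] == "dev":
            continue  # the dev pointer is recreated below
        entry = dict(entry)  # copy so the input list is left untouched
        if entry["version"] == "stable":
            # The previous stable release now lives under its own version number.
            entry["version"] = entry["name"]
            entry["url"] = f"{base_url}/v{entry['name']}/"
        pinned.append(entry)
    head = [
        {"name": f"{next_dev} (devel)", "version": "dev", "url": f"{base_url}/dev/"},
        {"name": new_stable, "version": "stable", "url": f"{base_url}/stable/"},
    ]
    return head + pinned


if __name__ == "__main__":
    current = [
        {"name": "0.8 (devel)", "version": "dev", "url": f"{BASE_URL}/dev/"},
        {"name": "0.7", "version": "stable", "url": f"{BASE_URL}/stable/"},
    ]
    print(json.dumps(promote_release(current, "0.8", "0.9"), indent=4))
```

Review the output by hand before committing it, since the ordering of entries and the exact URL for older releases are assumptions here.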

71 changes: 8 additions & 63 deletions README.md
@@ -21,6 +21,13 @@ Documentation

See here for the documentation for our dev version: <https://docs.neurodata.io/treeple/dev/index.html>

Is treeple useful for me?
=========================

1. If you use decision tree models (random forest, extra trees, isolation forests, etc.) in your work, treeple is a good package to try out. We have a variety of better tree models that are not available in scikit-learn, and we are always looking for new tree models to implement. For example, oblique decision trees are in general better than their axis-aligned counterparts.

2. If you are interested in extending the decision tree API in scikit-learn, treeple is a good package to try out. We have a variety of internal APIs that are not available in scikit-learn, and can support new decision tree models more easily.
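
To illustrate the difference between oblique and axis-aligned splits mentioned above, here is a toy sketch in plain Python. It is not treeple's implementation: the grid data, the helper names and the hand-picked weights are all assumptions made for illustration. An axis-aligned split thresholds a single feature, while an oblique split thresholds a linear combination of features.

```python
from itertools import product

# Toy 2D grid; the label is 1 when x0 + x1 > 1 (points on the boundary are dropped).
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
points = [(a, b) for a, b in product(grid, grid) if a + b != 1.0]
labels = [int(a + b > 1.0) for a, b in points]


def accuracy(predictions):
    """Fraction of points whose predicted side matches the label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def axis_aligned_split(feature, threshold):
    """Send a point right when a single feature exceeds the threshold."""
    return [int(x[feature] > threshold) for x in points]


def oblique_split(weights, threshold):
    """Send a point right when a weighted sum of features exceeds the threshold."""
    return [int(sum(w * v for w, v in zip(weights, x)) > threshold) for x in points]


# Best accuracy achievable with one axis-aligned split over the candidate thresholds.
best_axis = max(accuracy(axis_aligned_split(f, t)) for f in (0, 1) for t in grid)
# One oblique split along the diagonal direction.
oblique = accuracy(oblique_split((1.0, 1.0), 1.0))

print(f"best single axis-aligned split: {best_axis:.2f}")
print(f"single oblique split:           {oblique:.2f}")
```

On this diagonal boundary a single oblique split separates the classes, whereas an axis-aligned tree would need several splits to approximate it. Real datasets will not always favor oblique splits this clearly.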

Why oblique trees and why trees beyond those in scikit-learn?
=============================================================

@@ -48,72 +55,10 @@ Installing with pip on a conda environment is the recommended route.

pip install treeple

Building locally with Meson (For developers)
--------------------------------------------

Make sure you have the necessary packages installed

# install build dependencies
pip install -r build_requirements.txt

# you may need these optional dependencies to build scikit-learn locally
conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

We use the ``spin`` CLI to abstract away build details:

# run the build using Meson/Ninja
./spin build

# you can run the following command to see what other options there are
./spin --help
./spin build --help

# For example, you might want to start from a clean build
./spin build --clean

# or build in parallel for faster builds
./spin build -j 2

# you will need to double check the build-install has the proper path
# this might be different from machine to machine
export PYTHONPATH=${PWD}/build-install/usr/lib/python3.9/site-packages

# run specific unit tests
./spin test -- treeple/tree/tests/test_tree.py

# you can bring up the CLI menu
./spin --help

You can also do the same thing using Meson/Ninja itself. Run the following to build the local files:

# generate ninja make files
meson build --prefix=$PWD/build

# compile
ninja -C build

# install treeple package
meson install -C build

export PYTHONPATH=${PWD}/build/lib/python3.9/site-packages

# to check installation, you need to be in a different directory
cd docs;
python -c "from treeple import tree"
python -c "import sklearn; print(sklearn.__version__);"

After building locally, you can use editable installs (warning: this only registers Python changes locally)

pip install --no-build-isolation --editable .

Or if you have spin v0.8+ installed, you can just run directly

spin install

Development
===========

We welcome contributions for modern tree-based algorithms. We use Cython to achieve fast C/C++ speeds, while abiding by a scikit-learn compatible (tested) API. Moreover, our Cython internals are easily extensible because they follow the internal Cython API of scikit-learn as well.
We welcome contributions for modern tree-based algorithms. We use Cython to achieve fast C/C++ speeds, while abiding by a scikit-learn compatible (tested) API. We also welcome contributions in C/C++ if they improve the extensibility or runtime performance of the codebase. Our Cython internals are easily extensible because they follow the internal Cython API of scikit-learn as well.

Due to the current state of scikit-learn's internal Cython code for trees, we instead have to leverage a fork of scikit-learn at <https://github.com/neurodata/scikit-learn> when
extending the decision tree model API of scikit-learn. Specifically, we extend the Python and Cython API of the tree submodule in scikit-learn in our submodule, so we can introduce the tree models housed in this package. These extend the functionality of decision-tree-based models in a way that is not yet possible in scikit-learn itself. As one example, we introduce an abstract API to allow users to implement their own oblique splits. Our plan is to benchmark these functionalities and introduce them upstream to scikit-learn where applicable and where the inclusion criteria are met.
2 changes: 1 addition & 1 deletion doc/_static/versions.json
@@ -6,7 +6,7 @@
},
{
"name": "0.8",
"version": "dev",
"version": "stable",
"url": "https://docs.neurodata.io/treeple/stable/"
},
{
13 changes: 9 additions & 4 deletions doc/index.rst
@@ -5,10 +5,11 @@ learning problems. It extends the robust API of `scikit-learn <https://github.co
for tree algorithms that achieve strong performance in benchmark tasks.

Our package has implemented unsupervised forests (Geodesic Forests
[Madhyastha2020]_), oblique random forests (SPORF [Tomita2020]_ and
MORF [Li2023]_), and honest forests [Perry2021]_.
In the near future, we also plan to include extended isolation forests
and stream decision forests [Xu2022]_.
[Madhyastha2020]_), oblique random forests (SPORF [Tomita2020]_, manifold random forests,
MORF [Li2023]_), honest forests [Perry2021]_, extended isolation forests [Hariri2019]_, and more.

For all forests, we also support incremental building of the forests, using the
``partial_fit`` API from scikit-learn [Xu2022]_.

We encourage you to use the package for your research and also build on top
with relevant Pull Requests. See our examples for walk-throughs of how to use the package.
@@ -18,6 +19,10 @@ We are licensed under BSD-3 (see `License <https://github.com/neurodata/treeple/

.. topic:: References

.. [Hariri2019] Hariri, Sahand, Matias Carrasco Kind, and Robert J. Brunner.
"Extended isolation forest." IEEE Transactions on Knowledge and Data
Engineering 33.4 (2019): 1479-1489.

.. [Madhyastha2020] Madhyastha, Meghana, et al. :doi:`"Geodesic Forests"
<10.1145/3394486.3403094>`, KDD 2020, 513-523, 2020.

2 changes: 1 addition & 1 deletion treeple/_lib/sklearn_fork
Submodule sklearn_fork updated 143 files