[ENH] MI and CMI simulation module and relevant estimators from the forest #83

Merged (19 commits) on Jun 20, 2023
2 changes: 2 additions & 0 deletions .gitignore
@@ -11,6 +11,8 @@ coverage
commit.txt
sktree/_lib/sklearn/

*.png

# Sphinx documentation
docs/_build/
docs/generated/
1 change: 1 addition & 0 deletions .spin/cmds.py
@@ -111,6 +111,7 @@ def setup_submodule(forcesubmodule=False):
commit_fpath,
],
)
print(commit_fpath)
with open(commit_fpath, "w") as f:
f.write(current_hash)

4 changes: 4 additions & 0 deletions README.md
@@ -101,6 +101,10 @@ You can also do the same thing using Meson/Ninja itself. Run the following to bu
python -c "from sktree import tree"
python -c "import sklearn; print(sklearn.__version__);"

Alternatively, you can use an editable install:

pip install --no-build-isolation --editable .

References
==========
[1]: [`Li, Adam, et al. "Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks." arXiv preprint arXiv:1909.11799 (2019)`](https://arxiv.org/abs/1909.11799)
56 changes: 56 additions & 0 deletions docs/api.rst
@@ -65,3 +65,59 @@ The trees that comprise those forests are also available as standalone classes.

tree.UnsupervisedDecisionTree
tree.UnsupervisedObliqueDecisionTree


Distance Metrics
----------------
Trees inherently produce a "distance-like" metric. We provide an API for
extracting pairwise distances from the trees that include a correction
that turns the "tree-distance" into a proper distance metric.

.. currentmodule:: sktree.tree
.. autosummary::
:toctree: generated/

compute_forest_similarity_matrix
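As an illustration of the leaf-based similarity idea described above (not the actual `compute_forest_similarity_matrix` implementation), a common construction counts, for each pair of samples, the fraction of trees in which they land in the same leaf, and converts that similarity into a distance. The `leaves` matrix below is hypothetical data:

```python
import numpy as np

# Hypothetical leaf assignments: rows are samples, columns are trees.
# Entry (i, t) is the leaf index that sample i falls into in tree t.
leaves = np.array([
    [0, 1, 0],
    [0, 1, 1],
    [2, 3, 1],
])

# Similarity = fraction of trees in which two samples share a leaf.
n = leaves.shape[0]
sim = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        sim[i, j] = np.mean(leaves[i] == leaves[j])

# One simple correction turns the similarity into a distance-like
# quantity: identical samples get distance 0, disjoint ones get 1.
dist = 1.0 - sim
```

Samples 0 and 1 share a leaf in two of the three trees, so their similarity is 2/3 and their distance 1/3; the diagonal of `dist` is zero, as a proper metric requires.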

In addition to providing a distance metric based on leaves, tree-models
provide a natural way to compute neighbors based on the splits. We provide
an API for extracting the nearest neighbors from a tree-model. This provides
an API-like interface similar to :class:`~sklearn.neighbors.NearestNeighbors`.

.. currentmodule:: sktree
.. autosummary::
:toctree: generated/

NearestNeighborsMetaEstimator
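To sketch how a `NearestNeighbors`-style query could operate on a precomputed tree-based distance matrix (a hypothetical input, not the estimator's real internals), one can simply sort each row and take the k closest non-self entries:

```python
import numpy as np

# Hypothetical precomputed tree-based distance matrix
# (symmetric, zero diagonal), e.g. derived from 1 - similarity.
dist = np.array([
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.7],
    [0.9, 0.7, 0.0],
])

k = 1
# Sort each row by distance; column 0 is the sample itself (distance 0),
# so skip it and keep the next k indices as the nearest neighbors.
neighbors = np.argsort(dist, axis=1)[:, 1:k + 1]
```

Here samples 0 and 1 are each other's nearest neighbors, while sample 2 is closest to sample 1.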


Experimental Functionality
--------------------------
We also include experimental functionality that is a work in progress.

.. currentmodule:: sktree.experimental
.. autosummary::
:toctree: generated/

mutual_info_ksg

We also include functions that help simulate and evaluate mutual information (MI)
and conditional mutual information (CMI) estimators. Specifically, these functions
simulate multivariate Gaussian data and compute the analytical solutions
for the entropy, MI, and CMI of Gaussian distributions.

.. currentmodule:: sktree.experimental.simulate
.. autosummary::
:toctree: generated/

simulate_multivariate_gaussian
simulate_helix
simulate_sphere

.. currentmodule:: sktree.experimental.mutual_info
.. autosummary::
:toctree: generated/

mi_gaussian
cmi_gaussian
entropy_gaussian
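The Gaussian case is useful for evaluation precisely because closed forms exist. As a minimal sketch of the kind of analytical values that helpers like `entropy_gaussian` and `mi_gaussian` are described as computing (the helper names below are illustrative, not the module's API), the univariate entropy and bivariate MI are:

```python
import numpy as np

def entropy_gaussian_1d(var):
    # Differential entropy of a 1-D Gaussian: H(X) = 0.5 * ln(2 * pi * e * var)
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

def mi_bivariate_gaussian(rho):
    # MI of a bivariate Gaussian with correlation rho:
    # I(X; Y) = -0.5 * ln(1 - rho**2); zero when rho = 0 (independence).
    return -0.5 * np.log(1.0 - rho ** 2)
```

Comparing an estimator such as `mutual_info_ksg` against these closed forms on simulated Gaussian data is the standard way to benchmark MI estimators.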
16 changes: 16 additions & 0 deletions docs/conf.py
@@ -162,6 +162,7 @@
"PatchObliqueDecisionTreeClassifier": "sktree.tree.PatchObliqueDecisionTreeClassifier",
"ObliqueDecisionTreeRegressor": "sktree.tree.ObliqueDecisionTreeRegressor",
"PatchObliqueDecisionTreeRegressor": "sktree.tree.PatchObliqueDecisionTreeRegressor",
"UnsupervisedObliqueRandomForest": "sktree.ensemble.UnsupervisedObliqueRandomForest",
"DecisionTreeClassifier": "sklearn.tree.DecisionTreeClassifier",
"DecisionTreeRegressor": "sklearn.tree.DecisionTreeRegressor",
"pipeline.Pipeline": "sklearn.pipeline.Pipeline",
@@ -204,6 +205,19 @@
"_type_",
"MetadataRequest",
"~utils.metadata_routing.MetadataRequest",
"quantiles",
"n_quantiles",
"metric",
"n_queries",
"BaseForest",
"BaseDecisionTree",
"n_indexed",
"n_queries",
"n_features_x",
"n_features_y",
"n_features_z",
"n_neighbors",
"one",
}

# validation
@@ -354,6 +368,8 @@ def replace_sklearn_fork_with_sklearn(app, what, name, obj, options, lines):
# Use regular expressions to replace 'sklearn_fork' with 'sklearn'
content = re.sub(r"`pipeline.Pipeline", r"`~sklearn.pipeline.Pipeline", content)
content = re.sub(r"`~utils.metadata_routing.MetadataRequest", r"``MetadataRequest``", content)
content = re.sub(r"`np.quantile", r"`numpy.quantile", content)
content = re.sub(r"`~np.quantile", r"`numpy.quantile", content)

# Convert the modified string back to a list of lines
lines[:] = content.split("\n")
25 changes: 25 additions & 0 deletions docs/references.bib
@@ -54,4 +54,29 @@ @article{TomitaSPORF2020
number = {104},
pages = {1--39},
url = {http://jmlr.org/papers/v21/18-664.html}
}

@article{Darbellay1999Entropy,
title={Estimation of the Information by an Adaptive Partitioning of the Observation Space},
author={Georges A. Darbellay and Igor Vajda},
journal={IEEE Trans. Inf. Theory},
year={1999},
volume={45},
  pages={1315--1321}
}

@article{Kraskov_2004,
title = {Estimating mutual information},
volume = {69},
url = {https://link.aps.org/doi/10.1103/PhysRevE.69.066138},
doi = {10.1103/PhysRevE.69.066138},
number = {6},
urldate = {2023-01-27},
journal = {Physical Review E},
author = {Kraskov, Alexander and Stögbauer, Harald and Grassberger, Peter},
month = jun,
year = {2004},
note = {Publisher: American Physical Society},
pages = {066138},
file = {APS Snapshot:/Users/adam2392/Zotero/storage/GRW23BYU/PhysRevE.69.html:text/html;Full Text PDF:/Users/adam2392/Zotero/storage/NJT9QCVA/Kraskov et al. - 2004 - Estimating mutual information.pdf:application/pdf}
}
2 changes: 2 additions & 0 deletions docs/whats_new/v0.1.rst
@@ -36,6 +36,8 @@ Changelog
- |Feature| A general-kernel MORF is now implemented where users can pass in a kernel library, by `Adam Li`_ (:pr:`70`)
- |Feature| Implementation of ObliqueDecisionTreeRegressor, PatchObliqueDecisionTreeRegressor, ObliqueRandomForestRegressor, PatchObliqueRandomForestRegressor, by `SUKI-O`_ (:pr:`72`)
- |Feature| Implementation of HonestTreeClassifier, HonestForestClassifier, by `Sambit Panda`_, `Adam Li`_, `Ronan Perry`_ and `Haoyin Xu`_ (:pr:`57`)
- |Feature| Implementation of (conditional) mutual information estimation via unsupervised tree models and added NearestNeighborsMetaEstimator by `Adam Li`_ (:pr:`83`)


Code and Documentation Contributors
-----------------------------------