Commit

Add sampler to mortality prediction task to treat class imbalance (#521)
amrit110 authored Nov 29, 2023
1 parent 1f63621 commit 3a3f5e3
Showing 6 changed files with 90 additions and 132 deletions.
25 changes: 18 additions & 7 deletions cyclops/models/wrappers/sk_model.py
@@ -14,6 +14,7 @@
 from sklearn.compose import ColumnTransformer
 from sklearn.exceptions import NotFittedError
 from sklearn.model_selection import GridSearchCV, PredefinedSplit, RandomizedSearchCV
+from sklearn.pipeline import Pipeline

 from cyclops.data.utils import is_out_of_core
 from cyclops.models.utils import get_split, is_sklearn_class, is_sklearn_instance
@@ -304,18 +305,28 @@ def find_best(  # noqa: PLR0912, PLR0915
     [X[feature] for feature in feature_columns],
     axis=1,
 ).squeeze()
-
-if transforms is not None and not is_callable_transform:
-    try:
-        X_train = transforms.transform(X_train)
-    except NotFittedError:
-        X_train = transforms.fit_transform(X_train)
-
 y_train = np.stack(
     [X[target] for target in target_columns],
     axis=1,
 ).squeeze()

+if transforms is not None and not is_callable_transform:
+    try:
+        X_train = transforms.transform(X_train)
+    except (NotFittedError, AttributeError) as error:
+        if isinstance(error, AttributeError) and isinstance(
+            transforms,
+            Pipeline,
+        ):
+            # Used for ImbalancedLearn transformer
+            X_train = transforms[0:-1].fit_transform(X_train)
+            X_train, y_train = transforms[-1].fit_resample(
+                X_train,
+                y_train,
+            )
+        else:
+            X_train = transforms.fit_transform(X_train)
+
 if issparse(X_train):
     X_train = X_train.toarray()
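
The hunk above changes how `find_best` prepares training data when `transforms` is a `Pipeline` whose last step is a sampler. A sampler offers `fit_resample` rather than `transform`, which appears to be why the change also catches `AttributeError`. The sketch below mirrors that branching in plain Python so it can be run without scikit-learn or imbalanced-learn. Every class and function in it (`ToyPipeline`, `MinMaxScaler1D`, `DuplicatingOverSampler`, `prepare_training_data`, and the local `NotFittedError`) is a hypothetical stand-in introduced for illustration, not part of cyclops or either library.

```python
from collections import Counter


class NotFittedError(Exception):
    """Stand-in for sklearn.exceptions.NotFittedError."""


class MinMaxScaler1D:
    """Toy transformer with fit_transform/transform, like a scikit-learn step."""

    def __init__(self):
        self.lo = self.hi = None

    def fit_transform(self, X):
        self.lo, self.hi = min(X), max(X)
        return self.transform(X)

    def transform(self, X):
        if self.lo is None:
            raise NotFittedError("scaler has not been fitted")
        span = (self.hi - self.lo) or 1.0
        return [(x - self.lo) / span for x in X]


class DuplicatingOverSampler:
    """Toy sampler: it only offers fit_resample, so it has no transform()."""

    def fit_resample(self, X, y):
        counts = Counter(y)
        target = max(counts.values())
        X_out, y_out = list(X), list(y)
        for label, count in counts.items():
            members = [x for x, lab in zip(X, y) if lab == label]
            # Repeat existing samples until this class matches the largest one.
            for i in range(target - count):
                X_out.append(members[i % len(members)])
                y_out.append(label)
        return X_out, y_out


class ToyPipeline:
    """Hypothetical stand-in for sklearn.pipeline.Pipeline (slicing + transform)."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return ToyPipeline(self.steps[index])
        return self.steps[index]

    def transform(self, X):
        # Mimic the failure mode: no transform() when the final step lacks one.
        if not hasattr(self.steps[-1], "transform"):
            raise AttributeError("final step does not implement transform")
        for step in self.steps:
            X = step.transform(X)
        return X

    def fit_transform(self, X):
        for step in self.steps:
            X = step.fit_transform(X)
        return X


def prepare_training_data(transforms, X_train, y_train):
    """Same branching as the diff, written against the toy classes above."""
    try:
        X_train = transforms.transform(X_train)
    except (NotFittedError, AttributeError) as error:
        if isinstance(error, AttributeError) and isinstance(transforms, ToyPipeline):
            # Fit the leading transformer steps, then resample with the last step.
            X_train = transforms[0:-1].fit_transform(X_train)
            X_train, y_train = transforms[-1].fit_resample(X_train, y_train)
        else:
            X_train = transforms.fit_transform(X_train)
    return X_train, y_train


X = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [0, 0, 0, 0, 1]
pipeline = ToyPipeline([MinMaxScaler1D(), DuplicatingOverSampler()])
X_res, y_res = prepare_training_data(pipeline, X, y)
print(Counter(y_res))  # Counter({0: 4, 1: 4})
```

Because resampling changes the number of rows, the sampler has to return `y_train` along with `X_train`. That is presumably also why the hunk moves the `y_train` construction ahead of the transform step.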

2 changes: 1 addition & 1 deletion docs/cyclops-webpage
158 changes: 37 additions & 121 deletions docs/source/intro.rst
@@ -1,36 +1,22 @@
-.. figure::
-   https://github.com/VectorInstitute/cyclops/blob/main/docs/source/theme/static/cyclops_logo-dark.png?raw=true
+.. figure:: https://github.com/VectorInstitute/cyclops/blob/main/docs/source/theme/static/cyclops_logo-dark.png?raw=true
    :alt: cyclops Logo

 --------------

-|PyPI| |PyPI - Python Version| |code checks| |integration tests| |docs|
-|codecov| |docker| |license|
-
-``cyclops`` is a toolkit for facilitating research and deployment of ML
-models for healthcare. It provides a few high-level APIs namely:
-
-- ``data`` - Create datasets for training, inference and evaluation. We
-  use the popular 🤗
-  `datasets <https://github.com/huggingface/datasets>`__ to efficiently
-  load and slice different modalities of data
-- ``models`` - Use common model implementations using
-  `scikit-learn <https://scikit-learn.org/stable/>`__ and
-  `PyTorch <https://pytorch.org/>`__
-- ``tasks`` - Use common ML task formulations such as binary
-  classification or multi-label classification on tabular, time-series
-  and image data
+|PyPI| |PyPI - Python Version| |code checks| |integration tests| |docs| |codecov| |docker| |license|
+
+``cyclops`` is a toolkit for facilitating research and deployment of ML models for healthcare. It provides a few high-level APIs namely:
+
+- ``data`` - Create datasets for training, inference and evaluation. We use the popular 🤗 `datasets <https://github.com/huggingface/datasets>`__ to efficiently load and slice different modalities of data
+- ``models`` - Use common model implementations using `scikit-learn <https://scikit-learn.org/stable/>`__ and `PyTorch <https://pytorch.org/>`__
+- ``tasks`` - Use common ML task formulations such as binary classification or multi-label classification on tabular, time-series and image data
 - ``evaluate`` - Evaluate models on clinical prediction tasks
 - ``monitor`` - Detect dataset shift relevant for clinical use cases
-- ``report`` - Create `model report
-  cards <https://vectorinstitute.github.io/cyclops/api/tutorials/nihcxr/nihcxr_report_periodic.html>`__
-  for clinical ML models
+- ``report`` - Create `model report cards <https://vectorinstitute.github.io/cyclops/api/tutorials/nihcxr/nihcxr_report_periodic.html>`__ for clinical ML models

-``cyclops`` also provides example end-to-end use case implementations on
-clinical datasets such as
+``cyclops`` also provides example end-to-end use case implementations on clinical datasets such as

-- `NIH chest
-  x-ray <https://www.nih.gov/news-events/news-releases/nih-clinical-center-provides-one-largest-publicly-available-chest-x-ray-datasets-scientific-community>`__
+- `NIH chest x-ray <https://www.nih.gov/news-events/news-releases/nih-clinical-center-provides-one-largest-publicly-available-chest-x-ray-datasets-scientific-community>`__
 - `MIMIC-IV <https://physionet.org/content/mimiciv/2.0/>`__

 🐣 Getting Started
@@ -43,110 +29,46 @@ Installing cyclops using pip

    python3 -m pip install pycyclops

-``cyclops`` has many optional dependencies that are used for specific
-functionality. For example, the
-`monai <https://github.com/Project-MONAI/MONAI>`__ library is used for
-loading DICOM images to create datasets. All optional dependencies can
-be installed with ``pycyclops[all]``, and specific sets of dependencies
-are listed in the sections below.
-
-+-----------------------------+--------------------------+--------------+
-| Dependency | pip extra | Notes |
-+=============================+==========================+==============+
-| xgboost | xgboost | Allows use |
-| | | of |
-| | | `XGBoos |
-| | | t <https://x |
-| | | gboost.readt |
-| | | hedocs.io/en |
-| | | /stable/>`__ |
-| | | model |
-+-----------------------------+--------------------------+--------------+
-| torch | torch | Allows use |
-| | | of |
-| | | `PyTorch < |
-| | | https://pyto |
-| | | rch.org/>`__ |
-| | | models |
-+-----------------------------+--------------------------+--------------+
-| torchvision | torchvision | Allows use |
-| | | of |
-| | | `T |
-| | | orchvision < |
-| | | https://pyto |
-| | | rch.org/visi |
-| | | on/stable/in |
-| | | dex.html>`__ |
-| | | library |
-+-----------------------------+--------------------------+--------------+
-| torchxrayvision | torchxrayvision | Uses |
-| | | `TorchXR |
-| | | ayVision <ht |
-| | | tps://mlmed. |
-| | | org/torchxra |
-| | | yvision/>`__ |
-| | | library |
-+-----------------------------+--------------------------+--------------+
-| monai | monai | Uses |
-| | | `M |
-| | | ONAI <https: |
-| | | //github.com |
-| | | /Project-MON |
-| | | AI/MONAI>`__ |
-| | | to load and |
-| | | transform |
-| | | images |
-+-----------------------------+--------------------------+--------------+
-| alibi | alibi | Uses |
-| | | `Alibi <http |
-| | | s://docs.sel |
-| | | don.io/proje |
-| | | cts/alibi/en |
-| | | /stable/>`__ |
-| | | for |
-| | | additional |
-| | | ex |
-| | | plainability |
-| | | f |
-| | | unctionality |
-+-----------------------------+--------------------------+--------------+
-| alibi-detect | alibi-detect | Uses `Alibi |
-| | | Detect |
-| | | <https://doc |
-| | | s.seldon.io/ |
-| | | projects/ali |
-| | | bi-detect/en |
-| | | /stable/>`__ |
-| | | for dataset |
-| | | shift |
-| | | detection |
-+-----------------------------+--------------------------+--------------+
+``cyclops`` has many optional dependencies that are used for specific functionality. For example, the `monai <https://github.com/Project-MONAI/MONAI>`__ library is used for loading DICOM images to create datasets. All optional dependencies can be installed with ``pycyclops[all]``, and specific sets of dependencies are listed in the sections below.
+
++-----------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------+
+| Dependency | pip extra | Notes |
++=============================+==========================+===============================================================================================================+
+| xgboost | xgboost | Allows use of `XGBoost <https://xgboost.readthedocs.io/en/stable/>`__ model |
++-----------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------+
+| torch | torch | Allows use of `PyTorch <https://pytorch.org/>`__ models |
++-----------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------+
+| torchvision | torchvision | Allows use of `Torchvision <https://pytorch.org/vision/stable/index.html>`__ library |
++-----------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------+
+| torchxrayvision | torchxrayvision | Uses `TorchXRayVision <https://mlmed.org/torchxrayvision/>`__ library |
++-----------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------+
+| monai | monai | Uses `MONAI <https://github.com/Project-MONAI/MONAI>`__ to load and transform images |
++-----------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------+
+| alibi | alibi | Uses `Alibi <https://docs.seldon.io/projects/alibi/en/stable/>`__ for additional explainability functionality |
++-----------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------+
+| alibi-detect | alibi-detect | Uses `Alibi Detect <https://docs.seldon.io/projects/alibi-detect/en/stable/>`__ for dataset shift detection |
++-----------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------+

 🧑🏿‍💻 Developing
 =======================

 Using poetry
 ------------

-The development environment can be set up using
-`poetry <https://python-poetry.org/docs/#installation>`__. Hence, make
-sure it is installed and then run:
+The development environment can be set up using `poetry <https://python-poetry.org/docs/#installation>`__. Hence, make sure it is installed and then run:

 .. code:: bash

    python3 -m poetry install
    source $(poetry env info --path)/bin/activate

-In order to install dependencies for testing (codestyle, unit tests,
-integration tests), run:
+In order to install dependencies for testing (codestyle, unit tests, integration tests), run:

 .. code:: bash

    python3 -m poetry install --with test

-API documentation is built using
-`Sphinx <https://www.sphinx-doc.org/en/master/>`__ and can be locally
-built by:
+API documentation is built using `Sphinx <https://www.sphinx-doc.org/en/master/>`__ and can be locally built by:

 .. code:: bash
@@ -157,32 +79,26 @@ built by:
 Contributing
 ------------

-Contributing to cyclops is welcomed. See
-`Contributing <https://vectorinstitute.github.io/cyclops/api/intro.html>`__
-for guidelines.
+Contributing to cyclops is welcomed. See `Contributing <https://vectorinstitute.github.io/cyclops/api/intro.html>`__ for guidelines.

 📚 `Documentation <https://vectorinstitute.github.io/cyclops/>`__
 =================================================================

 📓 Notebooks
 ============

-To use jupyter notebooks, the python virtual environment can be
-installed and used inside an IPython kernel. After activating the
-virtual environment, run:
+To use jupyter notebooks, the python virtual environment can be installed and used inside an IPython kernel. After activating the virtual environment, run:

 .. code:: bash

    python3 -m ipykernel install --user --name <name_of_kernel>

-Now, you can navigate to the notebook’s ``Kernel`` tab and set it as
-``<name_of_kernel>``.
+Now, you can navigate to the notebook’s ``Kernel`` tab and set it as ``<name_of_kernel>``.

 🎓 Citation
 ===========

-Reference to cite when you use ``cyclops`` in a project or a research
-paper:
+Reference to cite when you use ``cyclops`` in a project or a research paper:

 ::

10 changes: 8 additions & 2 deletions docs/source/tutorials/mimiciv/mortality_prediction.ipynb
@@ -36,6 +36,7 @@
 "from cycquery import MIMICIVQuerier\n",
 "from datasets import Dataset\n",
 "from datasets.features import ClassLabel\n",
+"from imblearn.over_sampling import SMOTE\n",
 "from sklearn.compose import ColumnTransformer\n",
 "from sklearn.impute import SimpleImputer\n",
 "from sklearn.pipeline import Pipeline\n",
@@ -584,7 +585,12 @@
 "        ),\n",
 "    ],\n",
 "    remainder=\"passthrough\",\n",
-")"
+")\n",
+"preprocessor_pipeline = [\n",
+"    (\"preprocessor\", preprocessor),\n",
+"    (\"oversampling\", SMOTE(random_state=RANDOM_SEED)),\n",
+"]\n",
+"preprocessor_pipeline = Pipeline(preprocessor_pipeline)"
 ]
 },
 {
@@ -692,7 +698,7 @@
 "mortality_task.train(\n",
 "    dataset[\"train\"],\n",
 "    model_name=model_name,\n",
-"    transforms=preprocessor,\n",
+"    transforms=preprocessor_pipeline,\n",
 "    best_model_params=best_model_params,\n",
 ")"
 ]
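
The notebook change appends `SMOTE(random_state=RANDOM_SEED)` as the final pipeline step. SMOTE balances classes by creating synthetic minority samples, each interpolated between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified pure-Python illustration of that idea for a binary problem, written only to show the effect on class counts. The function `smote_like_oversample` and its parameters are hypothetical, and this is not imbalanced-learn's implementation.

```python
import math
import random
from collections import Counter


def smote_like_oversample(X, y, minority_label, k=2, seed=0):
    """Add synthetic minority rows until both classes have the same size.

    Simplified SMOTE-style sketch for binary labels (illustrative only).
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")

    X_out, y_out = [list(row) for row in X], list(y)
    for _ in range(n_majority - len(minority)):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(row, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        X_out.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
        y_out.append(minority_label)
    return X_out, y_out


X = [
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2],
    [5.0, 5.0], [6.0, 5.0], [5.0, 6.0],
]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_res, y_res = smote_like_oversample(X, y, minority_label=1)
print(Counter(y_res))  # Counter({0: 6, 1: 6})
```

The original rows are kept unchanged and only new minority rows are appended, so the synthetic points stay within the region spanned by the existing minority samples.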
26 changes: 25 additions & 1 deletion poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions pyproject.toml
@@ -136,6 +136,7 @@ torchmetrics = {version = "^1.2.0", extras = ["classification"]}
 cupy = "^12.2.0"
 mpi4py = {git = "https://github.com/mpi4py/mpi4py"}
 lightning = "^2.1.0"
+imbalanced-learn = "^0.11.0"

 [tool.poetry.extras]
 torch = ["torch"]