Understanding Fairness and Explainability in Multimodal Approaches within Healthcare

NHSE PhD Internship Project

About the Project

Status: experimental

This repository holds code for the Understanding Fairness and Explainability in Multimodal Approaches within Healthcare project (MM-HealthFair). The MM-HealthFair framework was designed to support analysis of biases induced from routine healthcare data in risk prediction algorithms, providing an end-to-end pipeline for multimodal fusion, evaluation and fairness investigation. See the original project proposal for more information.

Note: Only public or fake data are shared in this repository.

Project Structure

  • The main code is found in the root of the repository (see Usage below for more information).
  • A summary of the key functionalities of the project is available on the index page.
  • Details on the last two project iterations are also available in the reports folder.
  • More information about the code usage can be found in the model card.

Built With

Python v3.10

In the latest iteration, the framework was developed locally using Python v3.10.11 and tested on a Windows 11 machine with GPU support (NVIDIA GeForce RTX 3080, 16 GiB VRAM). Additionally, model training and evaluation were performed on a Microsoft Azure machine using a Windows 10 Server with the following specifications:

  • 1 x NVIDIA Tesla T4 GPU
  • 4 x vCPUs (28 GiB memory)

Getting Started

Installation

To get a local copy up and running, follow these simple steps.

To clone the repo:

git clone https://github.com/nhsengland/mm-healthfair

To create a suitable environment:

  1. Use pip + requirements.txt
     • python -m venv _env
     • source _env/bin/activate (on Windows: _env\Scripts\activate)
     • pip install -r requirements.txt
  2. Use poetry (recommended)
     • Install poetry (see website for documentation)
     • Navigate to the project root directory: cd mm-healthfair
     • Create the environment from the poetry lock file: poetry install
     • Run scripts using poetry run python3 xxx.py

Note: There are known issues when installing the scispacy package on Python versions >3.10 or on Apple M1 chips. The project dependencies strictly require Python 3.10 to avoid this. However, macOS users may still need to install nmslib manually with CFLAGS="-mavx -DWARN(a)=(a)" pip install nmslib (see open issue nmslib/nmslib#476).
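Because the dependencies are pinned to Python 3.10, it can help to confirm the interpreter version before installing. The snippet below is a minimal sketch and not part of the repository; the helper name is_supported_python is hypothetical.

```python
import sys

# Python 3.10 is the version the project dependencies require (see the note above).
REQUIRED_VERSION = (3, 10)


def is_supported_python(version_info=sys.version_info):
    """Return True when the major.minor version matches the required one.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    return tuple(version_info[:2]) == REQUIRED_VERSION


if __name__ == "__main__":
    if is_supported_python():
        print("Python version looks compatible.")
    else:
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Expected Python 3.10.x but found {found}; scispacy may fail to install.")
```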

Note: To enable support for platforms with CPU-only compute units, you should remove the source="pytorch-gpu" arguments from pyproject.toml before installing the PyTorch libraries.

Usage

This repository contains code used to generate and evaluate multimodal deep learning pipelines for risk prediction using demographic, time-series and clinical notes data from MIMIC-IV v3.1. Additionally, it includes functionalities for adversarial mitigation (controlling model dependence on sensitive attributes), fairness analysis with bootstrapping and explainability using SHAP and MM-SHAP scores for examining multimodal feature importance.
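To illustrate what fairness analysis with bootstrapping can look like, the following is a minimal standard-library sketch. It is not the repository's implementation. It assumes binary labels, binary predictions and a binary sensitive attribute, and it uses the difference in true positive rate between the two groups (an equal-opportunity gap) as the fairness metric. All function names are hypothetical.

```python
import random


def true_positive_rate(y_true, y_pred):
    """Share of actual positives that were predicted positive (None if no positives)."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        return None
    return sum(preds_for_positives) / len(preds_for_positives)


def tpr_gap(y_true, y_pred, group):
    """Equal-opportunity gap: TPR for group 1 minus TPR for group 0."""
    rates = []
    for g in (0, 1):
        idx = [i for i, value in enumerate(group) if value == g]
        rate = true_positive_rate([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if rate is None:
            return None  # undefined when a group has no positive cases
        rates.append(rate)
    return rates[1] - rates[0]


def bootstrap_gap_ci(y_true, y_pred, group, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the TPR gap."""
    rng = random.Random(seed)
    n = len(y_true)
    gaps = []
    for _ in range(n_boot):
        # Resample patients with replacement, keeping label, prediction and group aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        gap = tpr_gap(
            [y_true[i] for i in idx],
            [y_pred[i] for i in idx],
            [group[i] for i in idx],
        )
        if gap is not None:
            gaps.append(gap)
    if not gaps:
        raise ValueError("No bootstrap resample contained positives in both groups.")
    gaps.sort()
    lower = gaps[int((alpha / 2) * len(gaps))]
    upper = gaps[min(len(gaps) - 1, int((1 - alpha / 2) * len(gaps)))]
    return lower, upper
```

A confidence interval that excludes zero suggests the gap between groups is unlikely to be a sampling artefact. Other fairness metrics can be substituted for tpr_gap without changing the resampling loop.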

To reproduce the experiments, refer to the Getting Started page for a detailed walkthrough.

Outputs

  • Preprocessed multimodal features from MIMIC-IV v3.1 and related dictionaries.
  • Multimodal learner artifacts (model checkpoints).
  • Performance, fairness and explainability summaries mapped by artifact name (coded as <outcome>_<fusion_type>_<modalities>, e.g. ext_stay_7_concat_static_timeseries_notes).
  • Notebooks for debugging and inference, using the dictionary files generated throughout the pipeline.
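The artifact naming convention above can be split back into its parts with a small helper. This is a sketch under stated assumptions and is not code from the repository. It assumes that the modality tokens are static, timeseries and notes (taken from the example name) and that the fusion type is a single token without underscores. The names KNOWN_MODALITIES, ArtifactName and parse_artifact_name are hypothetical.

```python
from typing import NamedTuple, Tuple

# Assumed modality tokens, inferred from the example artifact name in this README.
KNOWN_MODALITIES = ("static", "timeseries", "notes")


class ArtifactName(NamedTuple):
    outcome: str
    fusion_type: str
    modalities: Tuple[str, ...]


def parse_artifact_name(name: str) -> ArtifactName:
    """Split '<outcome>_<fusion_type>_<modalities>' into its three parts.

    The outcome may itself contain underscores, so the name is parsed from
    the right: trailing modality tokens first, then one fusion-type token.
    """
    tokens = name.split("_")
    modalities = []
    while tokens and tokens[-1] in KNOWN_MODALITIES:
        modalities.insert(0, tokens.pop())
    if not modalities or len(tokens) < 2:
        raise ValueError(f"Unrecognised artifact name: {name!r}")
    fusion_type = tokens.pop()
    return ArtifactName("_".join(tokens), fusion_type, tuple(modalities))


print(parse_artifact_name("ext_stay_7_concat_static_timeseries_notes"))
# ArtifactName(outcome='ext_stay_7', fusion_type='concat', modalities=('static', 'timeseries', 'notes'))
```

Parsing from the right avoids needing a list of valid outcomes, at the cost of assuming that no outcome name ends in a modality token.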

Datasets

The MIMIC-IV dataset (v3.1) can be downloaded from PhysioNet.org after completion of mandatory training. This project makes use of four main modules linked to the MIMIC-IV dataset:

  • hosp: measurements recorded during hospital stay for training, including demographics, lab tests, prescriptions, diagnoses and care provider orders
  • ed: records metadata during ED attendance in an externally linked database
  • icu: records individuals with associated ICU admission during the episode with additional metadata (used mainly for measuring the ICU admission outcome)
  • note: records deidentified discharge summaries as long form narratives which describe reason for admission and relevant hospital events

Additional linked datasets include MIMIC-IV-ED (v2.2), MIMIC-IV-Note (v2.2) and MIMIC-IV-Ext-BHC (v1.2.0) as an external dataset for extracting Brief Hospital Course segments within a discharge summary. Further information can be found in PhysioNet's documentation.

Roadmap

See the repo issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

See CONTRIBUTING.md for detailed guidance.

License

Unless stated otherwise, the codebase is released under the MIT Licence. This covers both the codebase and any sample code in the documentation.

See LICENSE for more information.

The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.

Contact

To find out more about the Analytics Unit, visit our project website or get in touch at england.tdau@nhs.net.

About

NHSE PhD Internship Project - P61: Understanding Fairness and Explainability in Multi-modal Approaches within Healthcare
