Understanding Fairness and Explainability in Multimodal Approaches within Healthcare

NHSE PhD Internship Project

About the Project

This repository holds the code for the Understanding Fairness and Explainability in Multimodal Approaches within Healthcare project (MM-HealthFair). The MM-HealthFair framework is designed to support analysis of the biases that routine healthcare data can induce in risk prediction algorithms, and provides an end-to-end pipeline for multimodal fusion, evaluation and fairness investigation. See the original project proposal for more information.

Note: Only public or fake data are shared in this repository.

Project Structure

The main code is found in the root of the repository (see Usage below for more information).
A summary of the key functionalities of the project is available on the index page.
Details on the last two project iterations are also available in the report folder.
More information about the code usage can be found in the model card.

Built With

In the latest iteration, the framework was developed locally using Python v3.10.11 and tested on a Windows 11 machine with GPU support (NVIDIA GeForce RTX 3080, 16 GiB VRAM). Additionally, model training and evaluation were performed on a Microsoft Azure machine using a Windows 10 Server with the following specifications:

1 x NVIDIA Tesla T4 GPU
4 x vCPUs (28 GiB memory)

Getting Started

Installation

To get a local copy up and running, follow these simple steps.

To clone the repo:

git clone https://github.com/nhsengland/mm-healthfair

To create a suitable environment:

Use pip + requirements.txt

python -m venv _env
source _env/bin/activate
pip install -r requirements.txt

Use poetry (recommended)

Install poetry (see website for documentation)
Navigate to the project root directory: cd mm-healthfair
Create the environment from the poetry lock file: poetry install
Run scripts using: poetry run python3 xxx.py

Note: There are known issues when installing the scispacy package on Python versions above 3.10 or on Apple M1 chips. The project dependencies strictly require Python 3.10 to avoid this. However, OSX users may still need to install nmslib manually with CFLAGS="-mavx -DWARN(a)=(a)" pip install nmslib (see open issue nmslib/nmslib#476).

Note: To enable support for platforms with CPU-only compute units, you should remove the source="pytorch-gpu" arguments from pyproject.toml before installing the PyTorch libraries.

Usage

This repository contains code used to generate and evaluate multimodal deep learning pipelines for risk prediction using demographic, time-series and clinical notes data from MIMIC-IV v3.1. Additionally, it includes functionalities for adversarial mitigation (controlling model dependence on sensitive attributes), fairness analysis with bootstrapping and explainability using SHAP and MM-SHAP scores for examining multimodal feature importance.
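As a rough illustration of what fairness analysis with bootstrapping can look like, the sketch below computes a percentile bootstrap confidence interval for a demographic parity gap. This is not the repository's implementation: the metric choice, the function names (demographic_parity_gap, bootstrap_ci) and the toy data are all assumptions made for the example, and it uses only the Python standard library.

```python
import random


def positive_rate(preds):
    """Fraction of positive (1) predictions in a list of 0/1 values."""
    return sum(preds) / len(preds)


def demographic_parity_gap(y_pred, group):
    """Absolute gap in positive-prediction rate between groups coded 0 and 1.

    Hypothetical metric chosen for illustration only.
    """
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return abs(positive_rate(g1) - positive_rate(g0))


def bootstrap_ci(y_pred, group, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap."""
    rng = random.Random(seed)
    n = len(y_pred)
    stats = []
    for _ in range(n_boot):
        # Resample individuals with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        g = [group[i] for i in idx]
        if len(set(g)) < 2:
            continue  # skip resamples that lost one of the two groups
        stats.append(demographic_parity_gap([y_pred[i] for i in idx], g))
    stats.sort()
    # Simple nearest-rank percentiles of the bootstrap distribution.
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return demographic_parity_gap(y_pred, group), lower, upper


# Toy data: binary predictions and a binary sensitive attribute (made up).
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
gap, lower, upper = bootstrap_ci(y_pred, group)
print(f"gap={gap:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

Resampling whole individuals keeps each prediction paired with its sensitive attribute, so the interval reflects sampling variability in both.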

To reproduce the experiments, refer to the Getting Started page for a detailed walkthrough.
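For intuition about the MM-SHAP scores mentioned in this section, the sketch below follows the usual definition of MM-SHAP as each modality's share of the total absolute SHAP attribution. The function name mm_shap_shares, the feature names and the SHAP values are hypothetical, and the repository's own computation may differ in detail.

```python
def mm_shap_shares(shap_values, modality_of):
    """Share of total absolute SHAP attribution contributed by each modality.

    shap_values: mapping of feature name -> SHAP value for one prediction.
    modality_of: mapping of feature name -> modality label.
    Both mappings are illustrative assumptions, not the project's data model.
    """
    totals = {}
    for feature, value in shap_values.items():
        modality = modality_of[feature]
        # Use absolute values so positive and negative contributions both count.
        totals[modality] = totals.get(modality, 0.0) + abs(value)
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("All SHAP values are zero; shares are undefined.")
    return {modality: total / grand_total for modality, total in totals.items()}


# Made-up SHAP values for three features, one per modality.
shap_values = {"age": 1.0, "heart_rate": -2.0, "note_token": 1.0}
modality_of = {"age": "static", "heart_rate": "timeseries", "note_token": "notes"}
shares = mm_shap_shares(shap_values, modality_of)
print(shares)
```

The shares sum to one, so a model that leans almost entirely on one modality shows up as a share close to 1 for that modality.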

Outputs

Preprocessed multimodal features from MIMIC-IV v3.1 and related dictionaries.
Multimodal learner artifacts (model checkpoints).
Performance, fairness and explainability summaries mapped by artifact name (coded as <outcome>_<fusion_type>_<modalities>, e.g. ext_stay_7_concat_static_timeseries_notes).
Notebooks for debugging and inference, using the dictionary files generated throughout the pipeline.
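The artifact naming scheme above (<outcome>_<fusion_type>_<modalities>) can be split back into its parts with a small helper. The sketch below is only an illustration: parse_artifact_name is a hypothetical function, the modality vocabulary is taken from the single example name, and it assumes the fusion type is a single token without underscores.

```python
# Modality tokens assumed from the example artifact name; extend as needed.
KNOWN_MODALITIES = ("static", "timeseries", "notes")


def parse_artifact_name(name):
    """Split '<outcome>_<fusion_type>_<modalities>' into its three parts."""
    tokens = name.split("_")
    modalities = []
    # Peel modality tokens off the end of the name, preserving their order.
    while tokens and tokens[-1] in KNOWN_MODALITIES:
        modalities.insert(0, tokens.pop())
    if not modalities or len(tokens) < 2:
        raise ValueError(f"Unrecognised artifact name: {name!r}")
    # Assumption: the fusion type is the single token before the modalities.
    fusion_type = tokens.pop()
    # Everything that remains is the outcome, which may contain underscores.
    outcome = "_".join(tokens)
    return {"outcome": outcome, "fusion_type": fusion_type, "modalities": modalities}


parsed = parse_artifact_name("ext_stay_7_concat_static_timeseries_notes")
print(parsed)
```

Parsing from the right avoids guessing how many underscores the outcome contains, since only the modality suffix has a known vocabulary.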

Datasets

The MIMIC-IV dataset (v3.1) can be downloaded from PhysioNet.org after completion of mandatory training. This project makes use of four main modules linked to the MIMIC-IV dataset:

hosp: measurements recorded during hospital stay for training, including demographics, lab tests, prescriptions, diagnoses and care provider orders
ed: records metadata during ED attendance in an externally linked database
icu: records individuals with associated ICU admission during the episode with additional metadata (used mainly for measuring the ICU admission outcome)
note: records deidentified discharge summaries as long form narratives which describe reason for admission and relevant hospital events

Additional linked datasets include MIMIC-IV-ED (v2.2), MIMIC-IV-Note (v2.2) and MIMIC-IV-Ext-BHC (v1.2.0) as an external dataset for extracting Brief Hospital Course segments within a discharge summary. Further information can be found in PhysioNet's documentation.

Roadmap

See the repo issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

See CONTRIBUTING.md for detailed guidance.

License

Unless stated otherwise, the codebase is released under the MIT Licence. This covers both the codebase and any sample code in the documentation.

See LICENCE for more information.

Contact

To find out more about the Analytics Unit visit our project website or get in touch at england.tdau@nhs.net.
