This repository holds code for the Understanding Fairness and Explainability in Multimodal Approaches within Healthcare project (MM-HealthFair). The MM-HealthFair framework was designed to support analysis of biases induced by routine healthcare data in risk prediction algorithms, providing an end-to-end pipeline for multimodal fusion, evaluation and fairness investigation. See the original project proposal for more information.
Note: Only public or fake data are shared in this repository.
- The main code is found in the root of the repository (see Usage below for more information).
- A summary of the key functionalities of the project is available on the index page.
- Details on the last two project iterations are also available in the reports folder.
- More information about the code usage can be found in the model card.
In the latest iteration, the framework was developed locally using Python v3.10.11 and tested on a Windows 11 machine with GPU support (NVIDIA GeForce RTX 3080, 16 GiB VRAM). Additionally, model training and evaluation were performed on a Microsoft Azure machine running Windows 10 Server with the following specifications:
- 1 x NVIDIA Tesla T4 GPU
- 4 x vCPUs (28 GiB memory)
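Before training, it may be worth confirming that PyTorch can actually see the GPU on your machine. A minimal check, assuming PyTorch is already installed:

```python
import torch

# Report whether CUDA is available and which device PyTorch will use.
if torch.cuda.is_available():
    print(f"GPU available: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU detected - training will fall back to CPU.")
```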
To get a local copy up and running, follow these simple steps.
To clone the repo:

`git clone https://github.com/nhsengland/mm-healthfair`
To create a suitable environment, either:

- Use pip + requirements.txt:
  - `python -m venv _env`
  - `source _env/bin/activate`
  - `pip install -r requirements.txt`
- Use poetry (recommended):
  - Install poetry (see website for documentation)
  - Navigate to the project root directory: `cd mm-healthfair`
  - Create the environment from the poetry lock file: `poetry install`
  - Run scripts using `poetry run python3 xxx.py`
Note: There are known issues when installing the scispacy package on Python versions >3.10 or Apple M1 chips. Project dependencies strictly require Python 3.10 to avoid this; however, OSX users may need to manually install nmslib with `CFLAGS="-mavx -DWARN(a)=(a)" pip install nmslib` to circumvent this issue (see open issue nmslib/nmslib#476).
Note: To enable support for platforms with CPU-only compute units, remove the `source="pytorch-gpu"` arguments from pyproject.toml before installing the PyTorch libraries.
This repository contains code used to generate and evaluate multimodal deep learning pipelines for risk prediction using demographic, time-series and clinical notes data from MIMIC-IV v3.1. Additionally, it includes functionalities for adversarial mitigation (controlling model dependence on sensitive attributes), fairness analysis with bootstrapping and explainability using SHAP and MM-SHAP scores for examining multimodal feature importance.
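As a rough illustration of the MM-SHAP idea, a modality's score can be taken as its share of the total absolute SHAP attribution across features. The sketch below assumes a flat feature matrix with known modality boundaries; the array layout and names are illustrative, not the repository's actual interface:

```python
import numpy as np

def mm_shap_scores(shap_values: np.ndarray, modality_slices: dict[str, slice]) -> dict[str, float]:
    """Share of total absolute SHAP attribution contributed by each modality.

    shap_values: (n_samples, n_features) array of SHAP values for one model.
    modality_slices: maps each modality name to the feature columns it owns.
    """
    abs_vals = np.abs(shap_values)
    total = abs_vals.sum()
    return {name: float(abs_vals[:, cols].sum() / total) for name, cols in modality_slices.items()}

# Example: 8 features split across two hypothetical modalities.
rng = np.random.default_rng(0)
scores = mm_shap_scores(rng.normal(size=(100, 8)), {"static": slice(0, 3), "timeseries": slice(3, 8)})
print(scores)  # modality shares sum to 1.0
```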
To reproduce the experiments, refer to the Getting Started page for a detailed walkthrough.
- Preprocessed multimodal features from MIMIC-IV 3.1 and related dictionaries.
- Multimodal learner artifacts (model checkpoints).
- Performance, fairness and explainability summaries mapped by artifact name (coded as `<outcome>_<fusion_type>_<modalities>`, e.g. `ext_stay_7_concat_static_timeseries_notes`).
- Notebooks for debugging and running inference against the dictionary files generated throughout the pipeline.
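Because the artifact names encode the outcome, fusion type and modalities, a small helper can recover those fields from a checkpoint name. This is a hedged sketch: the fusion-type vocabulary below is assumed for illustration and may not cover every artifact in the repository.

```python
# Parse artifact names of the form <outcome>_<fusion_type>_<modalities>,
# e.g. "ext_stay_7_concat_static_timeseries_notes".
# The fusion keywords listed here are an assumption, not an exhaustive list.
FUSION_TYPES = ("concat", "attention")

def parse_artifact_name(name: str) -> dict:
    for fusion in FUSION_TYPES:
        token = f"_{fusion}_"
        if token in name:
            outcome, modalities = name.split(token, 1)
            return {"outcome": outcome, "fusion_type": fusion, "modalities": modalities.split("_")}
    raise ValueError(f"No known fusion type in artifact name: {name}")

print(parse_artifact_name("ext_stay_7_concat_static_timeseries_notes"))
# {'outcome': 'ext_stay_7', 'fusion_type': 'concat', 'modalities': ['static', 'timeseries', 'notes']}
```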
The MIMIC-IV dataset (v3.1) can be downloaded from PhysioNet.org after completion of mandatory training. This project makes use of four main modules linked to the MIMIC-IV dataset:
- hosp: measurements recorded during the hospital stay (used for training), including demographics, lab tests, prescriptions, diagnoses and care provider orders
- ed: metadata recorded during ED attendance, held in an externally linked database
- icu: individuals with an associated ICU admission during the episode, with additional metadata (used mainly to derive the ICU admission outcome)
- note: deidentified discharge summaries as long-form narratives describing the reason for admission and relevant hospital events
Additional linked datasets include MIMIC-IV-ED (v2.2), MIMIC-IV-Note (v2.2) and MIMIC-IV-Ext-BHC (v1.2.0), an external dataset used for extracting the Brief Hospital Course segment of each discharge summary. Further information can be found in PhysioNet's documentation.
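For orientation, the PhysioNet download unpacks into per-module folders of compressed CSVs that can be read directly. The snippet below is purely illustrative (the local path is a placeholder and this is not the project's own preprocessing entry point); the table and column names follow the MIMIC-IV documentation:

```python
import pandas as pd

# Placeholder path to the extracted MIMIC-IV v3.1 download.
MIMIC_DIR = "/path/to/mimiciv/3.1"

# Core demographics and admissions from the hosp module.
patients = pd.read_csv(f"{MIMIC_DIR}/hosp/patients.csv.gz")
admissions = pd.read_csv(f"{MIMIC_DIR}/hosp/admissions.csv.gz")

# One row per hospital admission, joined to subject-level demographics.
cohort = admissions.merge(patients, on="subject_id", how="left")
print(cohort[["subject_id", "hadm_id", "anchor_age", "gender"]].head())
```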
See the repo issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidance.
Unless stated otherwise, the codebase is released under the MIT Licence. This covers both the codebase and any sample code in the documentation.
See LICENSE for more information.
The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.
To find out more about the Analytics Unit visit our project website or get in touch at england.tdau@nhs.net.