The MIMICNoFinding, CheXpertNoFinding, CXRMultisite, and MIMICNotes datasets cannot be downloaded automatically through scripts/download.py, because gaining access to them requires additional steps.
- Obtain access to the MIMIC-CXR-JPG Database on PhysioNet and download the dataset. We recommend downloading from the GCP bucket:
gcloud auth login
mkdir MIMIC-CXR-JPG
gsutil -m rsync -d -r gs://mimic-cxr-jpg-2.0.0.physionet.org MIMIC-CXR-JPG
- To obtain demographic information for each patient, you will also need access to MIMIC-IV. Download core/patients.csv.gz and core/admissions.csv.gz and place the files in the MIMIC-CXR-JPG directory.
- Move the MIMIC-CXR-JPG folder into your data directory, or create a symbolic link to it there.
- Run:
python -m subpopbench.scripts.download mimic_cxr --data_path <data_path>
- (Optional) The original JPGs have very high resolution, so caching downsampled copies of the images may speed things up if you are training many models. To do so, run:
python -m subpopbench.scripts.cache_mimic_cxr --data_path <data_path>
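
Before running the download step, it can help to confirm that the files from the steps above are where they are expected. The following is a minimal sketch, not part of subpopbench: the helper name missing_mimic_files is hypothetical, and it assumes the two MIMIC-IV files sit directly inside <data_path>/MIMIC-CXR-JPG under their base names.

```python
from pathlib import Path

# Files the steps above say to place inside the MIMIC-CXR-JPG directory.
# Assumption: they sit directly in that directory under their base names.
REQUIRED_FILES = ("patients.csv.gz", "admissions.csv.gz")


def missing_mimic_files(data_path):
    """Return the expected paths that are absent under <data_path>/MIMIC-CXR-JPG."""
    root = Path(data_path) / "MIMIC-CXR-JPG"
    # is_dir() follows symbolic links, so a linked folder passes this check.
    if not root.is_dir():
        return [root]
    return [root / name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the directory and both demographic files were found; any returned paths are the ones still to be put in place.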

- Download the downsampled CheXpert dataset and extract it.
- Register for an account and download the CheXpert demographics data here. Place the CHEXPERT DEMO.xlsx file in your CheXpert directory.
- Move the CheXpert-v1.0-small folder into your data directory under the name chexpert, or create a symbolic link named chexpert there that points to it.
- Run:
python -m subpopbench.scripts.download chexpert --data_path <data_path>
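
The symbolic-link step above can also be done from Python. This is a minimal sketch under stated assumptions: link_dataset is a hypothetical helper, not a subpopbench function, and it leaves any existing folder or link with the same name untouched.

```python
import os
from pathlib import Path


def link_dataset(source_dir, data_path, link_name):
    """Create <data_path>/<link_name> as a symlink to source_dir if it is absent."""
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {source}")
    link = Path(data_path) / link_name
    # Do not overwrite a folder or link that is already in place.
    if link.exists() or link.is_symlink():
        return link
    link.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(source, link, target_is_directory=True)
    return link
```

For example, link_dataset("CheXpert-v1.0-small", data_path, "chexpert") would create the chexpert link described above, where data_path is your data directory.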

- Acquire both of the datasets above.
- Run:
python -m subpopbench.scripts.download cxr_multisite --data_path <data_path>
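
Because this step depends on the two datasets before it, the three download commands can be chained in order. The sketch below is illustrative only: download_command and run_all are hypothetical helpers, and it assumes that "acquiring" the earlier datasets includes running their own download steps after the manual access and file placement are done.

```python
import subprocess
import sys

# Assumed order: cxr_multisite comes after the two datasets it builds on.
STEPS = ("mimic_cxr", "chexpert", "cxr_multisite")


def download_command(dataset, data_path):
    """Build the documented download command for one dataset."""
    return [
        sys.executable, "-m", "subpopbench.scripts.download",
        dataset, "--data_path", str(data_path),
    ]


def run_all(data_path):
    """Run each download step in order, stopping at the first failure."""
    for dataset in STEPS:
        subprocess.run(download_command(dataset, data_path), check=True)
```

With check=True, a failing step raises subprocess.CalledProcessError, so cxr_multisite is never attempted if an earlier dataset did not finish.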

- Obtain access to the MIMIC-III Database on PhysioNet.
- Follow the instructions here to load MIMIC-III into a PostgreSQL database.
- Modify scripts/preprocess_mimic_notes.py so that output_dir points to your data directory, and fill in the database access credentials. Then run:
python -m subpopbench.scripts.preprocess_mimic_notes
- Run:
python -m subpopbench.scripts.download mimic_notes --data_path <data_path>