Python repository for ICU-TSB: A Benchmark for Temporal Patient Representation Learning for Unsupervised Stratification into Patient Cohorts
- Accepted at CBMS 2025; more information at https://2025.cbms-conference.org/
You need PhysioNet access (which requires the CITI certificate) and a downloaded copy of each of the ICU datasets (MIMIC-IV, eICU and SICdb), preprocessed with rICU (https://github.com/eth-mds/ricu), which requires R.
Note: unfortunately, the units R package currently works only on Windows, so you will need Windows at least for the step1.r script.
Everything else (all steps apart from step1.r) should be run on Linux or macOS.
That said, we are actively working on rewriting the library in Python for Linux, using a fork of rICU.
- Rscript --version: Rscript (R) version 4.3.2 (2023-10-31)
- python --version: Python 3.10.12
- Rscript preprocessing/x_prep/step1.r --dataset mimic_demo
- python -m preprocessing.x_prep.step2_impute --dataset mimic_demo
- python -m preprocessing.x_prep.step3_encoding --dataset mimic_demo
- python -m preprocessing.x_prep.step4_normalize --dataset mimic_demo
- python -m preprocessing.x_prep.step5_group --dataset mimic_demo
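The preprocessing steps above can also be chained from a small Python driver, which is convenient when step 1 has already been run on Windows and only steps 2 to 5 remain. The following is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_pipeline` and the `include_r_step` flag are hypothetical, and it assumes the commands are run from the repository root with `Rscript` on the PATH.

```python
import subprocess
import sys

# Step order copied from the commands above: step 1 is the R script,
# steps 2-5 are Python modules run with `python -m`.
R_STEP = "preprocessing/x_prep/step1.r"
PYTHON_STEPS = [
    "preprocessing.x_prep.step2_impute",
    "preprocessing.x_prep.step3_encoding",
    "preprocessing.x_prep.step4_normalize",
    "preprocessing.x_prep.step5_group",
]


def build_commands(dataset, include_r_step=True):
    """Return the preprocessing commands, in order, as argument lists."""
    commands = []
    if include_r_step:
        commands.append(["Rscript", R_STEP, "--dataset", dataset])
    for module in PYTHON_STEPS:
        # sys.executable is the interpreter running this script.
        commands.append([sys.executable, "-m", module, "--dataset", dataset])
    return commands


def run_pipeline(dataset, include_r_step=True):
    """Run each step in turn and stop at the first failing step."""
    for cmd in build_commands(dataset, include_r_step):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("mimic_demo", include_r_step=False)` would run only the Python steps, assuming the output of step 1 is already in place.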
We provide code for both multi-channel and single-channel LSTMs. However, singleLSTM uses significantly less GPU memory, so it can be trained on smaller GPUs.
- STAT: python -m models.stats --dataset mimic_demo
- singleLSTM: python -m models.lstmv5 --dataset mimic_demo --mode train --max_steps 10000 --max_patients 10000 --learning_rate 5e-5 --batch_size 2 --timeseries_model singleLSTM
- GRU: python -m models.lstmv5 --dataset mimic_demo --mode train --max_steps 10000 --max_patients 10000 --learning_rate 5e-5 --batch_size 2 --timeseries_model singleLSTM --gru
This step uses the vector representations produced by the STAT, LSTM and GRU models.
- STAT: python -m experiments.IR12 --dataset mimic_demo --trials 10 --optimize --mode stats --fpath data/embeddings/stats_mimic_demo/stats_test_mimic_demo_patient_embeddings.csv
- GRU: python -m experiments.IR12 --dataset mimic_demo --trials 10 --optimize --mode gru --fpath data/embeddings/gru_train_mimic_demo_e_10_ms_10000_samples_10000__bs_2.shelve
- LSTM: python -m experiments.IR12 --dataset mimic_demo --trials 10 --optimize --mode lstm --fpath data/embeddings/singleLSTM_train_mimic_demo_e_10_ms_10000_samples_10000__bs_2.shelve
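Before running the experiments, it can help to check that an embeddings file was actually written and is readable. The following is a minimal sketch using Python's standard `shelve` module. The helper `peek_shelve` is hypothetical and not part of the repository, and the key and value layout of the `.shelve` files is not documented here, so the sketch only lists keys and value types. It also assumes the file was created with `shelve.open` using the same path.

```python
import shelve


def peek_shelve(path, limit=5):
    """Return up to `limit` (key, value type name) pairs from a shelve file."""
    preview = []
    # flag="r" opens the file read-only, so nothing is modified.
    with shelve.open(path, flag="r") as db:
        for i, key in enumerate(db):
            if i >= limit:
                break
            # Reading a value unpickles it, so the libraries used to
            # create it must be importable in this environment.
            preview.append((key, type(db[key]).__name__))
    return preview
```

For example, `peek_shelve("data/embeddings/gru_train_mimic_demo_e_10_ms_10000_samples_10000__bs_2.shelve")` would show the first few stored keys, assuming the GRU training step above has completed.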