Contributors: Zhenyu Xiao*, Haobin Zhou*, Yimeng Xu, and Emma Cardenas.
Affiliation: Department of Biomedical Engineering, Johns Hopkins University
This repository is part of the Biomedical Data Design course. Our project tracks patient recovery in real time by processing streaming data. The primary data source is the eICU Collaborative Research Database, which is accessible on its official website after you complete the required course on data security and ethics.
Included here are the source code, weekly presentation slides, and additional resources necessary to understand and engage with our project.
This project is written in Python 3. You can run it online with Google Colaboratory (uploading your data to Google Drive) or on your local machine.
When using Google Colaboratory, most of the CSV files are generated in 'My Drive/Colab Notebooks'; only the model input data is stored automatically in 'Stream/Models'.
To clone the GitHub repository into Google Colaboratory, open and run this link. This creates a folder named 'Stream' in your Google Drive.
Upload the unzipped eICU files to your Google Drive under 'My Drive/EICU/eicu-collaborative-research-database-2.0'.
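If you are working in Colaboratory, mount your Google Drive first so the notebooks can read the uploaded files. The path below follows the Drive layout described above:

```python
# Mount Google Drive in a Colab session (standard Colab API).
from google.colab import drive
drive.mount('/content/drive')

# The unzipped eICU tables should then be visible here:
EICU_DIR = '/content/drive/My Drive/EICU/eicu-collaborative-research-database-2.0'
```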
Run 'Stream/Preprocess/func_check_patient_num.ipynb'. This notebook filters out patients with unavailable features and generates 'Final_available_patients.csv'.
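The filtering logic amounts to keeping only those ICU stays that appear in every source table. Here is a minimal sketch, assuming the Drive layout above; the table names in the loop are placeholders and the exact availability criteria live in the notebook itself:

```python
import pandas as pd

EICU_DIR = '/content/drive/My Drive/EICU/eicu-collaborative-research-database-2.0'

# Start from all ICU stays, then intersect with the stays present in each table.
patient = pd.read_csv(f'{EICU_DIR}/patient.csv.gz', usecols=['patientunitstayid'])
available = set(patient['patientunitstayid'])

for table in ['vitalPeriodic.csv.gz', 'lab.csv.gz']:  # placeholder table names
    ids = pd.read_csv(f'{EICU_DIR}/{table}', usecols=['patientunitstayid'])
    available &= set(ids['patientunitstayid'])

pd.DataFrame({'patientunitstayid': sorted(available)}).to_csv(
    'Final_available_patients.csv', index=False)
```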
Run the following notebooks in 'Stream/Preprocess'. Each extracts one feature and interpolates it with GPR, except 'Patient_Results.ipynb' (a sketch of the GPR step follows this list):
- BloodPressure.ipynb
- Glasgow.ipynb
- HeartRate.ipynb
- Pao2fio2-fio2.ipynb
- Pao2fio2-pao2.ipynb
- Temp.ipynb
- Urine.ipynb
- lab1_BUN.ipynb
- lab2_WBC.ipynb
- lab3_bicarbonate.ipynb
- lab4_sodium.ipynb
- lab5_potassuim.ipynb
- lab6_bilirubin.ipynb
- Patient_Results.ipynb
This step is time-consuming on the whole eICU database; we recommend running the notebooks in parallel in separate Colab sessions to save time.
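For reference, the GPR interpolation in each feature notebook follows this general pattern. This is a minimal scikit-learn sketch; the kernel choice, hyperparameters, and resampling grid are illustrative, not the notebooks' exact settings:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Irregularly sampled measurements for one patient:
# t in minutes since ICU admission, y the feature value (e.g. heart rate).
t = np.array([5.0, 32.0, 61.0, 118.0, 240.0]).reshape(-1, 1)
y = np.array([88.0, 92.0, 90.0, 97.0, 85.0])

# RBF kernel for smooth trends plus a white-noise term for measurement noise.
gpr = GaussianProcessRegressor(
    kernel=RBF(length_scale=60.0) + WhiteKernel(noise_level=1.0),
    normalize_y=True)
gpr.fit(t, y)

# Resample onto a regular hourly grid; the GP also returns an uncertainty band.
grid = np.arange(0, 241, 60).reshape(-1, 1)
mean, std = gpr.predict(grid, return_std=True)
```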
Run 'Stream/Preprocess/Organize_all_data.ipynb' to merge all the features into '13features.csv'.
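The merge step amounts to joining the per-feature CSVs on the patient stay ID and time index, roughly as follows (a sketch with hypothetical file and column names; the notebook defines the actual keys):

```python
import pandas as pd
from functools import reduce

# Hypothetical: each preprocessing notebook wrote one CSV keyed by
# ('patientunitstayid', 'offset') plus a single feature column.
files = ['HeartRate.csv', 'Temp.csv', 'Urine.csv']  # ... one per feature
frames = [pd.read_csv(f) for f in files]

merged = reduce(
    lambda a, b: a.merge(b, on=['patientunitstayid', 'offset'], how='inner'),
    frames)
merged.to_csv('13features.csv', index=False)
```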
As described in Section 4 of the paper, we use three machine learning models and one deep learning model.
For the machine learning models, run the notebook 'Stream/Preprocess/ml_models.ipynb' (or use this file), then run the notebook 'Stream/Models/ml_models.ipynb' to evaluate their performance.
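For a quick standalone check of the ML pipeline, a baseline along these lines works. This is a sketch, not the notebooks' exact setup: the model choice is illustrative, and the 'outcome' column name is hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv('13features.csv')
X = df.drop(columns=['patientunitstayid', 'offset', 'outcome'])  # hypothetical columns
y = df['outcome']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print('AUC:', roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```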
For the LSTM, run the notebook 'Stream/Preprocess/Balance_LSTM.ipynb' to generate the balanced data (or use the files in this link), then run the notebook 'Stream/Models/LSTM.ipynb' to evaluate its performance (a GPU accelerates this final step).
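The LSTM itself is a standard sequence classifier over the 13 interpolated features. A minimal PyTorch sketch; the layer sizes and sequence length are illustrative, not the notebook's exact architecture:

```python
import torch
import torch.nn as nn

class RecoveryLSTM(nn.Module):
    """Classify a patient's time series of the 13 interpolated features."""
    def __init__(self, n_features=13, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, time, features)
        _, (h, _) = self.lstm(x)     # h: (1, batch, hidden), last hidden state
        return self.head(h[-1])      # logits: (batch, n_classes)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = RecoveryLSTM().to(device)
logits = model(torch.randn(8, 24, 13, device=device))  # 8 stays, 24 time steps
```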
To run the project locally:
- Adjust file paths in the code to your local directories.
- Set up your environment using Anaconda and CUDA as needed. See the installation guide below for details.
Here we provide an example of how to set up the environment on a local machine using Anaconda and CUDA 11.8. For CPU-only installs or other CUDA versions, refer to the PyTorch website for the matching PyTorch install command. Note that this repository does not depend on a specific CUDA version; feel free to use whichever CUDA version suits your machine.
# create conda environment
conda create -n bdd python=3.9 -y
conda activate bdd
conda install numpy pandas matplotlib scikit-learn xgboost jupyter pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
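After installation, you can verify that PyTorch was installed and sees the GPU:

```python
import torch
print(torch.__version__)           # installed PyTorch version
print(torch.cuda.is_available())   # True if the CUDA build found a GPU
```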
Although we provide 'environment.yaml' (usable via 'conda env create -f environment.yaml'), it may contain redundant packages. We recommend instead running the code and installing whichever required packages turn out to be missing.