- If you have `git clone`d the repo and have `uv` installed, just `cd` into the repo and run `uv sync`.
- To run a script, just do `uv run $PATH_TO_FILE`. There are examples in `scripts/`.
We regard base datasets as datasets that are used solely for the later mapping of external datasets. Note that `DATA_DIR` in `src/configs/constants.py` is the path to your data folder.
The data folder should have the following structure:
```
data
├── csn
│   ├── preprocessed_1250
│   ├── preprocessed_500
│   └── preprocessed_2500
├── cpsc
│   └── ...
├── ptb_xl
│   └── ...
├── mimic_iv
│   └── ...
└── code15
    └── ...
```
These base datasets are sufficient if you only want to use the ECG datasets for pretraining with https://github.com/ELM-Research/ecg_nn or for finetuning an ELM with https://github.com/ELM-Research/ELM. The output of the base dataset preprocessing pipeline is a folder of `.npy` files, each containing the ECG signal matrix and a textual report if available.
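As a rough sketch of what a preprocessed sample might look like, the snippet below writes and reads back one `.npy` file. The dict keys, lead count, and sequence length here are illustrative assumptions, not the pipeline's documented schema:

```python
import numpy as np

# Hypothetical sample layout: the keys "ecg" and "report" and the
# (12, 2500) shape are assumptions for illustration, not the pipeline's
# documented schema.
sample = {
    "ecg": np.zeros((12, 2500), dtype=np.float32),  # 12 leads x 2500 timesteps
    "report": "sinus rhythm, no acute abnormalities",
}
np.save("example_sample.npy", sample)

# allow_pickle=True is required to load a dict saved with np.save;
# .item() unwraps the 0-d object array back into the dict.
loaded = np.load("example_sample.npy", allow_pickle=True).item()
print(loaded["ecg"].shape)  # (12, 2500)
```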
- Please download the PTB-XL dataset through this link.
- Create a `data` folder, unzip the zip file inside the `data` folder, and rename the unzipped folder to `ptb_xl`.
- Please download the MIMIC-IV ECG dataset through this link.
- Unzip the zip file inside the `data` directory and rename the unzipped directory to `mimic_iv`.
- First, create a `code15` folder inside the `data` directory.
- Then, inside `data/code15`, execute the following bash script to download and unzip the data:
```bash
#!/bin/bash
for i in {0..17}; do
    echo "Downloading part ${i}..."
    if wget -O "exams_part${i}.zip" "https://zenodo.org/records/4916206/files/exams_part${i}.zip?download=1"; then
        echo "Successfully downloaded part ${i}"
        echo "Extracting part ${i}..."
        if unzip -q "exams_part${i}.zip"; then
            echo "Successfully extracted part ${i}"
            rm "exams_part${i}.zip"
        else
            echo "Error extracting part ${i}"
        fi
    else
        echo "Error downloading part ${i}"
    fi
done
echo "All downloads and extractions completed"
```
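After the loop finishes, a quick sanity check can confirm every part was extracted. The filename pattern `exams_part${i}.hdf5` is an assumption about what each zip extracts to; adjust it if your archives differ:

```shell
# Report any of the 18 parts whose extracted file is missing.
# exams_part${i}.hdf5 is an assumed filename pattern; adjust as needed.
check_code15() {
  local missing=0
  for i in {0..17}; do
    if [ ! -f "exams_part${i}.hdf5" ]; then
      echo "Missing part ${i}"
      missing=$((missing + 1))
    fi
  done
  echo "${missing} part(s) missing"
}
check_code15
```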
- Create a `csn` folder inside the `data` directory.
- Inside `data/csn`, execute the following command in the terminal:
wget https://physionet.org/static/published-projects/ecg-arrhythmia/a-large-scale-12-lead-electrocardiogram-database-for-arrhythmia-study-1.0.0.zip
- Unzip the file, and inside `data/csn/a-large-scale-12-lead-electrocardiogram-database-for-arrhythmia-study-1.0.0`, move all of the contents out to `data/csn`. Then you may delete the `a-large-scale-12-lead-electrocardiogram-database-for-arrhythmia-study-1.0.0` folder.
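The unzip, move, and clean-up steps above can be sketched as a few commands run from inside `data/csn` (guarded so nothing happens unless the zip is actually present):

```shell
# Run from inside data/csn after the wget above completes.
CSN_ZIP="a-large-scale-12-lead-electrocardiogram-database-for-arrhythmia-study-1.0.0"
if [ -f "${CSN_ZIP}.zip" ]; then
  unzip -q "${CSN_ZIP}.zip"       # extracts into a folder named ${CSN_ZIP}
  mv "${CSN_ZIP}"/* .             # move the contents up into data/csn
  rm -r "${CSN_ZIP}"              # delete the now-empty folder
else
  echo "${CSN_ZIP}.zip not found in the current directory"
fi
```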
- Create a `cpsc` folder inside the `data` directory.
- Inside `data/cpsc`, execute the following command in the terminal:
wget https://physionet.org/static/published-projects/challenge-2020/classification-of-12-lead-ecgs-the-physionetcomputing-in-cardiology-challenge-2020-1.0.2.zip
- Unzip the file, and from `data/cpsc/classification-of-12-lead-ecgs-the-physionetcomputing-in-cardiology-challenge-2020-1.0.2/training`, move the `cpsc_2018` and `cpsc_2018_extra` folders into the `data/cpsc` directory. Then delete the `classification-of-12-lead-ecgs-the-physionetcomputing-in-cardiology-challenge-2020-1.0.2` folder.
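As with CSN, the steps above can be sketched as commands run from inside `data/cpsc` (guarded so nothing happens unless the zip is present):

```shell
# Run from inside data/cpsc after the wget above completes.
CPSC_DIR="classification-of-12-lead-ecgs-the-physionetcomputing-in-cardiology-challenge-2020-1.0.2"
if [ -f "${CPSC_DIR}.zip" ]; then
  unzip -q "${CPSC_DIR}.zip"
  # pull the two training subfolders up into data/cpsc
  mv "${CPSC_DIR}/training/cpsc_2018" "${CPSC_DIR}/training/cpsc_2018_extra" .
  rm -r "${CPSC_DIR}"             # delete the leftover folder
else
  echo "${CPSC_DIR}.zip not found in the current directory"
fi
```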
Mapping datasets are datasets that are mapped onto the base datasets. Create these datasets if you want to transform your custom dataset into a format compatible with https://github.com/ELM-Research/ELM. We provide several examples below, as well as the ability to upload the mapped dataset to HuggingFace. These steps are not required, as we have already uploaded most datasets to HuggingFace. Here are the currently supported datasets.
ECG-QA dataset curated by ECG-QA, Oh et al.
- We exactly follow the instructions in this section of the repository for mapping the PTB-XL and MIMIC-IV ECG datasets to the questions and answers. `cd` into `ecg-qa` and execute the commands in the terminal there to prepare the ECG-QA dataset.
- To map the ECG-QA dataset to MIMIC-IV and PTB-XL, execute the following scripts respectively:
```bash
uv run src/datasets/map/ecg_qa/mapping_ptbxl_samples.py src/datasets/map/ecg_qa/ecgqa/ptbxl/ --ptbxl-data-dir ../data/ptb_xl
uv run src/datasets/map/ecg_qa/mapping_mimic_iv_ecg_samples.py src/datasets/map/ecg_qa/ecgqa/mimic-iv-ecg --mimic-iv-ecg-data-dir ../data/mimic
```
- After mapping the datasets, you should have an output folder in the `data/ecg-qa` folder with the mapped `paraphrased` and `template` questions and answers.
Pretrain MIMIC dataset curated by ECG-Chat, Zhao et al.
- Download the `pretrain_mimic.json` file from this dropbox link and place it in the corresponding folder `src/datasets/map/pretrain_mimic/`.
Instruct 45k MIMIC dataset curated by ECG-Chat, Zhao et al.
- Download the `ecg_instruct_45k.json` file from this link and place it in the corresponding folder `src/datasets/map/ecg_intruct_45k/`.
ECG Instruct Pulse dataset curated by PULSE, Liu et al.
- Download the `ECGInstruct.json` file from this link. Rename it to `ecg_instruct_pulse.json` and place it in the corresponding folder `src/datasets/map/ecg_instruct_pulse`.
ECG Bench Pulse dataset curated by PULSE, Liu et al.
- The ECG Bench Pulse dataset is available exclusively on HuggingFace as `.parquet` files; therefore, we utilize the `datasets` library directly to download the dataset.
ECG Grounding Datasets curated by GEM, Lan et al.
- Download `ECG_Grounding_30k.json`, `ecg-grounding-test.json`, and `grounding_train_30k.json` from this link and place them in the corresponding folder `src/datasets/map/ecg_grounding`. A quick note: `grounding_train_30k.json` is a subset of `ECG_Grounding_30k.json`, where `ECG_Grounding_30k.json` contains all 30k ECG grounding samples found in `grounding_train_30k.json`, plus additional ECG conversational data from the ECG Instruct Pulse dataset.
We also implement training the BPE algorithm from ECG-Byte. The tokenizer should be trained only after preprocessing the MIMIC-IV base dataset.
Please execute `bash scripts/train_ecg_byte.sh`.
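For intuition only, the core BPE training step repeatedly replaces the most frequent adjacent symbol pair with a new merged symbol. The toy sketch below is a generic illustration of that merge step, not ECG-Byte's implementation (which operates on quantized ECG symbol sequences):

```python
from collections import Counter

def most_frequent_pair(sequences):
    """Return the most frequent adjacent symbol pair across all sequences."""
    pairs = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(seq, pair, new_symbol):
    """Replace every non-overlapping occurrence of `pair` with `new_symbol`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

# Toy corpus of discrete symbols standing in for quantized ECG tokens.
corpus = [list("aabab"), list("aab")]
top = most_frequent_pair(corpus)              # ('a', 'b') occurs 3 times
merged = [merge_pair(seq, top, "Z") for seq in corpus]
print(top, merged)  # ('a', 'b') [['a', 'Z', 'Z'], ['a', 'Z']]
```

A real tokenizer iterates this loop until a target vocabulary size is reached, recording each merge in order.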
We have also released the code for uploading the preprocessed, mapped datasets to HuggingFace datasets. Please view `scripts/upload_hf.sh` for the script!