This repository contains the official implementation of Hierarchical Spectrogram Transformers (HST) described in the following paper:
Aytekin, I., Dalmaz, O., Gonc, K., Ankishan, H., Saritas, E.U., Bagci, U., Celik, H., & Çukur, T. (2022). COVID-19 Detection from Respiratory Sounds with Hierarchical Spectrogram Transformers. ArXiv, abs/2207.09529.
python>=3.6.9
torch>=1.7.0
torchvision>=0.8.1
librosa
cuda>=11.3
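As a quick sanity check of the environment, a short script like the one below (not part of the repository, just a convenience sketch) can confirm the installed versions match the requirements above:

```python
# check_env.py -- quick sanity check for the dependencies listed above
# (not part of the repository; a minimal convenience sketch)
import sys

import torch
import torchvision
import librosa

print("python        :", sys.version.split()[0])   # expect >= 3.6.9
print("torch         :", torch.__version__)        # expect >= 1.7.0
print("torchvision   :", torchvision.__version__)  # expect >= 0.8.1
print("librosa       :", librosa.__version__)
print("CUDA build    :", torch.version.cuda)       # expect >= 11.3
print("CUDA available:", torch.cuda.is_available())
```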
The following links provide HST model weights pre-trained on ImageNet:
After downloading the weights, place them at HST/model/imagenet_weights/hst_base_imagenet.pth so that the training scripts can locate them.
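To verify that the checkpoint is in the expected location and loads correctly, a short check along these lines can be used (a sketch; the key names depend on how the checkpoint was exported):

```python
# inspect_weights.py -- verify the ImageNet checkpoint is where train.py expects it
# (a sketch; the checkpoint's key names depend on how it was exported)
from pathlib import Path

import torch

ckpt_path = Path("HST/model/imagenet_weights/hst_base_imagenet.pth")
assert ckpt_path.exists(), f"Checkpoint not found at {ckpt_path}"

state = torch.load(ckpt_path, map_location="cpu")
# Some checkpoints wrap the weights in an outer dict, e.g. {"model": state_dict}
state_dict = state.get("model", state) if isinstance(state, dict) else state
print(f"Loaded {len(state_dict)} entries; first keys:")
for key in list(state_dict)[:5]:
    print(" ", key)
```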
This work uses the dataset from the paper Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data. The dataset is not publicly available, but it can be released for research purposes, as stated here.
For Task 1,
- covid: covidandroidnocough + covidandroidwithcough + covidwebnocough + covidwebwithcough
- healthy: healthyandroidnosymp + healthywebnosymp
For Task 2,
- covid: covidandroidwithcough + covidwebwithcough
- healthy: healthyandroidwithcough + healthywebwithcough
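For instance, the Task 2 cough split could be assembled with a short script along these lines (a sketch; the source folder names follow the crowdsourced dataset's release, and the data_root/out_root paths are placeholders to adapt):

```python
# build_task2.py -- gather the crowdsourced dataset folders into covid/healthy classes
# (a sketch; data_root and out_root are placeholder paths to adapt to your setup)
import shutil
from pathlib import Path

data_root = Path("raw_dataset")                  # extracted crowdsourced dataset
out_root = Path("dataset_by_class/task2_cough")  # per-class output folders

task2 = {
    "covid":   ["covidandroidwithcough", "covidwebwithcough"],
    "healthy": ["healthyandroidwithcough", "healthywebwithcough"],
}

for label, folders in task2.items():
    dest = out_root / label
    dest.mkdir(parents=True, exist_ok=True)
    for folder in folders:
        for wav in (data_root / folder).rglob("*.wav"):
            # prefix with the source folder name to avoid filename collisions
            shutil.copy(wav, dest / f"{folder}_{wav.name}")
```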
The audio files in the folders listed above are converted to spectrograms by wave2spectrogram.py (a conceptual sketch of this conversion is given after the directory layout below). The resulting dataset should then be organized as:
/data/
├── task1_cough
├── task1_breath
├── task2_cough
├── task2_breath
/data/task1_cough/
├── train_test
├── val
/data/task1_cough/train_test
├── covid
├── healthy
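Conceptually, the audio-to-spectrogram conversion performed by wave2spectrogram.py amounts to something like the following log-mel pipeline (a minimal sketch with hypothetical parameters, additionally using matplotlib; see wave2spectrogram.py for the exact settings used in the paper):

```python
# spectrogram_sketch.py -- convert a respiratory sound recording to a log-mel spectrogram image
# (a minimal sketch with hypothetical parameters; not the exact settings of wave2spectrogram.py)
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def wave_to_spectrogram(wav_path, out_png, sr=22050, n_mels=128):
    y, sr = librosa.load(wav_path, sr=sr)                             # load and resample the recording
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)   # mel-scaled power spectrogram
    mel_db = librosa.power_to_db(mel, ref=np.max)                     # log scale for the visual spectrogram
    plt.figure(figsize=(4, 4))
    librosa.display.specshow(mel_db, sr=sr)
    plt.axis("off")
    plt.savefig(out_png, bbox_inches="tight", pad_inches=0)
    plt.close()

wave_to_spectrogram("cough_sample.wav", "cough_sample.png")           # placeholder file names
```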
To train and test the chosen model with a given seed, run:
cd HST
python3 train.py --dataset "/data/task1_cough/train_test" --model "hst_base" --pretrained True --seed 1
In our paper, HST is trained with 10 different seeds in a 10-fold cross-validation-like scheme, and the averaged results are reported.
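The ten runs can be launched with a small driver like the one below (a sketch; it simply repeats the command shown above with seeds 1 through 10):

```python
# run_all_seeds.py -- launch the ten training runs used for the cross-validation-like evaluation
# (a sketch; it repeats the train.py command above with seeds 1..10)
import subprocess

for seed in range(1, 11):
    subprocess.run(
        [
            "python3", "train.py",
            "--dataset", "/data/task1_cough/train_test",
            "--model", "hst_base",
            "--pretrained", "True",
            "--seed", str(seed),
        ],
        check=True,
    )
```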
A respiratory sound recording can be tested with demo.py. The HST-Base model trained on Task 2 cough data with seed 1 can be downloaded from this link.
python3 demo.py --audio_path "sample_resp_sound"
The result is printed as "healthy" or "covid".
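Under the hood, demo.py follows the usual spectrogram-then-classify flow; the self-contained sketch below illustrates it with a tiny stand-in classifier (TinyClassifier is a placeholder, not the HST architecture, and the preprocessing parameters and file names are assumptions, not the repository's exact code):

```python
# demo_sketch.py -- simplified view of the inference flow in demo.py
# (TinyClassifier is a stand-in for HST-Base; the real demo loads the downloaded checkpoint)
import librosa
import numpy as np
import torch
import torch.nn as nn

def preprocess(audio_path, sr=22050, n_mels=128, frames=128):
    # log-mel spectrogram cropped/padded to a fixed size, as sketched earlier (assumed parameters)
    y, sr = librosa.load(audio_path, sr=sr)
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels), ref=np.max
    )
    mel = librosa.util.fix_length(mel, size=frames, axis=1)
    return torch.tensor(mel, dtype=torch.float32).unsqueeze(0)   # shape (1, n_mels, frames)

class TinyClassifier(nn.Module):
    """Placeholder two-class model standing in for HST-Base."""
    def __init__(self, n_mels=128, frames=128):
        super().__init__()
        self.fc = nn.Linear(n_mels * frames, 2)

    def forward(self, x):
        return self.fc(x.flatten(1))

model = TinyClassifier().eval()
with torch.no_grad():
    logits = model(preprocess("sample_resp_sound.wav"))          # placeholder audio path
print("covid" if logits.argmax(dim=1).item() == 1 else "healthy")
```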
You are encouraged to modify/distribute this code. However, please acknowledge this code and cite the paper appropriately.
@misc{hst,
doi = {10.48550/ARXIV.2207.09529},
url = {https://arxiv.org/abs/2207.09529},
author = {Aytekin, Idil and Dalmaz, Onat and Gonc, Kaan and Ankishan, Haydar and Saritas, Emine U and Bagci, Ulas and Celik, Haydar and Cukur, Tolga},
keywords = {Sound (cs.SD), Machine Learning (cs.LG), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering},
title = {COVID-19 Detection from Respiratory Sounds with Hierarchical Spectrogram Transformers},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}
This code uses libraries from covid19-sounds-kdd20.
For questions and comments, please contact me: aytekinayceidil@gmail.com