Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification
Sangmin Bae*,
June-Woo Kim*,
Won-Yang Cho,
Hyerim Baek,
Soyoun Son,
Byungjo Lee,
Changwan Ha,
Kyongpil Tae,
Sungnyun Kim,
Se-Young Yun
* equal contribution
- We demonstrate that models pretrained on large-scale visual and audio datasets generalize well to the respiratory sound classification task.
- We introduce a straightforward Patch-Mix augmentation, which randomly mixes patches between different samples, with the Audio Spectrogram Transformer (AST).
- To handle the label hierarchy in lung sound datasets, we propose an effective Patch-Mix Contrastive Learning that distinguishes the mixed representations in the latent space.
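The patch-level mixing idea can be sketched without any framework. This is a minimal illustration, not the repo's implementation: in the paper, mixing operates on AST's spectrogram patch embeddings, whereas here patches are plain Python lists and the function name is made up for illustration.

```python
import random

def patch_mix(patches_a, patches_b, ratio, seed=None):
    """Replace a random fraction `ratio` of sample A's patches with the
    patches at the same positions from sample B.

    Returns the mixed patch sequence and the realized mix ratio
    (the fraction of patches actually taken from B).
    """
    rng = random.Random(seed)
    n = len(patches_a)
    n_mix = round(ratio * n)
    mix_idx = set(rng.sample(range(n), n_mix))  # positions taken from B
    mixed = [patches_b[i] if i in mix_idx else patches_a[i] for i in range(n)]
    return mixed, n_mix / n
```

The realized ratio is returned because, with a discrete number of patches, it is what the mixed label or contrastive target should be based on, rather than the requested `ratio`.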
Install the necessary packages with:
$ pip install torch torchvision torchaudio
$ pip install -r requirements.txt
Download the ICBHI dataset files from official_page.
$ wget https://bhichallenge.med.auth.gr/sites/default/files/ICBHI_final_database/ICBHI_final_database.zip
All `*.wav` and `*.txt` files should be saved in `data/icbhi_dataset/audio_test_data`.
Note that the ICBHI dataset consists of a total of 6,898 respiratory cycles, of which 1,864 contain crackles, 886 contain wheezes, and 506 contain both crackles and wheezes, in 920 annotated audio recordings from 126 subjects.
To train a model, simply run the shell scripts in `scripts/`:
- `scripts/icbhi_ce.sh`: Cross-Entropy loss with the AST model.
- `scripts/icbhi_patchmix_ce.sh`: Patch-Mix loss with the AST model, where the label depends on the interpolation ratio.
- `scripts/icbhi_patchmix_cl.sh`: Patch-Mix contrastive loss with the AST model.
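For the Patch-Mix CE variant, the target of a mixed sample is the interpolation of the two source labels by the mix ratio. A minimal sketch (the function name and plain-list label vectors are illustrative, not the repo's API):

```python
def interpolate_labels(y_a, y_b, ratio):
    """Mixed-sample target for Patch-Mix CE: interpolate the two
    one-hot (or soft) label vectors by the mix ratio `ratio`,
    i.e. the fraction of patches taken from sample B."""
    return [(1.0 - ratio) * a + ratio * b for a, b in zip(y_a, y_b)]
```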
Important arguments for different data settings:
- `--dataset`: other lung sound datasets or heart sound datasets can be implemented
- `--class_split`: "lungsound" or "diagnosis" classification
- `--n_cls`: number of classes, 4 or 2 (normal / abnormal), for lungsound classification
- `--test_fold`: "official" denotes the 60/40% train/test split, and "0"~"4" denote 80/20% splits
Important arguments for models:
- `--model`: network architecture, see `models`
- `--from_sl_official`: load the ImageNet-pretrained checkpoint
- `--audioset_pretrained`: load the AudioSet-pretrained checkpoint (only supported for AST and SSAST)
Important arguments for evaluation:
- `--eval`: switch to evaluation mode without any training
- `--pretrained`: load a pretrained checkpoint; requires the `--pretrained_ckpt` argument
- `--pretrained_ckpt`: path to the pretrained checkpoint
The pretrained model checkpoints will be saved at `save/[EXP_NAME]/best.pth`.
Patch-Mix Contrastive Learning achieves a state-of-the-art Score of 62.37%, which is +4.08% higher than the previous best.
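For reference, the ICBHI Score is conventionally the arithmetic mean of sensitivity (recall on the abnormal classes) and specificity (recall on the normal class). A small helper, with the per-run Se/Sp values assumed to come from the evaluation logs:

```python
def icbhi_score(sensitivity, specificity):
    """ICBHI Score: arithmetic mean of sensitivity and specificity,
    both given in percent."""
    return (sensitivity + specificity) / 2.0
```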
If you find this repo useful for your research, please consider citing our paper:
@inproceedings{bae23b_interspeech,
title = {Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification},
author = {Sangmin Bae and June-Woo Kim and Won-Yang Cho and Hyerim Baek and Soyoun Son and Byungjo Lee and Changwan Ha and Kyongpil Tae and Sungnyun Kim and Se-Young Yun},
year = {2023},
booktitle = {INTERSPEECH 2023},
pages = {5436--5440},
doi = {10.21437/Interspeech.2023-1426},
issn = {2958-1796},
}
- Sangmin Bae: bsmn0223@kaist.ac.kr
- June-Woo Kim: kaen2891@gmail.com