This is the repository for the code to fine-tune transformers on the ADReSSo dataset. The dataset has been converted into a Hugging Face dataset.
Our goal is to solve the first task (classification) of the ADReSSo Challenge at INTERSPEECH 2021.
- Transformers 4.35.1
- Pytorch 2.1.0+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
You can install these packages with the following command:

```shell
pip install transformers datasets evaluate accelerate
```
For more information, refer to requirements.txt and environment.yaml.
Our datasets are retrieved from DementiaBank. Due to licensing restrictions, they must be kept private.
If you want to train our model on different datasets, here are the requirements.
You need to use a Hugging Face dataset and split it into train and test sets.
Each dataset needs to have the following features:
- audio: the audio loaded with librosa
- label: the label of the data (1 means control group and 0 means Alzheimer's Dementia)
- mmse (optional): Mini-Mental State Examination score
Here is an example.

```
DatasetDict({
    train: Dataset({
        features: ['audio', 'label', 'mmse'],
        num_rows: 237
    })
    test: Dataset({
        features: ['audio', 'label', 'mmse'],
        num_rows: 46
    })
})
```
- -m: the model you want to train (must be on the Hugging Face Hub)
- -d: sample duration (in seconds)
- -b: training batch size
- -g: gradient accumulation steps
- -hp: enable half precision
Argument | Type | Default value |
---|---|---|
`-m` | string | `facebook/wav2vec2-base` |
`-d` | integer | 30 |
`-b` | integer | 8 |
`-g` | integer | 4 |
`-hp` | boolean | False |
So if the model fails to fit in the GPU, i.e., CUDA out of memory, try decreasing the batch size (-b) or sample duration (-d), increasing the gradient accumulation steps (-g), or enabling half precision (-hp).
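The tradeoff behind this advice: gradient accumulation sums gradients over several small mini-batches before each optimizer step, so you can lower per-step memory without changing the effective batch size. A minimal sketch (the helper name is ours, not from this repo):

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int) -> int:
    """Gradients from grad_accum_steps mini-batches are accumulated before
    each optimizer step, so the optimizer effectively sees this many samples."""
    return per_device_batch * grad_accum_steps

# Defaults from the table above: -b 8, -g 4
print(effective_batch_size(8, 4))  # 32
# Same effective batch size with half the per-step memory: -b 4, -g 8
print(effective_batch_size(4, 8))  # 32
```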
You can refer to this repository
- You need to modify the shebang of the .py files.
- You need a Hugging Face dataset to support training.
- You need to create your own Hugging Face repository to save your model.
Acoustics
Model Variant | Accuracy | F1 |
---|---|---|
distil-whisper-large-v2 | 0.8451 | 0.8607 |
whisper-large-v3 | 0.8451 | 0.8406 |
distil-whisper-medium.en | 0.8169 | 0.7936 |
whisper-medium | 0.7606 | 0.7792 |
whisper-medium.en | 0.7324 | 0.7324 |
Linguistic
Model Variant | Accuracy | F1 |
---|---|---|
roberta-large | 0.8310 | 0.8421 |
bart-large-mnli | 0.8028 | 0.8000 |
bert-large | 0.7746 | 0.7500 |
bart-large | 0.7465 | 0.7500 |
flan-t5-large | 0.7465 | 0.7273 |
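For reference, the accuracy and binary F1 scores reported above can be computed as follows; treating the Alzheimer's Dementia class (label 0) as the positive class for F1 is our assumption, not stated in this README:

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_f1(y_true, y_pred, positive=0):
    # Assumption: label 0 (Alzheimer's Dementia) is the positive class.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy([0, 0, 1, 1], [0, 1, 1, 1]))   # 0.75
print(binary_f1([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.666...
```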
Model Variant | RMSE |
---|---|
whisper-medium.en | 4.5335 |
whisper-large-v3 | 4.5682 |
distil-whisper-large-v2 | 4.7742 |
whisper-medium | 4.8297 |
distil-whisper-medium.en | 4.9445 |
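RMSE here is presumably the error of the predicted MMSE scores, measured in MMSE points (scale 0-30). A stdlib sketch of the metric, with placeholder scores:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Placeholder MMSE scores, not values from this repo.
print(rmse([20.0, 25.0, 30.0], [22.0, 24.0, 29.0]))  # 1.414...
```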