The aim of this work is to study and develop robust speaker recognition against domain shifts, adversarial attacks, and audio spoofing.
The framework is based on this template, which is based on PyTorch Lightning and Hydra.
```bash
# clone template
git clone https://github.com/ahmad-aloradi/adversarial-robustness-for-sr.git
cd adversarial-robustness-for-sr

# install requirements
pip install -r requirements.txt
```
PyTorch Lightning - a lightweight deep learning framework / PyTorch wrapper for professional AI researchers and machine learning engineers who need maximal flexibility without sacrificing performance at scale.
Hydra - a framework that simplifies configuring complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
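For instance, a Hydra main config composes its defaults from config groups and lets any value be overridden from the command line. A minimal sketch (the group and option names below are illustrative, not the repository's actual configs):

```yaml
# train.yaml (illustrative): compose a run config from config groups
defaults:
  - datamodule: voxceleb
  - module: speaker_net
  - trainer: default

trainer:
  max_epochs: 10
```

A run could then override any of these values without editing files, e.g. `python src/train.py trainer.max_epochs=20 datamodule=librispeech`.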
The template's core directories are `src/`, `data/`, `logs/`, and `tests/`, along with some additional directories like `notebooks/`, `docs/`, etc.
In this particular case, the directory structure looks like:
```
├── configs                  <- Hydra configuration files
│   ├── callbacks            <- Callbacks configs
│   ├── datamodule           <- Datamodule configs
│   ├── debug                <- Debugging configs
│   ├── experiment           <- Experiment configs
│   ├── extras               <- Extra utilities configs
│   ├── hparams_search       <- Hyperparameter search configs
│   ├── hydra                <- Hydra settings configs
│   ├── local                <- Local configs
│   ├── logger               <- Logger configs
│   ├── module               <- Module configs
│   ├── paths                <- Project paths configs
│   ├── trainer              <- Trainer configs
│   │
│   ├── eval.yaml            <- Main config for evaluation
│   └── train.yaml           <- Main config for training
│
├── data                     <- Project data
├── logs                     <- Logs generated by hydra, lightning loggers, etc.
├── notebooks                <- Jupyter notebooks
├── scripts                  <- Shell scripts
│
├── src                      <- Source code
│   ├── callbacks            <- Additional callbacks
│   ├── datamodules          <- Lightning datamodules
│   ├── modules              <- Lightning modules
│   ├── utils                <- Utility scripts
│   │
│   ├── eval.py              <- Run evaluation
│   └── train.py             <- Run training
│
├── tests                    <- Tests of any kind
│
├── .dockerignore            <- List of files ignored by docker
├── .gitattributes           <- List of additional attributes to pathnames
├── .gitignore               <- List of files ignored by git
├── .pre-commit-config.yaml  <- Configuration of pre-commit hooks for code formatting
├── Dockerfile               <- Dockerfile
├── Makefile                 <- Makefile with commands like `make train` or `make test`
├── pyproject.toml           <- Configuration options for testing and linting
├── requirements.txt         <- File for installing python dependencies
├── setup.py                 <- File for installing project as a package
└── README.md
```
Our pipeline collects data as `.csv` files with a fixed set of columns, which are defined in `src/datamodules/components/common.py` as:
```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class BaseDatasetCols:
    DATASET: Literal['dataset_name'] = 'dataset_name'
    LANGUAGE: Literal['language'] = 'language'
    NATIONALITY: Literal['country'] = 'country'
    SR: Literal['sample_rate'] = 'sample_rate'
    SPEAKER_ID: Literal['speaker_id'] = 'speaker_id'
    CLASS_ID: Literal['class_id'] = 'class_id'
    SPEAKER_NAME: Literal['speaker_name'] = 'speaker_name'
    GENDER: Literal['gender'] = 'gender'
    SPLIT: Literal['split'] = 'split'
    REC_DURATION: Literal['recording_duration'] = 'recording_duration'
    REL_FILEPATH: Literal['rel_filepath'] = 'rel_filepath'
    TEXT: Literal['text'] = 'text'
```
Additional columns can be added by overriding the base columns; columns that a dataset does not provide are set to the defaults defined in `common.py`. This enforced column homogeneity allows composing datasets without complications.
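For example, a dataset with extra metadata can extend the base columns by subclassing. A minimal sketch (the `EMOTION` column and the class name are hypothetical, and the base class is abbreviated here):

```python
from dataclasses import dataclass, fields
from typing import Literal


@dataclass(frozen=True)
class BaseDatasetCols:
    # Abbreviated version of the base columns shown above
    DATASET: Literal['dataset_name'] = 'dataset_name'
    SPEAKER_ID: Literal['speaker_id'] = 'speaker_id'
    REL_FILEPATH: Literal['rel_filepath'] = 'rel_filepath'


@dataclass(frozen=True)
class EmotionDatasetCols(BaseDatasetCols):
    # Hypothetical extra column layered on top of the base set
    EMOTION: Literal['emotion'] = 'emotion'


cols = EmotionDatasetCols()
# The subclass keeps all base column names and adds its own
column_names = [f.default for f in fields(cols)]
```

Because the dataclasses are frozen, the column names stay immutable at runtime while still being easy to extend per dataset.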
Follow `scripts/datasets/prep_{DATASET}.sh`. If you face any problems with these scripts, please report them to ahmad.aloradi94@gmail.com.
Known issues:

- `VoicePrivacy2025` dataset: when untarring the `T25-1` model's data, there is a mis-named file. Please fix the typo manually.
- `LibriSpeech` dataset: in `SPEAKERS.TXT`, line 60 used to cause a problem when loading as `.csv` with `sep='|'`. It is now handled automatically.
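To illustrate the pipe-separated loading the note above refers to, here is a minimal stdlib sketch (the inline sample data is made up, in the spirit of LibriSpeech's `SPEAKERS.TXT`):

```python
import csv
import io

# Hypothetical pipe-separated metadata in the style of SPEAKERS.TXT
raw = "speaker_id|gender|subset\n60|M|train-clean-100\n"

# Parse the pipe-separated text into dicts keyed by the header row
rows = list(csv.DictReader(io.StringIO(raw), delimiter='|'))
```

The same `delimiter='|'` (or `sep='|'` in pandas) applies when reading the file from disk instead of a string.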
At the moment, we support recipes for the following datasets: `VoxCeleb`, `LibriSpeech`, and `VoicePrivacy2025`. Currently, we expect the datasets to be already downloaded on your machine, but we are gradually integrating the download steps into `scripts/datasets`.