
Robust Speaker Recognition Against Adversarial Attacks and Spoofing


Description

The aim of this work is to study and improve the robustness of speaker recognition against domain shifts, adversarial attacks, and audio spoofing.

The framework is based on this template, which is based on PyTorch Lightning and Hydra.

Quick start

```bash
# clone the repository
git clone https://github.com/ahmad-aloradi/adversarial-robustness-for-sr.git
cd adversarial-robustness-for-sr

# install requirements
pip install -r requirements.txt
```
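Once the requirements are installed, training and evaluation are launched via the entrypoints in src/. The overrides below use standard Hydra syntax; the specific keys and values are illustrative, so check the files under configs/ for what is actually available in this repo:

```bash
# train with the default configuration
python src/train.py

# override any config value from the command line (Hydra syntax)
python src/train.py trainer.max_epochs=10

# run evaluation on a trained checkpoint (the ckpt_path key is an assumption of this sketch)
python src/eval.py ckpt_path=/path/to/checkpoint.ckpt
```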

Main Packages

PyTorch Lightning - a lightweight deep learning framework / PyTorch wrapper for professional AI researchers and machine learning engineers who need maximal flexibility without sacrificing performance at scale.

Hydra - a framework that simplifies configuring complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
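The configs/ directory holds one folder per config group, and a top-level config composes one option from each group via a Hydra defaults list. A minimal sketch of what such a composed config might look like — the group names match this repo's configs/ tree, but the concrete option names (voxceleb, tensorboard) are assumptions:

```yaml
# configs/train.yaml (illustrative fragment)
defaults:
  - datamodule: voxceleb    # assumed option name
  - trainer: default
  - logger: tensorboard     # assumed option name

seed: 42
```

Each entry can then be swapped or overridden at launch time from the command line.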

Project structure

  • src/
  • data/
  • logs/
  • tests/
  • additional directories, such as notebooks/, docs/, etc.

In this particular case, the directory structure looks like:

```
├── configs                     <- Hydra configuration files
│   ├── callbacks               <- Callbacks configs
│   ├── datamodule              <- Datamodule configs
│   ├── debug                   <- Debugging configs
│   ├── experiment              <- Experiment configs
│   ├── extras                  <- Extra utilities configs
│   ├── hparams_search          <- Hyperparameter search configs
│   ├── hydra                   <- Hydra settings configs
│   ├── local                   <- Local configs
│   ├── logger                  <- Logger configs
│   ├── module                  <- Module configs
│   ├── paths                   <- Project paths configs
│   ├── trainer                 <- Trainer configs
│   │
│   ├── eval.yaml               <- Main config for evaluation
│   └── train.yaml              <- Main config for training
│
├── data                        <- Project data
├── logs                        <- Logs generated by hydra, lightning loggers, etc.
├── notebooks                   <- Jupyter notebooks
├── scripts                     <- Shell scripts
│
├── src                         <- Source code
│   ├── callbacks               <- Additional callbacks
│   ├── datamodules             <- Lightning datamodules
│   ├── modules                 <- Lightning modules
│   ├── utils                   <- Utility scripts
│   │
│   ├── eval.py                 <- Run evaluation
│   └── train.py                <- Run training
│
├── tests                       <- Tests of any kind
│
├── .dockerignore               <- List of files ignored by docker
├── .gitattributes              <- List of additional attributes to pathnames
├── .gitignore                  <- List of files ignored by git
├── .pre-commit-config.yaml     <- Configuration of pre-commit hooks for code formatting
├── Dockerfile                  <- Dockerfile
├── Makefile                    <- Makefile with commands like `make train` or `make test`
├── pyproject.toml              <- Configuration options for testing and linting
├── requirements.txt            <- File for installing python dependencies
├── setup.py                    <- File for installing project as a package
└── README.md
```

Data Preparation

Structure

Our pipeline collects data as .csv files with a fixed set of columns, defined in src/datamodules/components/common.py as:

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class BaseDatasetCols:
    DATASET: Literal['dataset_name'] = 'dataset_name'
    LANGUAGE: Literal['language'] = 'language'
    NATIONALITY: Literal['country'] = 'country'
    SR: Literal['sample_rate'] = 'sample_rate'
    SPEAKER_ID: Literal['speaker_id'] = 'speaker_id'
    CLASS_ID: Literal['class_id'] = 'class_id'
    SPEAKER_NAME: Literal['speaker_name'] = 'speaker_name'
    GENDER: Literal['gender'] = 'gender'
    SPLIT: Literal['split'] = 'split'
    REC_DURATION: Literal['recording_duration'] = 'recording_duration'
    REL_FILEPATH: Literal['rel_filepath'] = 'rel_filepath'
    TEXT: Literal['text'] = 'text'
```

Additional columns can be added by subclassing and overriding the base columns. Columns missing from a given dataset are set to the defaults defined in common.py.
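Overriding the base columns can be done by subclassing the frozen dataclass. A minimal sketch — the VoxCelebCols name and the extra VIDEO_ID column are hypothetical, and the base class is trimmed to two fields for brevity:

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class BaseDatasetCols:
    # trimmed: the full set of columns is shown above
    DATASET: Literal['dataset_name'] = 'dataset_name'
    SPEAKER_ID: Literal['speaker_id'] = 'speaker_id'


@dataclass(frozen=True)
class VoxCelebCols(BaseDatasetCols):
    # hypothetical dataset-specific column, not part of the repo
    VIDEO_ID: Literal['video_id'] = 'video_id'


cols = VoxCelebCols()
print(cols.DATASET, cols.VIDEO_ID)  # base columns are inherited alongside the new one
```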

Enforcing a homogeneous column schema makes it straightforward to compose multiple datasets.
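Because every dataset CSV exposes the same columns, composing datasets reduces to concatenating rows. A toy sketch with a trimmed column set (the row contents are made up for illustration):

```python
import csv
import io

# trimmed column schema; the full set follows BaseDatasetCols above
COLS = ["dataset_name", "speaker_id", "rel_filepath", "split"]

vox = [{"dataset_name": "voxceleb", "speaker_id": "id001",
        "rel_filepath": "wav/id001/a.wav", "split": "train"}]
libri = [{"dataset_name": "librispeech", "speaker_id": "19",
          "rel_filepath": "19/198/0001.flac", "split": "train"}]

# identical schemas -> composition is plain row concatenation
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLS)
writer.writeheader()
writer.writerows(vox + libri)
```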

Prepare the CSVs

Follow scripts/datasets/prep_{DATASET}.sh. If you face any problems with these scripts, please report them to ahmad.aloradi94@gmail.com.

Known Issues:

  1. VoicePrivacy2025 dataset: when untarring the T25-1 model's data, one of the extracted names contains a typo. Please fix the mis-named entry manually.
  2. LibriSpeech dataset: line 60 of SPEAKERS.TXT used to break loading the file as a CSV with sep='|'. This is now handled automatically.
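The SPEAKERS.TXT fix amounts to capping the number of '|' splits so that a separator occurring inside the free-text name field survives. A self-contained sketch with made-up data (SPEAKERS.TXT has five columns and uses ';' for comment lines):

```python
import io

# made-up lines imitating SPEAKERS.TXT: a '|' inside the name field
# breaks a naive split on '|'
raw = io.StringIO(
    ";ID |SEX| SUBSET | MINUTES| NAME\n"
    "60 | M | train-clean-100 | 20.18 | Some Name | With Pipe\n"
)

rows = []
for line in raw:
    if line.startswith(";"):
        continue  # comment/header lines start with ';'
    # at most 4 splits -> any further '|' stays inside the NAME field
    rows.append([part.strip() for part in line.split("|", 4)])
```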

Recipes

At the moment we support recipes for the following datasets: VoxCeleb, LibriSpeech, and VoicePrivacy2025. Currently, we expect the datasets to already be downloaded on your machine, but we are gradually integrating the download steps into scripts/datasets.

About

This project is a subproject of the COMFORT project.
