This repository contains the code for our paper "DNN Controlled Adaptive Front-end for Replay Attack Detection Systems" (https://authors.elsevier.com/sd/article/S0167-6393(23)00107-3).
This project presents a system that detects replay attacks on automatic speaker verification systems by using filterbanks to dynamically recognize and capture the environmental and device artifacts introduced during replay.
An example of replay attack detection using the Physical Access (PA) partition of the ASVspoof 2019 dataset (https://www.asvspoof.org/index2019.html) is provided.
This code reuses parts of the SincNet repository (https://github.com/mravanelli/SincNet) for utility functions such as parsing configuration files. The backend classification network is based on the models proposed in https://github.com/nesl/asvspoof2019.
- Python 3.6
- PyTorch 1.8.0
- torch-dct (https://github.com/zh217/torch-dct)
- pysoundfile (https://pysoundfile.readthedocs.io/en/latest/#)
Speech utterances should first be decomposed into non-overlapping frames of length 11 ms. Each framed utterance must be stored in a separate .npy file so that the Dataloaders can access it. A set of sample framed speech files is provided in the data folder for the training (TrainSel.tar.gz) and development (DevSel.tar.gz) partitions.
For the code to run, the [data] section of the configuration file (cfg/config_file.cfg) must be modified to match the user's paths. In the provided code, tr_lst and te_lst store the individual file paths of the training and development set samples, respectively. The labTr_dict and labTe_dict dictionaries assign the corresponding label to each sample.
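The relationship between a file list and its label dictionary can be sketched as follows. This is an illustration only: the file names, the 0/1 label values, and the on-disk formats (a text file with one path per line, and a dictionary saved with NumPy, as done in the SincNet repository) are assumptions and may differ from what this repository expects.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample names and labels, for illustration only.
samples = {"utt_0001.npy": 1, "utt_0002.npy": 0}

work_dir = tempfile.mkdtemp()
list_path = os.path.join(work_dir, "train.lst")          # hypothetical name
dict_path = os.path.join(work_dir, "train_labels.npy")   # hypothetical name

# Write the list file: one sample path per line.
with open(list_path, "w") as f:
    f.write("\n".join(samples) + "\n")

# A dict passed to np.save is stored as a pickled 0-d object array.
np.save(dict_path, samples)

# Read both back the way a training script might.
with open(list_path) as f:
    tr_lst = [line.strip() for line in f if line.strip()]
labTr_dict = np.load(dict_path, allow_pickle=True).item()

# Every path in the list should have an entry in the label dictionary.
for path in tr_lst:
    print(path, labTr_dict[path])
```

Checking that every path in the list has a label before training catches mismatched lists and dictionaries early.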
Once the paths are set, run the following command to decompose the signals into frames:
python frame_signals.py
This is a one-time step for a given frame length.
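For reference, non-overlapping framing amounts to the following. This is a minimal sketch, not the contents of frame_signals.py: the function name frame_signal is hypothetical, the 16 kHz sample rate matches ASVspoof 2019 audio, and dropping trailing samples that do not fill a whole frame is an assumption.

```python
import os
import tempfile

import numpy as np


def frame_signal(signal, sample_rate, frame_ms=11.0):
    """Split a 1-D signal into non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len
    # Assumption: samples that do not fill a complete frame are discarded.
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)


# A synthetic one-second signal stands in for a real utterance.
sample_rate = 16000
signal = np.random.default_rng(0).standard_normal(sample_rate).astype(np.float32)

frames = frame_signal(signal, sample_rate)
print(frames.shape)  # (90, 176): 176 samples per 11 ms frame at 16 kHz

# Store the framed utterance in its own .npy file and read it back.
out_path = os.path.join(tempfile.mkdtemp(), "utt_0001.npy")  # hypothetical name
np.save(out_path, frames)
restored = np.load(out_path)
```

Because the frame length is fixed when the files are written, changing it means re-running the framing step.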
To run the example replay detection experiment, execute the following command:
python main.py