Skip to content

Latest commit

 

History

History
91 lines (63 loc) · 3.29 KB

README.md

File metadata and controls

91 lines (63 loc) · 3.29 KB

Collaborative Watermarking

This repository contains source code for "Collaborative Watermarking for Adversarial Speech Synthesis" presented at ICASSP 2024.

Listen to audio samples at the demo page https://ljuvela.github.io/CollaborativeWatermarkingDemo/

Paper pre-print available at https://arxiv.org/abs/2309.15224

Published version available at https://ieeexplore.ieee.org/document/10448134

Environment setup

Create a conda environment and install dependencies

conda create -n adversarial-watermarking python=3.10
conda activate adversarial-watermarking
conda install -c pytorch -c conda-forge pytorch torchaudio pytest conda-build matplotlib scipy pysoundfile tensorboard

Install submodule dependencies and link them to the conda python evironment. You could use PYTHONPATH instead, but we recommend conda to keep the environment contained

git submodule update --init --recursive
conda develop third_party/hifi-gan
conda develop third_party/asvspoof-2021/LA/Baseline-LFCC-LCNN

Pretrained models

To replicate experiments, download pretrained ASVSpoof models

cd third_party/asvspoof-2021/LA/Baseline-LFCC-LCNN/project
sh 00_download.sh

We do not currently distribute pre-trained HiFi GAN models. In the paper, we trained a wide range of configurations for 100k iterations, which still leaves a quality gap to the official Hifi GAN trained for 2.5M iterations (https://github.com/jik876/hifi-gan). If there is demand for a production-quality watermarked HiFi GAN, we will definitely consider training one.

Noise augmentation

Musan dataset is available as direct download from https://www.openslr.org/17/

Torchaudio has a nice wrapper, but only available in the nightly prototype at the moment https://pytorch.org/audio/master/generated/torchaudio.prototype.datasets.Musan.html

Using the nightly build provides a convenient method for download, but be careful not to break your environment

conda install -c pytorch-nightly -c nvidia torchaudio
dataset = torchaudio.prototype.datasets.Musan(root='/path/on/your/system/MUSAN', subset='noise', download=True)

Room impulse repsonse augmentation

The repository also implements room impulse response augmentation. The paper didn't have room to evaluate this, but you're welcome to experiment

Download the MIT RIR dataset from http://mcdermottlab.mit.edu/Reverb/IR_Survey.html

VSCode config

The following config snippet helps VSCode find the dependencies for development

{
    "terminal.integrated.env.linux" : {"PYTHONPATH": "${workspaceFolder}/third_party/hifi-gan;${workspaceFolder}/tests"},
    "terminal.integrated.env.osx" :   {"PYTHONPATH": "${workspaceFolder}/third_party/hifi-gan;${workspaceFolder}/tests"},

    "python.analysis.extraPaths": [
        "third_party/hifi-gan",
        "third_party/asvspoof-2021/LA/Baseline-LFCC-LCNN"]
}

Citing

Please use the following BibTeX to cite the paper.

@INPROCEEDINGS{juvela2024-collaborative-watermarking,
  author={Juvela, Lauri and Wang, Xin},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={Collaborative Watermarking for Adversarial Speech Synthesis}, 
  year={2024},
  volume={},
  number={},
  pages={11231-11235},
  doi={10.1109/ICASSP48485.2024.10448134}}