CodecFake Source Tracing


The complete codebase is coming soon!

πŸ› οΈ Setup

Dataset Download

Download the CodecFake+ dataset (the dataset is coming soon!) and arrange it as follows:

CodecFake+/
├── all_data_16k/          # CoRS + maskgct_vctk set
│   ├── p225_001_audiodec_24k_320d.wav
│   ├── p225_001_bigcodec.wav
│   ├── ....
│   └── s5_400_xocdec_hubert_general_audio.wav
└── SLMdemos_16k/          # CoSG set
    ├── SIMPLESPEECH1/
    ├── VIOLA/
    ├── ....
    └── MASKGCT/
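Once the dataset is available, a quick sanity check of the layout can look like the sketch below. The paths are assumptions taken from the tree above; adjust DATA_ROOT to wherever you extracted the archive.

    # Sanity-check the expected CodecFake+ layout (paths assumed from the tree above).
    DATA_ROOT="CodecFake+"

    for d in "$DATA_ROOT/all_data_16k" "$DATA_ROOT/SLMdemos_16k"; do
        if [ -d "$d" ]; then
            echo "$d: $(find "$d" -name '*.wav' | wc -l) wav files"
        else
            echo "Missing directory: $d" >&2
        fi
    done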

Pretrained Weights Setup

  • For Wav2Vec2-AASIST

    • Place xlsr2_300m.pt directly into w2v2_aasist_baseline/
  • For SAST Net

    • Create directory Pretrain_weight inside SAST_Net/
    • Download and place the following checkpoints in SAST_Net/Pretrain_weight:
    Model Description Download
    xlsr2_300m.pt Wav2Vec2 pretrained weight Download
    mae_pretrained_base.pth AudioMAE pretrained on AudioSet Download
    tuned_weight.pth Wav2Vec2-AASIST on CodecFake+ Download
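For orientation, a minimal sketch of the expected placement, assuming the three checkpoint files have already been downloaded into the current directory (download URLs are not reproduced here):

    # Place pretrained weights where the two models expect them.
    # Assumes the checkpoint files were already downloaded to the current directory.
    cp xlsr2_300m.pt w2v2_aasist_baseline/

    mkdir -p SAST_Net/Pretrain_weight
    cp xlsr2_300m.pt mae_pretrained_base.pth tuned_weight.pth SAST_Net/Pretrain_weight/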

Environment Setup

conda env create -f environment.yml
conda activate CodecFakeSourceTracing
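A quick check that the environment activated correctly. The exact packages are defined by environment.yml; the import of torch below is an assumption based on the models used in this repository.

    # Verify the core dependency resolves and report GPU availability.
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"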

🚀 Inference

Notation

  • Tasks

    • BIN: Binary spoof detection task
    • VQ: Vector quantization source tracing task
    • AUX: Auxiliary training objective source tracing task
    • DEC: Decoder type source tracing task
  • Training Subsets

    • vq: VQ taxonomy sampling (MVQ : SVQ : SQ = 1:1:1)
    • aux: AUX taxonomy sampling (None : Semantic Distillation : Disentanglement = 1:1:1)
    • dec: DEC taxonomy sampling (Time : Freqency = 1:1)

Model Checkpoints

  • Wav2Vec2-AASIST

    Single-Task Learning Models
    Model Task Trained Dataset Download Links
    S_BIN BIN vq / aux / dec vq β€’ aux β€’ dec
    S_VQ VQ vq Download
    S_AUX AUX aux Download
    S_DEC DEC dec Download
    Dual-Task Learning Models
    Model Task Trained Dataset Download Links
    D_VQ BIN / VQ vq Download
    D_AUX BIN / AUX aux Download
    D_DEC BIN / DEC dec Download
    Multi-Task Learning Models
    Model Task Trained Dataset Download Links
    M1 BIN / VQ / AUX / DEC vq / aux / dec vq β€’ aux β€’ dec
    M2 VQ / AUX / DEC vq / aux / dec vq β€’ aux β€’ dec
  • SAST Net

    Model Task Trained Dataset Download Links
    SAST Net BIN vq / aux / dec vq β€’ aux β€’ dec
    VQ vq Download
    AUX aux Download
    DEC dec Download

Running Inference

  • Wav2Vec2-AASIST

    cd w2v2_aasist_baseline/
    bash inference.sh ${dataset_type} ${base_dir} ${checkpoint_path} ${model_type}

    Parameters:

    • dataset_type: "CoRS" or "CoSG"
    • base_dir: Path to dataset directory
      • For CoRS: "CodecFake+/all_data_16k/"
      • For CoSG: "CodecFake+/SLMdemos_16k/"
    • checkpoint_path: Path to model checkpoint
    • model_type: S_BIN / S_VQ / S_AUX / S_DEC / D_VQ / D_AUX / D_DEC / M1 / M2
  • SAST Net

    cd SAST_Net/
    bash inference.sh ${base_dir} ${dataset_type} ${checkpoint_path} ${task} ${eval_output}

    Parameters:

    • base_dir: Path to dataset directory
    • dataset_type: "CoRS" or "CoSG"
    • checkpoint_path: Path to model checkpoint
    • task: Bin / AUX / DEC / VQ
    • eval_output: Results directory (default: "./Result")

🎯 Training

  • Wav2Vec2-AASIST

    cd w2v2_aasist_baseline/
    bash train.sh ${base_dir} ${batch_size} ${num_epochs} ${lr} ${model_type} ${sampling_strategy}

    Parameters:

    • base_dir: Path to "CodecFake+/all_data_16k/"
    • batch_size: Batch size (default: 8)
    • num_epochs: Training epochs (default: 20)
    • lr: Learning rate (default: 1e-06)
    • model_type: S_BIN / S_VQ / S_AUX / S_DEC / D_VQ / D_AUX / D_DEC / M1 / M2
    • sampling_strategy: VQ / AUX / DEC
  • SAST Net

    cd SAST_Net
    bash train.sh ${base_dir} ${save_dir} ${batch_size} ${num_epochs} ${lr} ${task} ${sampling_strategy} ${mask_ratio}

    Parameters:

    • base_dir: Path to "CodecFake+/all_data_16k/"
    • save_dir: Checkpoint save directory (default: ./models_SAST_Net)
    • batch_size: Batch size (default: 12)
    • num_epochs: Training epochs (default: 40)
    • lr: Learning rate (default: 5e-06)
    • task: Bin / VQ / AUX / DEC
    • sampling_strategy: VQ / AUX / DEC
    • mask_ratio: MAE mask ratio (default: 0.4)

📚 Citation

If this work helps your research, please consider citing our papers:

@article{chen2025codec,
  title={Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy},
  author={Chen, Xuanjun and Lin, I-Ming and Zhang, Lin and Du, Jiawei and Wu, Haibin and Lee, Hung-yi and Jang, Jyh-Shing Roger Jang},
  journal={arXiv preprint arXiv:2505.12994},
  year={2025}
}

@article{chen2025towards,
  title={Towards Generalized Source Tracing for Codec-Based Deepfake Speech},
  author={Chen, Xuanjun and Lin, I-Ming and Zhang, Lin and Wu, Haibin and Lee, Hung-yi and Jang, Jyh-Shing Roger Jang},
  journal={arXiv preprint arXiv:2506.07294},
  year={2025}
}

@article{chen2025codecfake+,
  title={CodecFake+: A Large-Scale Neural Audio Codec-Based Deepfake Speech Dataset},
  author={Chen, Xuanjun and Du, Jiawei and Wu, Haibin and Zhang, Lin and Lin, I and Chiu, I and Ren, Wenze and Tseng, Yuan and Tsao, Yu and Jang, Jyh-Shing Roger and others},
  journal={arXiv preprint arXiv:2501.08238},
  year={2025}
}
