[doc] update README.md and add QMF results (#322)
* [doc] update README.md and add QMF results

* [doc] update ROADMAP.md

* [doc] update README.md

* [doc] update XVEC results in voxceleb/v2/README.md
JiJiJiang authored May 23, 2024
1 parent 9ca3190 commit 788e3eb
Showing 5 changed files with 83 additions and 91 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -60,9 +60,9 @@ pre-commit install # for clean and tidy code
```

## 🔥 News
* 2024.05.15: Add support for [score calibration](https://arxiv.org/pdf/2211.00815), see [#320](https://github.com/wenet-e2e/wespeaker/pull/320)
* 2024.04.25: Add support for the gemini-dfresnet model, see [#291](https://github.com/wenet-e2e/wespeaker/pull/291)
* 2024.04.23: Support MNN inference engine in runtime, see [#310](https://github.com/wenet-e2e/wespeaker/pull/310)
* 2024.05.15: Add support for [quality-aware score calibration](https://arxiv.org/pdf/2211.00815), see [#320](https://github.com/wenet-e2e/wespeaker/pull/320).
* 2024.04.25: Add support for the gemini-dfresnet model, see [#291](https://github.com/wenet-e2e/wespeaker/pull/291).
* 2024.04.23: Support MNN inference engine in runtime, see [#310](https://github.com/wenet-e2e/wespeaker/pull/310).
* 2024.04.02: Release [Wespeaker document](http://wenet.org.cn/wespeaker) with detailed model-training tutorials, introduction of various runtime platforms, etc.
* 2024.03.04: Support the [eres2net-cn-common-200k](https://www.modelscope.cn/models/iic/speech_eres2net_sv_zh-cn_16k-common/summary) and [campplus-cn-common-200k](https://www.modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) of damo [#281](https://github.com/wenet-e2e/wespeaker/pull/281), check [python usage](https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md) for details.
* 2024.02.05: Support the ERes2Net [#272](https://github.com/wenet-e2e/wespeaker/pull/272) and Res2Net [#273](https://github.com/wenet-e2e/wespeaker/pull/273) models.
12 changes: 8 additions & 4 deletions ROADMAP.md
@@ -22,10 +22,10 @@ This is the roadmap for wespeaker version 2.0.
- [ ] Documents
- [ ] Speaker embedding learning basics
- [ ] Core code explanation
- [ ] Step-by-step tutorials
- [ ] VoxCeleb Supervised
- [ ] VoxCeleb Self-supervised
- [ ] VoxSRC Diarization
- [x] Step-by-step tutorials
- [x] VoxCeleb Supervised
- [x] VoxCeleb Self-supervised
- [x] VoxSRC Diarization

## Version 1.0 (Time: 2022.09)

@@ -67,6 +67,7 @@ This is the roadmap for wespeaker version 1.0.
- [x] [RepVGG](https://arxiv.org/pdf/2101.03697.pdf)
- [x] [CAM++](https://arxiv.org/pdf/2303.00332.pdf)
- [x] [ERes2Net](https://arxiv.org/pdf/2305.12838.pdf)
- [x] [Gemini-dfresnet](https://arxiv.org/abs/2312.03620)
* Pooling Functions
- [x] TAP(mean) / TSDP(std) / TSTP(mean+std)
- Comparison of mean/std pooling can be found in [shuai_iscslp](https://x-lance.sjtu.edu.cn/en/papers/2021/iscslp21_shuai_1_.pdf), [anna_arxiv](https://arxiv.org/pdf/2203.10300.pdf)
@@ -85,9 +86,11 @@ This is the roadmap for wespeaker version 1.0.
- [x] Cosine
- [x] PLDA
- [x] Score Normalization (AS-Norm)
- [x] Quality-aware Score Calibration
* Metric
- [x] EER
- [x] minDCF
- [x] DER
* Online Augmentation
- [x] Noise && RIR
- [x] Speed Perturb
@@ -100,6 +103,7 @@ This is the roadmap for wespeaker version 1.0.
- [x] Python Binding
- [x] Triton Inference Server on verification && diarization in GPU deployment
- [x] C++ Onnxruntime
- [x] MNN
* Self-Supervised Learning (SSL)
- [x] [DINO](https://openaccess.thecvf.com/content/ICCV2021/papers/Caron_Emerging_Properties_in_Self-Supervised_Vision_Transformers_ICCV_2021_paper.pdf)
- [x] [MoCo](https://openaccess.thecvf.com/content_CVPR_2020/papers/He_Momentum_Contrast_for_Unsupervised_Visual_Representation_Learning_CVPR_2020_paper.pdf)
55 changes: 25 additions & 30 deletions examples/cnceleb/v2/README.md
@@ -1,40 +1,35 @@
## Results

* Setup: fbank80, num_frms200, epoch150, ArcMargin, aug_prob0.6, speed_perturb (no spec_aug)
* test_trials: CNC-Eval-Avg.lst

* 🔥 UPDATE 2024.05.16: We now support score calibration for cnceleb. It improves the EER but degrades minDCF compared with the AS-Norm results.

| Model | Params | FLOPs | LM | AS-Norm | Score Calibration | EER (%) | minDCF (p=0.01) |
| :------------------------------ | :-------: | :-----: | :-: | :-------: | :---------------: | :-------: | :--------------: |
| ResNet34-TSTP-emb256 | 6.63M | 4.55 G | × | × | × | 7.124 | 0.408 |
| | | | × | √ | × | 6.742 | 0.367 |
| | | | × | √ | √ | 6.336 | 0.374 |
* Scoring: cosine (sub mean of vox2_dev), AS-Norm, [QMF](https://arxiv.org/pdf/2010.11255)
* Test_trial: CNC-Eval-Avg.lst

* 🔥 UPDATE 2022.07.12: We update this recipe according to the setups in the winning system of CNSRC 2022, and obtain a clear performance improvement over the old recipe. Check [commit1](https://github.com/wenet-e2e/wespeaker/pull/63/commits/b08804987b3bbb26f4963cedf634058474c743dd) and [commit2](https://github.com/wenet-e2e/wespeaker/pull/66/commits/6f6af29197f0aa0a5d1b1993b7feb2f41b97891f) for details.
* LR scheduler warmup from 0
* Remove one embedding layer
* Add large margin fine-tuning strategy (LM)

| Model | Params | FLOPs | LM | AS-Norm | EER (%) | minDCF (p=0.01) |
| :------------------------------ | :-------: | :-----: | :-: | :-------: | :-------: | :--------------: |
| ResNet34-TSTP-emb256 (OLD) | 6.70M | 4.55 G | × | × | 8.426 | 0.487 |
| ResNet34-TSTP-emb256 | 6.63M | 4.55 G | × | × | 7.134 | 0.408 |
| | | | × | √ | 6.747 | 0.367 |
| | | | √ | × | 6.652 | 0.393 |
| | | | √ | √ | 6.492 | 0.354 |
| ResNet221-TSTP-emb256 | 23.86M | 21.29 G | × | × | 5.965 | 0.362 |
| | | | × | √ | 5.708 | **0.326** |
| | | | √ | × | 5.886 | 0.362 |
| | | | √ | √ | **5.655** | 0.330 |
| ECAPA_TDNN_GLOB_c512-ASTP-emb192 | 6.19M | 1.04 G | × | × | 8.313 | 0.432 |
| | | | × | √ | 7.644 | 0.390 |
| | | | √ | × | 8.004 | 0.422 |
| | | | √ | √ | 7.417 | 0.379 |
| ECAPA_TDNN_GLOB_c1024-ASTP-emb192 | 14.65M | 2.65 G | × | × | 7.879 | 0.420 |
| | | | × | √ | 7.412 | 0.379 |
| | | | √ | × | 7.986 | 0.417 |
| | | | √ | √ | 7.395 | 0.372 |
| RepVGG_TINY_A0 | 6.26M | 4.65 G | × | × | 6.883 | 0.399 |
| | | | × | √ | 6.550 | 0.355 |
| Model | Params | FLOPs | LM | AS-Norm | QMF | EER (%) | minDCF (p=0.01) |
| :------------------------------ | :-------: | :-----: | :-: | :-------: | :-: | :-------: | :--------------: |
| ResNet34-TSTP-emb256 (OLD) | 6.70M | 4.55 G | × | × | × | 8.426 | 0.487 |
| ResNet34-TSTP-emb256 | 6.63M | 4.55 G | × | × | × | 7.134 | 0.408 |
| | | | × | √ | × | 6.747 | 0.367 |
| | | | × | √ | √ | 6.336 | 0.374 |
| | | | √ | × | × | 6.652 | 0.393 |
| | | | √ | √ | × | 6.492 | 0.354 |
| | | | √ | √ | √ | 6.119 | 0.361 |
| ResNet221-TSTP-emb256 | 23.86M | 21.29 G | × | × | × | 5.965 | 0.362 |
| | | | × | √ | × | 5.708 | **0.326** |
| | | | √ | × | × | 5.886 | 0.362 |
| | | | √ | √ | × | **5.655** | 0.330 |
| ECAPA_TDNN_GLOB_c512-ASTP-emb192 | 6.19M | 1.04 G | × | × | × | 8.313 | 0.432 |
| | | | × | √ | × | 7.644 | 0.390 |
| | | | √ | × | × | 8.004 | 0.422 |
| | | | √ | √ | × | 7.417 | 0.379 |
| ECAPA_TDNN_GLOB_c1024-ASTP-emb192 | 14.65M | 2.65 G | × | × | × | 7.879 | 0.420 |
| | | | × | √ | × | 7.412 | 0.379 |
| | | | √ | × | × | 7.986 | 0.417 |
| | | | √ | √ | × | 7.395 | 0.372 |
| RepVGG_TINY_A0 | 6.26M | 4.65 G | × | × | × | 6.883 | 0.399 |
| | | | × | √ | × | 6.550 | 0.355 |
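
For readers unfamiliar with the two metric columns above, here is a minimal NumPy sketch of how EER and normalized minDCF can be computed from a list of trial scores and target/non-target labels. It is not the Wespeaker scoring code: the function name and the cost parameters `c_miss` and `c_fa` are assumptions made for illustration, and only `p_target=0.01` is taken from the table header. Tied scores are not treated specially.

```python
import numpy as np


def compute_eer_and_min_dcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Return (EER, normalized minDCF) for trial scores and 0/1 labels.

    Hypothetical helper for illustration; label 1 marks a target trial.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Sort trials from highest to lowest score.
    order = np.argsort(-scores, kind="mergesort")
    labels = labels[order]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    # Operating point k accepts the k highest-scoring trials (k = 0..N).
    accepted_targets = np.concatenate(([0], np.cumsum(labels)))
    accepted_nontargets = np.concatenate(([0], np.cumsum(1 - labels)))
    miss_rate = 1.0 - accepted_targets / n_target
    false_alarm_rate = accepted_nontargets / n_nontarget

    # EER: the point where miss rate and false-alarm rate are closest.
    idx = np.argmin(np.abs(miss_rate - false_alarm_rate))
    eer = (miss_rate[idx] + false_alarm_rate[idx]) / 2.0

    # Detection cost at every operating point, normalized by the cost of
    # the best trivial system (accept everything or reject everything).
    dcf = c_miss * p_target * miss_rate + c_fa * (1.0 - p_target) * false_alarm_rate
    min_dcf = dcf.min() / min(c_miss * p_target, c_fa * (1.0 - p_target))
    return eer, min_dcf
```

With `p_target=0.01`, false alarms are weighted far more heavily than misses, so the minDCF operating point sits at a much stricter threshold than the EER point. That is one reason a change in scoring can move the two columns in opposite directions.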

Empty file modified examples/cnceleb/v2/local/score_calibration.sh
100644 → 100755
Empty file.
101 changes: 47 additions & 54 deletions examples/voxceleb/v2/README.md
@@ -1,67 +1,60 @@
## Results

* Setup: fbank80, num_frms200, epoch150, ArcMargin, aug_prob0.6, speed_perturb (no spec_aug)
* Scoring: cosine (sub mean of vox2_dev)
* Scoring: cosine (sub mean of vox2_dev), AS-Norm, [QMF](https://arxiv.org/pdf/2010.11255)
* Metric: EER(%)

* 🔥 UPDATE 2024.05.14: We support score calibration strategy (see [QMF](https://arxiv.org/pdf/2010.11255.pdf)), and obtain better performance.

| Model | Params | Flops | LM | AS-Norm | QMF | vox1-O-clean | vox1-E-clean | vox1-H-clean |
|:------|:------:|:------|:--:|:-------:|:---:|:------------:|:------------:|:------------:|
| ResNet34-TSTP-emb256 | 6.63M | 4.55G | × | × | × | 0.862 | 1.053 | 1.966 |
| | | | × | √ | × | 0.792 | 0.970 | 1.728 |
| | | | × | √ | √ | 0.718 | 0.911 | 1.606 |
| | | | √ | × | × | 0.797 | 0.943 | 1.702 |
| | | | √ | √ | × | 0.723 | 0.874 | 1.537 |
| | | | √ | √ | √ | 0.659 | 0.821 | 1.437 |

* 🔥 UPDATE 2022.07.19: We apply the same setups as the winning system of CNSRC 2022 (see [cnceleb](https://github.com/wenet-e2e/wespeaker/tree/master/examples/cnceleb/v2) recipe for details), and obtain significant performance improvement.
* LR scheduler warmup from 0
* Remove one embedding layer in ResNet models
* Add large margin fine-tuning strategy (LM)

| Model | Params | Flops | LM | AS-Norm | vox1-O-clean | vox1-E-clean | vox1-H-clean |
|:------|:------:|:------|:--:|:-------:|:------------:|:------------:|:------------:|
| XVEC-TSTP-emb512 | 4.61M | 0.53G | × | × | 1.989 | 1.209 | 3.412 |
| | | | × | √ | 1.834 | 1.846 | 3.124 |
| | | | √ | × | 1.749 | 1.721 | 2.944 |
| | | | √ | √ | 1.590 | 1.641 | 2.726 |
| ECAPA_TDNN_GLOB_c512-ASTP-emb192 | 6.19M | 1.04G | × | × | 1.069 | 1.209 | 2.310 |
| | | | × | √ | 0.957 | 1.128 | 2.105 |
| | | | √ | × | 0.878 | 1.072 | 2.007 |
| | | | √ | √ | 0.782 | 1.005 | 1.824 |
| ECAPA_TDNN_GLOB_c1024-ASTP-emb192 | 14.65M | 2.65G | × | × | 0.856 | 1.072 | 2.059 |
| | | | × | √ | 0.808 | 0.990 | 1.874 |
| | | | √ | × | 0.798 | 0.993 | 1.883 |
| | | | √ | √ | 0.728 | 0.929 | 1.721 |
| ResNet34-TSTP-emb256 | 6.63M | 4.55G | × | × | 0.867 | 1.049 | 1.959 |
| | | | × | √ | 0.787 | 0.964 | 1.726 |
| | | | √ | × | 0.797 | 0.937 | 1.695 |
| | | | √ | √ | 0.723 | 0.867 | 1.532 |
| ResNet221-TSTP-emb256 | 23.79M | 21.29G | × | × | 0.569 | 0.774 | 1.464 |
| | | | × | √ | 0.479 | 0.707 | 1.290 |
| | | | √ | × | 0.580 | 0.729 | 1.351 |
| | | | √ | √ | 0.505 | 0.676 | 1.213 |
| ResNet293-TSTP-emb256 | 28.62M | 28.10G | × | × | 0.595 | 0.756 | 1.433 |
| | | | × | √ | 0.537 | 0.701 | 1.276 |
| | | | √ | × | 0.532 | 0.707 | 1.311 |
| | | | √ | √ | **0.447** | **0.657** | **1.183** |
| RepVGG_TINY_A0 | 6.26M | 4.65G | × | × | 0.909 | 1.034 | 1.943 |
| | | | × | √ | 0.824 | 0.953 | 1.709 |
| CAM++ | 7.18M | 1.15G | × | × | 0.803 | 0.932 | 1.860 |
| | | | × | √ | 0.718 | 0.879 | 1.735 |
| | | | √ | × | 0.707 | 0.845 | 1.664 |
| | | | √ | √ | 0.659 | 0.803 | 1.569 |
| ERes2Net34_Base | 7.88M | 3.43G | × | × | 0.914 | 1.065 | 1.986 |
| | | | × | √ | 0.803 | 0.976 | 1.787 |
| | | | √ | × | 0.824 | 0.968 | 1.776 |
| | | | √ | √ | 0.744 | 0.896 | 1.603 |
| Res2Net34_Base | 4.68M | 1.77G | × | × | 1.351 | 1.347 | 2.478 |
| | | | × | √ | 1.234 | 1.232 | 2.162 |
| Gemini_DFResNet114 | 6.53M | 5.42G | × | × | 0.787 | 0.963 | 1.760 |
| | | | × | √ | 0.707 | 0.889 | 1.546 |
| | | | √ | × | 0.771 | 0.906 | 1.599 |
| | | | √ | √ | 0.638 | 0.839 | 1.427 |
| Model | Params | Flops | LM | AS-Norm | QMF | vox1-O-clean | vox1-E-clean | vox1-H-clean |
|:------|:------:|:------|:--:|:-------:|:---:|:------------:|:------------:|:------------:|
| XVEC-TSTP-emb512 | 4.61M | 0.53G | × | × | × | 1.989 | 1.950 | 3.412 |
| | | | × | √ | × | 1.834 | 1.846 | 3.124 |
| | | | √ | × | × | 1.749 | 1.721 | 2.944 |
| | | | √ | √ | × | 1.590 | 1.641 | 2.726 |
| ECAPA_TDNN_GLOB_c512-ASTP-emb192 | 6.19M | 1.04G | × | × | × | 1.069 | 1.209 | 2.310 |
| | | | × | √ | × | 0.957 | 1.128 | 2.105 |
| | | | √ | × | × | 0.878 | 1.072 | 2.007 |
| | | | √ | √ | × | 0.782 | 1.005 | 1.824 |
| ECAPA_TDNN_GLOB_c1024-ASTP-emb192 | 14.65M | 2.65G | × | × | × | 0.856 | 1.072 | 2.059 |
| | | | × | √ | × | 0.808 | 0.990 | 1.874 |
| | | | √ | × | × | 0.798 | 0.993 | 1.883 |
| | | | √ | √ | × | 0.728 | 0.929 | 1.721 |
| | | | √ | √ | √ | 0.707 | 0.894 | 1.615 |
| ResNet34-TSTP-emb256 | 6.63M | 4.55G | × | × | × | 0.867 | 1.049 | 1.959 |
| | | | × | √ | × | 0.787 | 0.964 | 1.726 |
| | | | × | √ | √ | 0.718 | 0.911 | 1.606 |
| | | | √ | × | × | 0.797 | 0.937 | 1.695 |
| | | | √ | √ | × | 0.723 | 0.867 | 1.532 |
| | | | √ | √ | √ | 0.659 | 0.821 | 1.437 |
| ResNet221-TSTP-emb256 | 23.79M | 21.29G | × | × | × | 0.569 | 0.774 | 1.464 |
| | | | × | √ | × | 0.479 | 0.707 | 1.290 |
| | | | √ | × | × | 0.580 | 0.729 | 1.351 |
| | | | √ | √ | × | 0.505 | 0.676 | 1.213 |
| ResNet293-TSTP-emb256 | 28.62M | 28.10G | × | × | × | 0.595 | 0.756 | 1.433 |
| | | | × | √ | × | 0.537 | 0.701 | 1.276 |
| | | | √ | × | × | 0.532 | 0.707 | 1.311 |
| | | | √ | √ | × | 0.447 | 0.657 | 1.183 |
| | | | √ | √ | √ | **0.425** | **0.641** | **1.146** |
| RepVGG_TINY_A0 | 6.26M | 4.65G | × | × | × | 0.909 | 1.034 | 1.943 |
| | | | × | √ | × | 0.824 | 0.953 | 1.709 |
| CAM++ | 7.18M | 1.15G | × | × | × | 0.803 | 0.932 | 1.860 |
| | | | × | √ | × | 0.718 | 0.879 | 1.735 |
| | | | √ | × | × | 0.707 | 0.845 | 1.664 |
| | | | √ | √ | × | 0.659 | 0.803 | 1.569 |
| ERes2Net34_Base | 7.88M | 3.43G | × | × | × | 0.914 | 1.065 | 1.986 |
| | | | × | √ | × | 0.803 | 0.976 | 1.787 |
| | | | √ | × | × | 0.824 | 0.968 | 1.776 |
| | | | √ | √ | × | 0.744 | 0.896 | 1.603 |
| Res2Net34_Base | 4.68M | 1.77G | × | × | × | 1.351 | 1.347 | 2.478 |
| | | | × | √ | × | 1.234 | 1.232 | 2.162 |
| Gemini_DFResNet114 | 6.53M | 5.42G | × | × | × | 0.787 | 0.963 | 1.760 |
| | | | × | √ | × | 0.707 | 0.889 | 1.546 |
| | | | √ | × | × | 0.771 | 0.906 | 1.599 |
| | | | √ | √ | × | 0.638 | 0.839 | 1.427 |
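
The "Scoring" line of this README names cosine scoring with mean subtraction and AS-Norm. As a rough illustration of those two steps, the NumPy sketch below computes cosine scores after subtracting a global mean embedding, then applies adaptive symmetric score normalization using the top-N cohort scores on each side. It is a sketch under stated assumptions, not the Wespeaker implementation: the function names, argument layout, and `top_n` parameter are hypothetical, and QMF calibration is not covered.

```python
import numpy as np


def cosine_scores(enroll, test, mean_vec):
    """Cosine similarity between enroll (E, D) and test (T, D) embeddings
    after subtracting a global mean embedding (e.g. estimated on vox2_dev)."""
    e = enroll - mean_vec
    t = test - mean_vec
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    return e @ t.T  # shape (E, T)


def as_norm(raw, enroll_cohort, test_cohort, top_n):
    """Adaptive symmetric normalization (hypothetical helper).

    raw:           (E, T) raw trial scores
    enroll_cohort: (E, C) scores of each enroll embedding against the cohort
    test_cohort:   (T, C) scores of each test embedding against the cohort
    top_n:         number of highest cohort scores used for the statistics
    """
    def top_stats(cohort_scores):
        # Keep the top_n largest cohort scores per row.
        top = np.sort(cohort_scores, axis=1)[:, -top_n:]
        return top.mean(axis=1), top.std(axis=1)

    mu_e, sd_e = top_stats(enroll_cohort)
    mu_t, sd_t = top_stats(test_cohort)
    # Normalize once with enroll-side and once with test-side statistics,
    # then average the two normalized scores.
    z_side = (raw - mu_e[:, None]) / sd_e[:, None]
    t_side = (raw - mu_t[None, :]) / sd_t[None, :]
    return 0.5 * (z_side + t_side)
```

The rows marked with AS-Norm in the table correspond to scores normalized in this spirit; the cohort set and top-N value used for the reported numbers are defined by the recipe configuration, not by this sketch.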


## PLDA results
