Several changes and improvements
greenw0lf committed Jul 5, 2024
1 parent e7627e5 commit eddf884
Showing 7 changed files with 148 additions and 139 deletions.
2 changes: 1 addition & 1 deletion RU/hardware.md → RU/environment.md
@@ -1,6 +1,6 @@
[Back to homepage](../index.md)

# Hardware setup
# Environment setup


**Kaldi_NL**: [Official github repository](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL). The model used for Kaldi_NL is `radboud_GN` and its corresponding script is `decode_GN.sh`.
6 changes: 3 additions & 3 deletions UT/CommonVoice/cv.md
@@ -14,9 +14,9 @@ Here is a matrix with **WER** results and the **time** each model/configuration
|---|---|---|
|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|20.7%|8h:15m:54s*|
|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|5.6%|1h:58m:37s|
|**faster-whisper v3**|**4.3%**|1h:55m:20s|
|faster-whisper v2 w/ VAD|5.6%|1h:58m:50s|
|faster-whisper v3 w/ VAD|4.4%|2h:01m:33s|
|[**faster-whisper v3**](https://github.com/SYSTRAN/faster-whisper/)|**4.3%**|1h:55m:20s|
|[faster-whisper v2 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|5.6%|1h:58m:50s|
|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|4.4%|2h:01m:33s|
|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|6.5%|1h:04m:00s|
|[**MMS - 102 languages**](https://huggingface.co/facebook/mms-1b-fl102)|13.4%|**0h:37m:50s**|
|[MMS - 1162 languages](https://huggingface.co/facebook/mms-1b-all)|9.5%|0h:53m:56s|
203 changes: 106 additions & 97 deletions UT/Jasmin/jasmin_res.md

Large diffs are not rendered by default.

67 changes: 34 additions & 33 deletions UT/N-Best/nbest_res.md
@@ -14,52 +14,53 @@ For more details about the corpus, click [here](https://citeseerx.ist.psu.edu/do

Here is a matrix with **WER** results of the baseline model, Kaldi_NL, as well as different end-to-end models tested on this corpus:

|Model\Dataset|bn_nl|cts_nl|bn_vl|cts_vl|
|Model\Dataset|Broadcast News in NL|Conversational Speech in NL|Broadcast News in BE|Conversational Speech in BE|
|---|---|---|---|---|
|Kaldi_NL|12.6%|38.6%|21.2%|59.4%|
|Whisper v2|12.7%|25.9%|-|-|
|Whisper v3|13.7%|28.1%|-|-|
|Whisper v2 w/ VAD|11.6%|25.3%|-|-|
|Whisper v3 w/ VAD|14.1%|26.5%|-|-|
|faster-whisper v2|10.6%|24.1%|**13.0%**|38.5%|
|faster-whisper v3|12.5%|25.5%|14.9%|38.4%|
|**faster-whisper v2 w/ VAD**|**10.0%**|**23.9%**|13.6%|37.9%|
|faster-whisper v3 w/ VAD|12.3%|25.1%|14.6%|**36.9%**|
|XLS-R FT on Dutch|14.8%|33.5%|17.0%|51.7%|
|MMS - 102 languages|23.4%|49.0%|25.0%|64.2%|
|MMS - 1162 languages|18.5%|42.7%|19.4%|57.7%|
|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|12.6%|38.6%|21.2%|59.4%|
|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|12.7%|25.9%|-|-|
|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|13.7%|28.1%|-|-|
|[Whisper v2 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|11.6%|25.3%|-|-|
|[Whisper v3 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|14.1%|26.5%|-|-|
|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|10.6%|24.1%|**13.0%**|38.5%|
|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|12.5%|25.5%|14.9%|38.4%|
|[**faster-whisper v2 w/ VAD**](https://github.com/SYSTRAN/faster-whisper/)|**10.0%**|**23.9%**|13.6%|37.9%|
|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|12.3%|25.1%|14.6%|**36.9%**|
|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|14.8%|33.5%|17.0%|51.7%|
|[MMS - 102 languages](https://huggingface.co/facebook/mms-1b-fl102)|23.4%|49.0%|25.0%|64.2%|
|[MMS - 1162 languages](https://huggingface.co/facebook/mms-1b-all)|18.5%|42.7%|19.4%|57.7%|

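For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate can be computed, assuming the standard definition (word-level edit distance divided by the number of reference words). The function name `word_error_rate` is hypothetical, and the text normalization applied before scoring in this benchmark is not shown in this diff, so the sketch simply splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: no casing or punctuation normalization is applied,
    which is an assumption and not necessarily what the benchmark does.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (zit -> zat) and one deletion (de) over six words.
    print(f"{word_error_rate('de kat zit op de mat', 'de kat zat op mat'):.1%}")
```

Because insertions count as errors, this value can exceed 100% when the hypothesis is much longer than the reference.
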
<br>
And here are results for the same models on `bn_nl` with the foreign speech lines removed from the dataset:

|Model\Dataset|bn_nl|
And here are results for the same models on `Broadcast News in the Netherlands` with the foreign speech lines removed from the dataset:

|Model\Dataset|WER|
|---|---|
|Kaldi_NL|12.1%|
|Whisper v2|12.3%|
|Whisper v3|13.6%|
|**Whisper v2 w/ VAD**|**11.1%**|
|Whisper v3 w/ VAD|13.9%|
|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|12.1%|
|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|12.3%|
|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|13.6%|
|[**Whisper v2 w/ VAD**](https://github.com/linto-ai/whisper-timestamped)|**11.1%**|
|[Whisper v3 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|13.9%|

Note that in this setup of the dataset, Whisper v3 was already able to recognize foreign speech and ignore it. The only notable exception was a German speaker, whose speech was still transcribed; I assume this is because German and Dutch are closely related languages. As a result, Whisper v3's hypothesis files were left mostly unedited.

<br>

Here is also a matrix with the **time** spent in total by each model **to evaluate** the respective subset:

|Model\Dataset|bn_nl|cts_nl|bn_vl|cts_vl|
|Model\Dataset|Broadcast News in NL|Conversational Speech in NL|Broadcast News in BE|Conversational Speech in BE|
|---|---|---|---|---|
|Kaldi_NL|0h:08m:58s|0h:14m:47s|0h:15m:57s|0h:20m:07s|
|Whisper v2|1h:11m:59s|0h:53m:55s|-|-|
|Whisper v3|1h:09m:00s|0h:40m:20s|-|-|
|Whisper v2 w/ VAD|0h:52m:03s|0h:40m:09s|-|-|
|Whisper v3 w/ VAD|1h:02m:13s|0h:37m:50s|-|-|
|faster-whisper v2|0h:11m:31s|0h:09m:30s|0h:10m:55s|0h:09m:39s|
|faster-whisper v3|0h:11m:21s|0h:09m:41s|0h:10m:36s|0h:09m:54s|
|faster-whisper v2 w/ VAD|0h:12m:13s|0h:09m:36s|0h:11m:01s|0h:09m:46s|
|faster-whisper v3 w/ VAD|0h:12m:25s|0h:09m:13s|0h:10m:45s|0h:09m:04s|
|XLS-R FT on Dutch|0h:07m:36s|0h:07m:52s|0h:08m:44s|0h:08m:21s|
|*MMS - 102 languages*|**0h:04m:33s**|0h:04m:23s|0h:04m:38s|**0h:03m:54s**|
|*MMS - 1162 languages*|0h:05m:26s|**0h:04m:14s**|**0h:04m:37s**|0h:03m:55s|
|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|0h:08m:58s|0h:14m:47s|0h:15m:57s|0h:20m:07s|
|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|1h:11m:59s|0h:53m:55s|-|-|
|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|1h:09m:00s|0h:40m:20s|-|-|
|[Whisper v2 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|0h:52m:03s|0h:40m:09s|-|-|
|[Whisper v3 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|1h:02m:13s|0h:37m:50s|-|-|
|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|0h:11m:31s|0h:09m:30s|0h:10m:55s|0h:09m:39s|
|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|0h:11m:21s|0h:09m:41s|0h:10m:36s|0h:09m:54s|
|[faster-whisper v2 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|0h:12m:13s|0h:09m:36s|0h:11m:01s|0h:09m:46s|
|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|0h:12m:25s|0h:09m:13s|0h:10m:45s|0h:09m:04s|
|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|0h:07m:36s|0h:07m:52s|0h:08m:44s|0h:08m:21s|
|[*MMS - 102 languages*](https://huggingface.co/facebook/mms-1b-fl102)|**0h:04m:33s**|0h:04m:23s|0h:04m:38s|**0h:03m:54s**|
|[*MMS - 1162 languages*](https://huggingface.co/facebook/mms-1b-all)|0h:05m:26s|**0h:04m:14s**|**0h:04m:37s**|0h:03m:55s|

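To compare the timings in these tables numerically, the duration strings first need to be converted to seconds. The sketch below is one possible way to do that; the helper name `duration_to_seconds` and the regular expression are assumptions made for illustration, written to accept the `0h:04m:33s` style used above, including cells wrapped in `**` or followed by `*`.

```python
import re

# Matches strings such as "1h:11m:59s"; the "h" is optional so that a
# cell written as "0:53m:56s" is accepted as well.
_DURATION = re.compile(r"^(\d+)h?:(\d+)m:(\d+)s$")


def duration_to_seconds(cell: str) -> int:
    """Convert a table cell like '**0h:04m:33s**' to a number of seconds."""
    cleaned = cell.strip().strip("*")  # drop bold markers and footnote stars
    match = _DURATION.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognized duration: {cell!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    whisper_v2 = duration_to_seconds("1h:11m:59s")
    faster_whisper_v2 = duration_to_seconds("0h:11m:31s")
    # Ratio of the two timings reported for the first subset.
    print(f"{whisper_v2 / faster_whisper_v2:.2f}x")
```

A ratio computed this way only reflects the total evaluation times reported in the table, not a controlled speed benchmark.
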
### Preprocessing, setup, and postprocessing
For more details, click [here](./nbest_setup.md).
2 changes: 1 addition & 1 deletion _includes/head-custom.html
@@ -1 +1 @@
<link rel="shortcut icon" type="image/x-icon" href='/sos.ico'>
<link rel="shortcut icon" type="image/x-icon" href="{{ '/assets/images/SoS.png' | absolute_url }}">
Binary file added assets/images/SoS.png
7 changes: 3 additions & 4 deletions index.md
@@ -12,17 +12,16 @@ Welcome to the benchmark page where researchers and developers report performanc
- [Environment setup](./UT/environment.md)
- [Why do the results differ between whisper-timestamped and faster-whisper?](./UT/analysis.md)

The results in **bold** indicate the best performance for the specific subset(s) between all models.
The results in **bold** indicate the best performance for the specific subset(s) between all models. The lower, the better.

These results were achieved during the PDI-SSH **O**ral **H**istory - **S**tories at the **M**useum around **Art** ([OH-SMArt](https://www.uva.nl/en/discipline/conservation-and-restoration/research/research-projects/oh-smart/oh-smart.html)) project.
These results were achieved during the PDI-SSH **O**ral **H**istory - **S**tories at the **M**useum around **Art** ([OH-SMArt](https://www.uva.nl/en/discipline/conservation-and-restoration/research/research-projects/oh-smart/oh-smart.html)) project (2022-2025).

<h2>RU's Kaldi_NL vs. Whisper vs. Wav2vec2.0 evaluation</h2>

*RU = Radboud University*

- [Results on four medical domain datasets](./RU/wer.md)
- [Hardware setup](./RU/hardware.md)

- [Environment setup](./RU/environment.md)

These results were achieved during the PDI-SSH **Ho**mo **Med**icinalis ([HoMed](https://homed.ruhosting.nl/)) project (2021-2024).
