Several changes and improvements

opensource-spraakherkenning-nl · Jul 5, 2024 · eddf884
1 parent e7627e5
commit eddf884

Showing 7 changed files with 148 additions and 139 deletions.
diff --git a/RU/hardware.md b/RU/environment.md
@@ -1,6 +1,6 @@
 [Back to homepage](../index.md)
 
-# Hardware setup
+# Environment setup
 
 
 **Kaldi_NL**: [Official github repository](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL). The model used for Kaldi_NL is `radboud_GN` and its corresponding script is `decode_GN.sh`.

diff --git a/UT/CommonVoice/cv.md b/UT/CommonVoice/cv.md
@@ -14,9 +14,9 @@ Here is a matrix with **WER** results and the **time** each model/configuration
 |---|---|---|
 |[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|20.7%|8h:15m:54s*|
 |[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|5.6%|1h:58m:37s|
-|**faster-whisper v3**|**4.3%**|1h:55m:20s|
-|faster-whisper v2 w/ VAD|5.6%|1h:58m:50s|
-|faster-whisper v3 w/ VAD|4.4%|2h:01m:33s|
+|[**faster-whisper v3**](https://github.com/SYSTRAN/faster-whisper/)|**4.3%**|1h:55m:20s|
+|[faster-whisper v2 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|5.6%|1h:58m:50s|
+|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|4.4%|2h:01m:33s|
 |[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|6.5%|1h:04m:00s|
 |[**MMS - 102 languages**](https://huggingface.co/facebook/mms-1b-fl102)|13.4%|**0h:37m:50s**|
 |[MMS - 1162 languages](https://huggingface.co/facebook/mms-1b-all)|9.5%|0:53m:56s|

diff --git a/UT/Jasmin/jasmin_res.md b/UT/Jasmin/jasmin_res.md
diff --git a/UT/N-Best/nbest_res.md b/UT/N-Best/nbest_res.md
@@ -14,52 +14,53 @@ For more details about the corpus, click [here](https://citeseerx.ist.psu.edu/do
 
 Here is a matrix with **WER** results of the baseline model, Kaldi_NL, as well as different end-to-end models tested on this corpus:
 
-|Model\Dataset|bn_nl|cts_nl|bn_vl|cts_vl|
+|Model\Dataset|Broadcast News in NL|Conversational Speech in NL|Broadcast News in BE|Conversational Speech in BE|
 |---|---|---|---|---|
-|Kaldi_NL|12.6%|38.6%|21.2%|59.4%|
-|Whisper v2|12.7%|25.9%|-|-|
-|Whisper v3|13.7%|28.1%|-|-|
-|Whisper v2 w/ VAD|11.6%|25.3%|-|-|
-|Whisper v3 w/ VAD|14.1%|26.5%|-|-|
-|faster-whisper v2|10.6%|24.1%|**13.0%**|38.5%|
-|faster-whisper v3|12.5%|25.5%|14.9%|38.4%|
-|**faster-whisper v2 w/ VAD**|**10.0%**|**23.9%**|13.6%|37.9%|
-|faster-whisper v3 w/ VAD|12.3%|25.1%|14.6%|**36.9%**|
-|XLS-R FT on Dutch|14.8%|33.5%|17.0%|51.7%|
-|MMS - 102 languages|23.4%|49.0%|25.0%|64.2%|
-|MMS - 1162 languages|18.5%|42.7%|19.4%|57.7%|
+|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|12.6%|38.6%|21.2%|59.4%|
+|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|12.7%|25.9%|-|-|
+|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|13.7%|28.1%|-|-|
+|[Whisper v2 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|11.6%|25.3%|-|-|
+|[Whisper v3 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|14.1%|26.5%|-|-|
+|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|10.6%|24.1%|**13.0%**|38.5%|
+|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|12.5%|25.5%|14.9%|38.4%|
+|[**faster-whisper v2 w/ VAD**](https://github.com/SYSTRAN/faster-whisper/)|**10.0%**|**23.9%**|13.6%|37.9%|
+|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|12.3%|25.1%|14.6%|**36.9%**|
+|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|14.8%|33.5%|17.0%|51.7%|
+|[MMS - 102 languages](https://huggingface.co/facebook/mms-1b-fl102)|23.4%|49.0%|25.0%|64.2%|
+|[MMS - 1162 languages](https://huggingface.co/facebook/mms-1b-all)|18.5%|42.7%|19.4%|57.7%|
 
 <br>
-And here are results for the same models on `bn_nl` with the foreign speech lines removed from the dataset:
 
-|Model\Dataset|bn_nl|
+And here are results for the same models on `Broadcast News in the Netherlands` with the foreign speech lines removed from the dataset:
+
+|Model\Dataset|WER|
 |---|---|
-|Kaldi_NL|12.1%|
-|Whisper v2|12.3%|
-|Whisper v3|13.6%|
-|**Whisper v2 w/ VAD**|**11.1%**|
-|Whisper v3 w/ VAD|13.9%|
+|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|12.1%|
+|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|12.3%|
+|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|13.6%|
+|[**Whisper v2 w/ VAD**](https://github.com/linto-ai/whisper-timestamped)|**11.1%**|
+|[Whisper v3 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|13.9%|
 
 Something to note about this setup of the dataset is that Whisper v3 was already able to recognize foreign speech and ignore it. The only notable exception was for a German speaker, where I assume that it was still transcribed because German and Dutch are very similar as languages. So, Whisper v3's hypothesis files were mostly unedited.
 
 <br>
 
 Here is also a matrix with the **time** spent in total by each model **to evaluate** the respective subset:
 
-|Model\Dataset|bn_nl|cts_nl|bn_vl|cts_vl|
+|Model\Dataset|Broadcast News in NL|Conversational Speech in NL|Broadcast News in BE|Conversational Speech in BE|
 |---|---|---|---|---|
-|Kaldi_NL|0h:08m:58s|0h:14m:47s|0h:15m:57s|0h:20m:07s|
-|Whisper v2|1h:11m:59s|0h:53m:55s|-|-|
-|Whisper v3|1h:09m:00s|0h:40m:20s|-|-|
-|Whisper v2 w/ VAD|0h:52m:03s|0h:40m:09s|-|-|
-|Whisper v3 w/ VAD|1h:02m:13s|0h:37m:50s|-|-|
-|faster-whisper v2|0h:11m:31s|0h:09m:30s|0h:10m:55s|0h:09m:39s|
-|faster-whisper v3|0h:11m:21s|0h:09m:41s|0h:10m:36s|0h:09m:54s|
-|faster-whisper v2 w/ VAD|0h:12m:13s|0h:09m:36s|0h:11m:01s|0h:09m:46s|
-|faster-whisper v3 w/ VAD|0h:12m:25s|0h:09m:13s|0h:10m:45s|0h:09m:04s|
-|XLS-R FT on Dutch|0h:07m:36s|0h:07m:52s|0h:08m:44s|0h:08m:21s|
-|*MMS - 102 languages*|**0h:04m:33s**|0h:04m:23s|0h:04m:38s|**0h:03m:54s**|
-|*MMS - 1162 languages*|0h:05m:26s|**0h:04m:14s**|**0h:04m:37s**|0h:03m:55s|
+|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|0h:08m:58s|0h:14m:47s|0h:15m:57s|0h:20m:07s|
+|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|1h:11m:59s|0h:53m:55s|-|-|
+|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|1h:09m:00s|0h:40m:20s|-|-|
+|[Whisper v2 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|0h:52m:03s|0h:40m:09s|-|-|
+|[Whisper v3 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|1h:02m:13s|0h:37m:50s|-|-|
+|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|0h:11m:31s|0h:09m:30s|0h:10m:55s|0h:09m:39s|
+|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|0h:11m:21s|0h:09m:41s|0h:10m:36s|0h:09m:54s|
+|[faster-whisper v2 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|0h:12m:13s|0h:09m:36s|0h:11m:01s|0h:09m:46s|
+|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|0h:12m:25s|0h:09m:13s|0h:10m:45s|0h:09m:04s|
+|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|0h:07m:36s|0h:07m:52s|0h:08m:44s|0h:08m:21s|
+|[*MMS - 102 languages*](https://huggingface.co/facebook/mms-1b-fl102)|**0h:04m:33s**|0h:04m:23s|0h:04m:38s|**0h:03m:54s**|
+|[*MMS - 1162 languages*](https://huggingface.co/facebook/mms-1b-all)|0h:05m:26s|**0h:04m:14s**|**0h:04m:37s**|0h:03m:55s|
 
 ### Preprocessing, setup, and postprocessing
 For more details, click [here](./nbest_setup.md).
diff --git a/_includes/head-custom.html b/_includes/head-custom.html
@@ -1 +1 @@
-<link rel="shortcut icon" type="image/x-icon" href='/sos.ico'>
+<link rel="shortcut icon" type="image/x-icon" href="{{ '/assets/images/SoS.png' | absolute_url }}">
diff --git a/assets/images/SoS.png b/assets/images/SoS.png
diff --git a/index.md b/index.md
@@ -12,17 +12,16 @@ Welcome to the benchmark page where researchers and developers report performanc
 - [Environment setup](./UT/environment.md)
 - [Why do the results differ between whisper-timestamped and faster-whisper?](./UT/analysis.md)
 
-The results in **bold** indicate the best performance for the specific subset(s) between all models.
+The results in **bold** indicate the best performance for the specific subset(s) between all models. The lower, the better.
 
-These results were achieved during the PDI-SSH **O**ral **H**istory - **S**tories at the **M**useum around **Art** ([OH-SMArt](https://www.uva.nl/en/discipline/conservation-and-restoration/research/research-projects/oh-smart/oh-smart.html)) project.
+These results were achieved during the PDI-SSH **O**ral **H**istory - **S**tories at the **M**useum around **Art** ([OH-SMArt](https://www.uva.nl/en/discipline/conservation-and-restoration/research/research-projects/oh-smart/oh-smart.html)) project (2022-2025).
 
 <h2>RU's Kaldi_NL vs. Whisper vs. Wav2vec2.0 evaluation</h2>
 
 *RU = Radboud University*
 
 - [Results on four medical domain datasets](./RU/wer.md)
-- [Hardware setup](./RU/hardware.md)
-
+- [Environment setup](./RU/environment.md)
 
 These results were achieved during the PDI-SSH **Ho**mo **Med**icinalis ([HoMed](https://homed.ruhosting.nl/)) project (2021-2024).
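
Note on the metric: the tables touched by this commit report **WER** (word error rate), and the new index text adds "The lower, the better." The sketch below shows one common way to compute WER: word-level edit distance divided by the number of reference words. It is an illustration only. The function name `word_error_rate` and the whitespace tokenization are assumptions; the diff does not show which tool or text normalization the benchmark used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not the benchmark's own scoring code.
    """
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (zit -> zat) and one deletion (de) over six reference words.
    print(f"{word_error_rate('de kat zit op de mat', 'de kat zat op mat'):.1%}")
```

Because insertions count as errors, WER can exceed 100% when a hypothesis is much longer than its reference.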