From eddf884c1e0af86bc93e273bfa267dc601a0fdc5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Drago=C8=99?= Date: Fri, 5 Jul 2024 13:40:00 +0200 Subject: [PATCH] Several changes and improvements --- RU/{hardware.md => environment.md} | 2 +- UT/CommonVoice/cv.md | 6 +- UT/Jasmin/jasmin_res.md | 203 +++++++++++++++-------------- UT/N-Best/nbest_res.md | 67 +++++----- _includes/head-custom.html | 2 +- assets/images/SoS.png | Bin 0 -> 5177 bytes index.md | 7 +- 7 files changed, 148 insertions(+), 139 deletions(-) rename RU/{hardware.md => environment.md} (98%) create mode 100644 assets/images/SoS.png diff --git a/RU/hardware.md b/RU/environment.md similarity index 98% rename from RU/hardware.md rename to RU/environment.md index 66ee1e8..9c15daf 100644 --- a/RU/hardware.md +++ b/RU/environment.md @@ -1,6 +1,6 @@ [Back to homepage](../index.md) -# Hardware setup +# Environment setup **Kaldi_NL**: [Official github repository](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL). The model used for Kaldi_NL is `radboud_GN` and its corresponding script is `decode_GN.sh`. 
diff --git a/UT/CommonVoice/cv.md b/UT/CommonVoice/cv.md index dc3b148..e64124a 100644 --- a/UT/CommonVoice/cv.md +++ b/UT/CommonVoice/cv.md @@ -14,9 +14,9 @@ Here is a matrix with **WER** results and the **time** each model/configuration |---|---|---| |[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|20.7%|8h:15m:54s*| |[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|5.6%|1h:58m:37s| -|**faster-whisper v3**|**4.3%**|1h:55m:20s| -|faster-whisper v2 w/ VAD|5.6%|1h:58m:50s| -|faster-whisper v3 w/ VAD|4.4%|2h:01m:33s| +|[**faster-whisper v3**](https://github.com/SYSTRAN/faster-whisper/)|**4.3%**|1h:55m:20s| +|[faster-whisper v2 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|5.6%|1h:58m:50s| +|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|4.4%|2h:01m:33s| |[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|6.5%|1h:04m:00s| |[**MMS - 102 languages**](https://huggingface.co/facebook/mms-1b-fl102)|13.4%|**0h:37m:50s**| |[MMS - 1162 languages](https://huggingface.co/facebook/mms-1b-all)|9.5%|0:53m:56s| diff --git a/UT/Jasmin/jasmin_res.md b/UT/Jasmin/jasmin_res.md index 65f2052..d249a3e 100644 --- a/UT/Jasmin/jasmin_res.md +++ b/UT/Jasmin/jasmin_res.md @@ -5,18 +5,18 @@ Here is a matrix with **WER** results of the baseline model, Kaldi_NL, as well a |Model\Dataset|Native Children|Native Teenagers|Non-native Minors|Non-native Adults|Native Elderly| |---|---|---|---|---|---| -|Kaldi_NL|28.1%|16.2%|43.6%|45.3%|20.9%| -|Whisper v2|22.6%|18.0%|36.5%|37.3%|22.2%| -|Whisper v3|34.2%|29.4%|50.4%|58.5%|34.4%| -|Whisper v2 w/ VAD|20.1%|12.4%|30.2%|33.4%|14.9%| -|Whisper v3 w/ VAD|34.7%|27.5%|46.7%|53.0%|30.2%| -|faster-whisper v2|20.3%|11.3%|29.9%|30.6%|13.7%| -|faster-whisper v3|28.1%|25.2%|50.9%|62.6%|27.6%| -|**faster-whisper v2 w/ VAD**|**19.1%**|**11.1%**|**29.5%**|**30.0%**|**12.8%**| -|faster-whisper v3 w/ VAD|27.5%|22.4%|42.6%|49.4%|25.2%| -|XLS-R FT on 
Dutch|22.4%|13.3%|33.8%|36.1%|17.2%| -|MMS - 102 languages|31.6%|20.3%|54.2%|55.1%|23.9%| -|MMS - 1162 languages|28.9%|20.0%|50.1%|54.0%|28.3%| +|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|28.1%|16.2%|43.6%|45.3%|20.9%| +|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|22.6%|18.0%|36.5%|37.3%|22.2%| +|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|34.2%|29.4%|50.4%|58.5%|34.4%| +|[Whisper v2 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|20.1%|12.4%|30.2%|33.4%|14.9%| +|[Whisper v3 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|34.7%|27.5%|46.7%|53.0%|30.2%| +|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|20.3%|11.3%|29.9%|30.6%|13.7%| +|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|28.1%|25.2%|50.9%|62.6%|27.6%| +|[**faster-whisper v2 w/ VAD**](https://github.com/SYSTRAN/faster-whisper/)|**19.1%**|**11.1%**|**29.5%**|**30.0%**|**12.8%**| +|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|27.5%|22.4%|42.6%|49.4%|25.2%| +|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|22.4%|13.3%|33.8%|36.1%|17.2%| +|[MMS - 102 languages](https://huggingface.co/facebook/mms-1b-fl102)|31.6%|20.3%|54.2%|55.1%|23.9%| +|[MMS - 1162 languages](https://huggingface.co/facebook/mms-1b-all)|28.9%|20.0%|50.1%|54.0%|28.3%|
@@ -24,106 +24,115 @@ And for **Dutch conversational speech**: |Model\Dataset|Native Children|Native Teenagers|Non-native Minors|Non-native Adults|Native Elderly| |---|---|---|---|---|---| -|Kaldi_NL|55.4%|62.4%|69.1%|60.0%|44.0%| -|Whisper v2|95.8%|107.4%|124.0%|88.1%|61.9%| -|Whisper v3|75.7%|72.6%|94.3%|84.2%|58.4%| -|Whisper v2 w/ VAD|32.6%|29.4%|42.6%|54.0%|33.1%| -|Whisper v3 w/ VAD|40.3%|31.7%|57.1%|63.2%|41.3%| -|faster-whisper v2|58.9%|65.8%|107.4%|77.7%|39.9%| -|faster-whisper v3|85.8%|68.3%|84.4%|84.5%|51.4%| -|**faster-whisper v2 w/ VAD**|**28.2%**|**22.9%**|**39.2%**|**51.4%**|**26.8%**| -|faster-whisper v3 w/ VAD|34.4%|28.6%|48.7%|58.2%|33.6%| -|XLS-R FT on Dutch|60.2%|62.2%|70.5%|59.1%|47.0%| -|MMS - 102 languages|79.8%|79.9%|90.7%|80.5%|56.4%| -|MMS - 1162 languages|82.4%|87.9%|94.5%|83.3%|59.9%| - - -**Jasmin_{p,q}_{1,2,3,4,5}** = **p** stands for **comp_p (HMI speech)**, whereas **q** stands for **comp_q (read speech)**. The number that can range from **1-5** represents the corresponding **age group/nativeness** from the corpus (for more details, go back one page). 
+|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|55.4%|62.4%|69.1%|60.0%|44.0%| +|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|95.8%|107.4%|124.0%|88.1%|61.9%| +|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|75.7%|72.6%|94.3%|84.2%|58.4%| +|[Whisper v2 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|32.6%|29.4%|42.6%|54.0%|33.1%| +|[Whisper v3 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|40.3%|31.7%|57.1%|63.2%|41.3%| +|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|58.9%|65.8%|107.4%|77.7%|39.9%| +|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|85.8%|68.3%|84.4%|84.5%|51.4%| +|[**faster-whisper v2 w/ VAD**](https://github.com/SYSTRAN/faster-whisper/)|**28.2%**|**22.9%**|**39.2%**|**51.4%**|**26.8%**| +|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|34.4%|28.6%|48.7%|58.2%|33.6%| +|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|60.2%|62.2%|70.5%|59.1%|47.0%| +|[MMS - 102 languages](https://huggingface.co/facebook/mms-1b-fl102)|79.8%|79.9%|90.7%|80.5%|56.4%| +|[MMS - 1162 languages](https://huggingface.co/facebook/mms-1b-all)|82.4%|87.9%|94.5%|83.3%|59.9%|
-And its corresponding matrix with the **time** spent in total by each model **to evaluate** the respective subset: +And its corresponding matrix with the **time** spent in total by each model **to evaluate Dutch read speech**: -|Model\Dataset|Jasmin_q_1|Jasmin_q_2|Jasmin_q_3|Jasmin_q_4|Jasmin_q_5| +|Model\Dataset|Native Children|Native Teenagers|Non-native Minors|Non-native Adults|Native Elderly| |---|---|---|---|---|---| -|Kaldi_NL|0h:30m:21s|0h:23m:25s|0h:27m:51s|0h:27m:17s|0h:29m:36s| -|Whisper v2|2h:05m:29s|1h:53m:11s|1h:35m:28s|1h:24m:41s|2h:04m:35s| -|Whisper v3|3h:12m:26s|2h:27m:52s|6h:13m:28s*|3h:04m:32s|3h:09m:49s| -|Whisper v2 w/ VAD|2h:14m:40s|1h:51m:46s|1h:49m:48s|4h:18m:51s*|2h:08m:02s| -|Whisper v3 w/ VAD|2h:58m:24s|2h:19m:43s|2h:38m:23s|2h:31m:35s|2h:47m:33s| -|faster-whisper v2|0h:30m:45s|0h:26m:48s|0h:23m:48s|0h:21m:55s|0h:30m:02s| -|faster-whisper v3|0h:41m:58s|0h:38m:13s|0h:48m:28s|0h:55m:48s|0h:44m:12s| -|faster-whisper v2 w/ VAD|0h:32m:55s|0h:27m:16s|0h:25m:51s|0h:21m:58s|0h:32m:09s| -|faster-whisper v3 w/ VAD|0h:40m:33s|0h:31m:45s|0h:37m:36s|0h:37m:11s|0h:38m:00s| -|XLS-R FT on Dutch|0h:35m:18s|0h:27m:33s|0h:32m:39s|0h:31m:49s|0h:39m:05s| -|*MMS - 102 languages*|0h:17m:59s|**0h:13m:22s**|0h:16m:01s|0h:15m:38s|**0h:17m:35s**| -|**MMS - 1162 languages**|**0h:17m:46s**|**0h:13m:22s**|**0h:16m:00s**|**0h:15m:37s**|**0h:17m:35s**| - -|Model\Dataset|Jasmin_p_1|Jasmin_p_2|Jasmin_p_3|Jasmin_p_4|Jasmin_p_5| +|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|0h:30m:21s|0h:23m:25s|0h:27m:51s|0h:27m:17s|0h:29m:36s| +|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|2h:05m:29s|1h:53m:11s|1h:35m:28s|1h:24m:41s|2h:04m:35s| +|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|3h:12m:26s|2h:27m:52s|6h:13m:28s*|3h:04m:32s|3h:09m:49s| +|[Whisper v2 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|2h:14m:40s|1h:51m:46s|1h:49m:48s|4h:18m:51s*|2h:08m:02s| +|[Whisper v3 w/ 
VAD](https://github.com/linto-ai/whisper-timestamped)|2h:58m:24s|2h:19m:43s|2h:38m:23s|2h:31m:35s|2h:47m:33s| +|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|0h:30m:45s|0h:26m:48s|0h:23m:48s|0h:21m:55s|0h:30m:02s| +|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|0h:41m:58s|0h:38m:13s|0h:48m:28s|0h:55m:48s|0h:44m:12s| +|[faster-whisper v2 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|0h:32m:55s|0h:27m:16s|0h:25m:51s|0h:21m:58s|0h:32m:09s| +|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|0h:40m:33s|0h:31m:45s|0h:37m:36s|0h:37m:11s|0h:38m:00s| +|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|0h:35m:18s|0h:27m:33s|0h:32m:39s|0h:31m:49s|0h:39m:05s| +|[*MMS - 102 languages*](https://huggingface.co/facebook/mms-1b-fl102)|0h:17m:59s|**0h:13m:22s**|0h:16m:01s|0h:15m:38s|**0h:17m:35s**| +|[**MMS - 1162 languages**](https://huggingface.co/facebook/mms-1b-all)|**0h:17m:46s**|**0h:13m:22s**|**0h:16m:00s**|**0h:15m:37s**|**0h:17m:35s**| + +
+ +And for **Dutch conversational speech**: + +|Model\Dataset|Native Children|Native Teenagers|Non-native Minors|Non-native Adults|Native Elderly| |---|---|---|---|---|---| -|*Kaldi_NL*|0h:16m:09s|0h:10m:29s|0h:11m:28s|0h:19m:09s|**0h:21m:32s**| -|Whisper v2|1h:23m:33s|1h:12m:52s|1h:13m:23s|1h:31m:20s|2h:39m:12s| -|Whisper v3|2h:30m:51s|1h:59m:19s|2h:13m:38s|3h:07m:55s|4h:39m:06s*| -|Whisper v2 w/ VAD|0h:48m:52s|0h:42m:17s|0h:37m:07s|1h:02m:36s|1h:47m:58s| -|Whisper v3 w/ VAD|1h:05m:32s|0h:37m:53s|0h:55m:46s|1h:38m:03s|2h:16m:09s| -|faster-whisper v2|0h:22m:10s|0h:17m:19s|0h:20m:16s|0h:23m:23s|0h:34m:06s| -|faster-whisper v3|0h:54m:15s|0h:32m:13s|0h:34m:35s|0h:55m:02s|1h:12m:22s| -|***faster-whisper v2 w/ VAD***|**0h:09m:59s**|0h:07m:37s|**0h:07m:57s**|**0h:13m:32s**|0h:22m:31s| -|*faster-whisper v3 w/ VAD*|0h:13m:43s|**0h:07m:17s**|0h:09m:57s|0h:22m:45s|0h:25m:52s| -|XLS-R FT on Dutch|0h:42m:20s|0h:24m:19s|0h:26m:52s|0h:36m:42s|0h:48m:26s| -|MMS - 102 languages|0h:18m:02s|0h:14m:02s|0h:14m:01s|0h:18m:59s|0h:25m:34s| -|MMS - 1162 languages|0h:17m:55s|0h:13m:56s|0h:13m:59s|0h:18m:54s|0h:25m:24s| - -* Performance might have been impacted by other processes from other users running on the same GPU since the hardware is available via a cluster system. A rerun using different hardware might be done in the near future. 
+|[*Kaldi_NL*](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|0h:16m:09s|0h:10m:29s|0h:11m:28s|0h:19m:09s|**0h:21m:32s**| +|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|1h:23m:33s|1h:12m:52s|1h:13m:23s|1h:31m:20s|2h:39m:12s| +|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|2h:30m:51s|1h:59m:19s|2h:13m:38s|3h:07m:55s|4h:39m:06s*| +|[Whisper v2 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|0h:48m:52s|0h:42m:17s|0h:37m:07s|1h:02m:36s|1h:47m:58s| +|[Whisper v3 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|1h:05m:32s|0h:37m:53s|0h:55m:46s|1h:38m:03s|2h:16m:09s| +|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|0h:22m:10s|0h:17m:19s|0h:20m:16s|0h:23m:23s|0h:34m:06s| +|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|0h:54m:15s|0h:32m:13s|0h:34m:35s|0h:55m:02s|1h:12m:22s| +|[***faster-whisper v2 w/ VAD***](https://github.com/SYSTRAN/faster-whisper/)|**0h:09m:59s**|0h:07m:37s|**0h:07m:57s**|**0h:13m:32s**|0h:22m:31s| +|[*faster-whisper v3 w/ VAD*](https://github.com/SYSTRAN/faster-whisper/)|0h:13m:43s|**0h:07m:17s**|0h:09m:57s|0h:22m:45s|0h:25m:52s| +|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|0h:42m:20s|0h:24m:19s|0h:26m:52s|0h:36m:42s|0h:48m:26s| +|[MMS - 102 languages](https://huggingface.co/facebook/mms-1b-fl102)|0h:18m:02s|0h:14m:02s|0h:14m:01s|0h:18m:59s|0h:25m:34s| +|[MMS - 1162 languages](https://huggingface.co/facebook/mms-1b-all)|0h:17m:55s|0h:13m:56s|0h:13m:59s|0h:18m:54s|0h:25m:24s| + +* Performance might have been impacted by other processes from other users running on the same GPU since the hardware is available via a cluster system. Future work includes rerunning these specific experiments. 
## Jasmin Flemish results -Matrix with **WER** results for the **Flemish** part of the corpus: +Matrix with **WER** results for **Flemish read speech**: -|Model\Dataset|Jasmin_q_1|Jasmin_q_2|Jasmin_q_3|Jasmin_q_4|Jasmin_q_5| +|Model\Dataset|Native Children|Native Teenagers|Non-native Minors|Non-native Adults|Native Elderly| |---|---|---|---|---|---| -|Kaldi_NL|59.2%|33.5%|51.3%|43.3%|24.7%| -|faster-whisper v2|42.4%|11.7%|19.9%|21.0%|16.7%| -|faster-whisper v3|57.2%|30.6%|44.4%|41.1%|38.7%| -|**faster-whisper v2 w/ VAD**|**41.8%**|**11.6%**|**19.4%**|**20.5%**|**14.4%**| -|faster-whisper v3 w/ VAD|56.2%|26.7%|38.4%|50.7%|33.6%| -|XLS-R FT on Dutch|47.4%|13.3%|30.1%|26.8%|16.4%| -|MMS - 102 languages|55.3%|22.4%|43.0%|37.0%|23.0%| -|MMS - 1162 languages|49.2%|21.8%|34.9%|35.8%|22.3%| - -|Model\Dataset|Jasmin_p_1|Jasmin_p_2|Jasmin_p_3|Jasmin_p_4|Jasmin_p_5| +|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|59.2%|33.5%|51.3%|43.3%|24.7%| +|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|42.4%|11.7%|19.9%|21.0%|16.7%| +|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|57.2%|30.6%|44.4%|41.1%|38.7%| +|[**faster-whisper v2 w/ VAD**](https://github.com/SYSTRAN/faster-whisper/)|**41.8%**|**11.6%**|**19.4%**|**20.5%**|**14.4%**| +|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|56.2%|26.7%|38.4%|50.7%|33.6%| +|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|47.4%|13.3%|30.1%|26.8%|16.4%| +|[MMS - 102 languages](https://huggingface.co/facebook/mms-1b-fl102)|55.3%|22.4%|43.0%|37.0%|23.0%| +|[MMS - 1162 languages](https://huggingface.co/facebook/mms-1b-all)|49.2%|21.8%|34.9%|35.8%|22.3%| + +
+ +And for **Flemish conversational speech**: + +|Model\Dataset|Native Children|Native Teenagers|Non-native Minors|Non-native Adults|Native Elderly| |---|---|---|---|---|---| -|Kaldi_NL|66.5%|49.8%|66.2%|64.4%|47.4%| -|faster-whisper v2|87.6%|51.7%|76.1%|67.3%|45.4%| -|faster-whisper v3|90.5%|65.2%|100.4%|79.9%|68.3%| -|**faster-whisper v2 w/ VAD**|**28.7%**|**24.3%**|**38.5%**|**49.3%**|**30.6%**| -|faster-whisper v3 w/ VAD|46.0%|37.7%|57.9%|57.9%|44.6%| -|XLS-R FT on Dutch|73.2%|62.2%|68.1%|52.2%|47.8%| -|MMS - 102 languages|86.7%|52.3%|87.8%|78.2%|56.4%| -|MMS - 1162 languages|86.1%|68.0%|86.3%|76.7%|60.8%| +|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|66.5%|49.8%|66.2%|64.4%|47.4%| +|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|87.6%|51.7%|76.1%|67.3%|45.4%| +|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|90.5%|65.2%|100.4%|79.9%|68.3%| +|[**faster-whisper v2 w/ VAD**](https://github.com/SYSTRAN/faster-whisper/)|**28.7%**|**24.3%**|**38.5%**|**49.3%**|**30.6%**| +|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|46.0%|37.7%|57.9%|57.9%|44.6%| +|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|73.2%|62.2%|68.1%|52.2%|47.8%| +|[MMS - 102 languages](https://huggingface.co/facebook/mms-1b-fl102)|86.7%|52.3%|87.8%|78.2%|56.4%| +|[MMS - 1162 languages](https://huggingface.co/facebook/mms-1b-all)|86.1%|68.0%|86.3%|76.7%|60.8%|
-And its corresponding matrix with the **time** spent in total by each model **to evaluate** the respective subset: +And its corresponding matrix with the **time** spent in total by each model **to evaluate Flemish read speech**: -|Model\Dataset|Jasmin_q_1|Jasmin_q_2|Jasmin_q_3|Jasmin_q_4|Jasmin_q_5| +|Model\Dataset|Native Children|Native Teenagers|Non-native Minors|Non-native Adults|Native Elderly| |---|---|---|---|---|---| -|Kaldi_NL|0h:15m:58s|0h:16m:03s|0h:25m:11s|0h:15m:46s|0h:29m:36s| -|faster-whisper v2|0h:09m:30s|0h:20m:12s|0h:18m:03s|0h:12m:09s|0h:14m:31s| -|faster-whisper v3|0h:14m:53s|0h:24m:33s|0h:29m:19s|0h:21m:47s|0h:23m:58s| -|faster-whisper v2 w/ VAD|0h:21m:27s|0h:27m:16s|0h:19m:09s|0h:13m:24s|0h:15m:29s| -|faster-whisper v3 w/ VAD|0h:13m:17s|0h:23m:29s|0h:23m:14s|0h:26m:40s|0h:19m:54s| -|XLS-R FT on Dutch|0h:11m:18s|0h:20m:03s|0h:22m:05s|0h:16m:28s|0h:13m:00s| -|*MMS - 102 languages*|**0h:05m:47s**|0h:09m:09s|**0h:10m:06s**|**0h:07m:37s**|**0h:08m:04s**| -|**MMS - 1162 languages**|**0h:05m:47s**|**0h:09m:07s**|**0h:10m:06s**|**0h:07m:37s**|**0h:08m:04s**| - -|Model\Dataset|Jasmin_p_1|Jasmin_p_2|Jasmin_p_3|Jasmin_p_4|Jasmin_p_5| +|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|0h:15m:58s|0h:16m:03s|0h:25m:11s|0h:15m:46s|0h:29m:36s| +|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|0h:09m:30s|0h:20m:12s|0h:18m:03s|0h:12m:09s|0h:14m:31s| +|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|0h:14m:53s|0h:24m:33s|0h:29m:19s|0h:21m:47s|0h:23m:58s| +|[faster-whisper v2 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|0h:21m:27s|0h:27m:16s|0h:19m:09s|0h:13m:24s|0h:15m:29s| +|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|0h:13m:17s|0h:23m:29s|0h:23m:14s|0h:26m:40s|0h:19m:54s| +|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|0h:11m:18s|0h:20m:03s|0h:22m:05s|0h:16m:28s|0h:13m:00s| +|[*MMS - 102 
languages*](https://huggingface.co/facebook/mms-1b-fl102)|**0h:05m:47s**|0h:09m:09s|**0h:10m:06s**|**0h:07m:37s**|**0h:08m:04s**| +|[**MMS - 1162 languages**](https://huggingface.co/facebook/mms-1b-all)|**0h:05m:47s**|**0h:09m:07s**|**0h:10m:06s**|**0h:07m:37s**|**0h:08m:04s**| + +
+ +And for **Flemish conversational speech**: + +|Model\Dataset|Native Children|Native Teenagers|Non-native Minors|Non-native Adults|Native Elderly| |---|---|---|---|---|---| -|Kaldi_NL|0h:07m:09s|0h:07m:36s|0h:08m:37s|0h:10m:51s|0h:14m:45s| -|faster-whisper v2|0h:12m:48s|0h:10m:45s|0h:14m:34s|0h:11m:58s|0h:34m:06s| -|faster-whisper v3|0h:24m:08s|0h:26m:42s|0h:28m:56s|0h:27m:12s|0h:31m:16s| -|**faster-whisper v2 w/ VAD**|**0h:05m:41s**|**0h:07m:03s**|**0h:07m:01s**|**0h:08m:11s**|**0h:09m:45s**| -|faster-whisper v3 w/ VAD|0h:06m:44s|0h:08m:23s|0h:10m:08s|0h:13m:52s|0h:10m:40s| -|XLS-R FT on Dutch|0h:20m:36s|0h:16m:58s|0h:20m:55s|0h:17m:47s|0h:19m:34s| -|MMS - 102 languages|0h:10m:55s|0h:09m:10s|0h:11m:00s|0h:09m:43s|0h:10m:33s| -|MMS - 1162 languages|0h:10m:06s|0h:09m:09s|0h:10m:42s|0h:09m:36s|0h:10m:15s| +|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|0h:07m:09s|0h:07m:36s|0h:08m:37s|0h:10m:51s|0h:14m:45s| +|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|0h:12m:48s|0h:10m:45s|0h:14m:34s|0h:11m:58s|0h:34m:06s| +|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|0h:24m:08s|0h:26m:42s|0h:28m:56s|0h:27m:12s|0h:31m:16s| +|[**faster-whisper v2 w/ VAD**](https://github.com/SYSTRAN/faster-whisper/)|**0h:05m:41s**|**0h:07m:03s**|**0h:07m:01s**|**0h:08m:11s**|**0h:09m:45s**| +|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|0h:06m:44s|0h:08m:23s|0h:10m:08s|0h:13m:52s|0h:10m:40s| +|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|0h:20m:36s|0h:16m:58s|0h:20m:55s|0h:17m:47s|0h:19m:34s| +|[MMS - 102 languages](https://huggingface.co/facebook/mms-1b-fl102)|0h:10m:55s|0h:09m:10s|0h:11m:00s|0h:09m:43s|0h:10m:33s| +|[MMS - 1162 languages](https://huggingface.co/facebook/mms-1b-all)|0h:10m:06s|0h:09m:09s|0h:10m:42s|0h:09m:36s|0h:10m:15s| diff --git a/UT/N-Best/nbest_res.md b/UT/N-Best/nbest_res.md index 8eba687..5746a61 100644 --- a/UT/N-Best/nbest_res.md +++ 
b/UT/N-Best/nbest_res.md @@ -14,31 +14,32 @@ For more details about the corpus, click [here](https://citeseerx.ist.psu.edu/do Here is a matrix with **WER** results of the baseline model, Kaldi_NL, as well as different end-to-end models tested on this corpus: -|Model\Dataset|bn_nl|cts_nl|bn_vl|cts_vl| +|Model\Dataset|Broadcast News in NL|Conversational Speech in NL|Broadcast News in BE|Conversational Speech in BE| |---|---|---|---|---| -|Kaldi_NL|12.6%|38.6%|21.2%|59.4%| -|Whisper v2|12.7%|25.9%|-|-| -|Whisper v3|13.7%|28.1%|-|-| -|Whisper v2 w/ VAD|11.6%|25.3%|-|-| -|Whisper v3 w/ VAD|14.1%|26.5%|-|-| -|faster-whisper v2|10.6%|24.1%|**13.0%**|38.5%| -|faster-whisper v3|12.5%|25.5%|14.9%|38.4%| -|**faster-whisper v2 w/ VAD**|**10.0%**|**23.9%**|13.6%|37.9%| -|faster-whisper v3 w/ VAD|12.3%|25.1%|14.6%|**36.9%**| -|XLS-R FT on Dutch|14.8%|33.5%|17.0%|51.7%| -|MMS - 102 languages|23.4%|49.0%|25.0%|64.2%| -|MMS - 1162 languages|18.5%|42.7%|19.4%|57.7%| +|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|12.6%|38.6%|21.2%|59.4%| +|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|12.7%|25.9%|-|-| +|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|13.7%|28.1%|-|-| +|[Whisper v2 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|11.6%|25.3%|-|-| +|[Whisper v3 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|14.1%|26.5%|-|-| +|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|10.6%|24.1%|**13.0%**|38.5%| +|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|12.5%|25.5%|14.9%|38.4%| +|[**faster-whisper v2 w/ VAD**](https://github.com/SYSTRAN/faster-whisper/)|**10.0%**|**23.9%**|13.6%|37.9%| +|[faster-whisper v3 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|12.3%|25.1%|14.6%|**36.9%**| +|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|14.8%|33.5%|17.0%|51.7%| +|[MMS - 102 
languages](https://huggingface.co/facebook/mms-1b-fl102)|23.4%|49.0%|25.0%|64.2%| +|[MMS - 1162 languages](https://huggingface.co/facebook/mms-1b-all)|18.5%|42.7%|19.4%|57.7%|
-And here are results for the same models on `bn_nl` with the foreign speech lines removed from the dataset: -|Model\Dataset|bn_nl| +And here are results for the same models on `Broadcast News in the Netherlands` with the foreign speech lines removed from the dataset: + +|Model\Dataset|WER| |---|---| -|Kaldi_NL|12.1%| -|Whisper v2|12.3%| -|Whisper v3|13.6%| -|**Whisper v2 w/ VAD**|**11.1%**| -|Whisper v3 w/ VAD|13.9%| +|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|12.1%| +|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|12.3%| +|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|13.6%| +|[**Whisper v2 w/ VAD**](https://github.com/linto-ai/whisper-timestamped)|**11.1%**| +|[Whisper v3 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|13.9%| Something to note about this setup of the dataset is that Whisper v3 was already able to recognize foreign speech and ignore it. The only notable exception was for a German speaker, where I assume that it was still transcribed because German and Dutch are very similar as languages. So, Whisper v3's hypothesis files were mostly unedited. 
@@ -46,20 +47,20 @@ Something to note about this setup of the dataset is that Whisper v3 was already Here is also a matrix with the **time** spent in total by each model **to evaluate** the respective subset: -|Model\Dataset|bn_nl|cts_nl|bn_vl|cts_vl| +|Model\Dataset|Broadcast News in NL|Conversational Speech in NL|Broadcast News in BE|Conversational Speech in BE| |---|---|---|---|---| -|Kaldi_NL|0h:08m:58s|0h:14m:47s|0h:15m:57s|0h:20m:07s| -|Whisper v2|1h:11m:59s|0h:53m:55s|-|-| -|Whisper v3|1h:09m:00s|0h:40m:20s|-|-| -|Whisper v2 w/ VAD|0h:52m:03s|0h:40m:09s|-|-| -|Whisper v3 w/ VAD|1h:02m:13s|0h:37m:50s|-|-| -|faster-whisper v2|0h:11m:31s|0h:09m:30s|0h:10m:55s|0h:09m:39s| -|faster-whisper v3|0h:11m:21s|0h:09m:41s|0h:10m:36s|0h:09m:54s| -|faster-whisper v2 w/ VAD|0h:12m:13s|0h:09m:36s|0h:11m:01s|0h:09m:46s| -|faster-whisper v3 w/ VAD|0h:12m:25s|0h:09m:13s|0h:10m:45s|0h:09m:04s| -|XLS-R FT on Dutch|0h:07m:36s|0h:07m:52s|0h:08m:44s|0h:08m:21s| -|*MMS - 102 languages*|**0h:04m:33s**|0h:04m:23s|0h:04m:38s|**0h:03m:54s**| -|*MMS - 1162 languages*|0h:05m:26s|**0h:04m:14s**|**0h:04m:37s**|0h:03m:55s| +|[Kaldi_NL](https://github.com/opensource-spraakherkenning-nl/Kaldi_NL)|0h:08m:58s|0h:14m:47s|0h:15m:57s|0h:20m:07s| +|[Whisper v2](https://github.com/linto-ai/whisper-timestamped)|1h:11m:59s|0h:53m:55s|-|-| +|[Whisper v3](https://github.com/linto-ai/whisper-timestamped)|1h:09m:00s|0h:40m:20s|-|-| +|[Whisper v2 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|0h:52m:03s|0h:40m:09s|-|-| +|[Whisper v3 w/ VAD](https://github.com/linto-ai/whisper-timestamped)|1h:02m:13s|0h:37m:50s|-|-| +|[faster-whisper v2](https://github.com/SYSTRAN/faster-whisper/)|0h:11m:31s|0h:09m:30s|0h:10m:55s|0h:09m:39s| +|[faster-whisper v3](https://github.com/SYSTRAN/faster-whisper/)|0h:11m:21s|0h:09m:41s|0h:10m:36s|0h:09m:54s| +|[faster-whisper v2 w/ VAD](https://github.com/SYSTRAN/faster-whisper/)|0h:12m:13s|0h:09m:36s|0h:11m:01s|0h:09m:46s| +|[faster-whisper v3 w/ 
VAD](https://github.com/SYSTRAN/faster-whisper/)|0h:12m:25s|0h:09m:13s|0h:10m:45s|0h:09m:04s| +|[XLS-R FT on Dutch](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-dutch)|0h:07m:36s|0h:07m:52s|0h:08m:44s|0h:08m:21s| +|[*MMS - 102 languages*](https://huggingface.co/facebook/mms-1b-fl102)|**0h:04m:33s**|0h:04m:23s|0h:04m:38s|**0h:03m:54s**| +|[*MMS - 1162 languages*](https://huggingface.co/facebook/mms-1b-all)|0h:05m:26s|**0h:04m:14s**|**0h:04m:37s**|0h:03m:55s| ### Preprocessing, setup, and postprocessing For more details, click [here](./nbest_setup.md). diff --git a/_includes/head-custom.html b/_includes/head-custom.html index 9f871a5..3f2439d 100644 --- a/_includes/head-custom.html +++ b/_includes/head-custom.html @@ -1 +1 @@ - \ No newline at end of file + \ No newline at end of file diff --git a/assets/images/SoS.png b/assets/images/SoS.png new file mode 100644 index 0000000000000000000000000000000000000000..20e8aa7398ca48cd09153c98ffaaad23468c177b GIT binary patch literal 5177 zcmcIohcg^p*GCpBc%ny(ELLxcPL!;&(YuHi1gm$VN36CI-RkXOH);?pqD%B1B}83q z)gW3#e>d+p^Zf_!ojLc;xifd}nS0MUzu!5rdOFW2?=s&dARwSrS5q;-ul;`oNP?d$ zUsUB25YQ2*t3Zqb=JxUe16XmPeOO!A?{93iAJbmjt25MooNsv|7Q9_+?;2rG!N{61 zDc(bx{)ta~g;1Fm$j${0|E9DYm_ouEs^!&{7X5ZedFy!Uk5hgL$*aM`Wx76{$6yi` zS0U?PHKDK>my!Hz%WA*#%fNiBV)psS;b@5XlLLuJyea>i@h|Y7QkT50(A_W^%af7D zckga?!=eoUhACZNzVtc0HlHw!HKdmkT)Y?Z>O= z4OoUh6tAxP4*XSNsG%7K%6q*aAORJC07%qnN3*`woBfntgZXLF3GiNT^!NIiLgbe$ z-u?zhJ4cr)#((Z9w2Y&o-Ib{$#B`E=S^tEZ=qYUz0*N35045K}lt%9r7TkSSd19^A zow6G#zzs^jZK|#!$ECgRu|-eA&~20hQWPsSkuuIE0F^rvoG@1*oKcEDA;1 zK#?!ifw_SXMN_&E z-f?xaQnV8~UNMlQqQ_Z8LUFt?jQX}vrRAMP&e=vW=(-K6DR;1g9o>~4Kr=PPx`C$7D`Td^w=p^2hqXcE=PY^9f}-2h0YI%5u2W>8^&7Cky9JP4XUpjnm{tUSbOc z9I29I8h`()w&OO$FzoiJ9MKMOn7t8|iRyaB;^O_->uXK0P*w9zkh!WVR+tl>`9Uu( zi5``&+YCY)Syfu?dkz=$l*v*+K>bcM>t7vCh6Ds0s>S)$a2QXGD()BfQ^B@;h{-@n zKsbmB0DqdK&ZE{Oc4S3AX*Q}G%^}kE!{*>7l^OzOXI%3wOq|GWI01FT;2n;faFa(&af4O!#zhrbWBgR{bz8{$~-50+v6)g0e)VZe9hyjhG 
zBZf$b;O_xw!63o`vZ_hj+O}XdM)=!U5w&`km6xXNKdJGeFOiMlM{M~rB*8Ry0Sa*>!t!~T*Z}CSx6-Hf!>7xA?ikP3(z$AIWylWZZJ$z-1;3~ zVjOVd-1;2|!;jmjE@kw5#C^Mu1`T&q)Ygtg;D0ig1_sIRQ=4ByVidE40EKHmT?4r% zb2aO6aTs}snjb`}NR=Yedr$XaZm`+2+H#ow#KrqR3a0a+!Wr)!a^K5LbiSCkMPl62 z!5%#OkAOr3_%9=XKcyo0|Da_8A&ldAf7k98WMqtmaydwEv~$Gz$zRB+Q&=ITN<#Jp z?9=yywZ0A?orGpSq-=k0#3-%B@b-XBTd^*w0=tFKh|y(A13`9M^}rckVFMtDreT*@~}tZ_IzYNv!O-~4Yr#q?GDH*+c>Ao z(Ogspb>IWeos9DmO=A=?2$pD9ZwSSiJWD72J?py2WOYCjwT#_u5EFby$mAbHyXx^w1Cv?$I+amTj`#@?si3wy*gly4Rv zP+A_tY6_mS6pX%WcX&^W{$r|bfa=Zr;Xkh(g&XK9gY&lB9V$5H1@KYrO~S^PDBFvQ z9NH`}tR?zX;R5DlPR;E8MwoS)ZkvOkj(U4*Ls2s`lKf9*90e@YD%RmI3@4k$hV$#U zlI_GLbB|rzMqgg9&%}MMyK4Z8XLWTmy%M>&E*Tw26IR<9YuVXZ+Juef-dc;YHss5f z!F``5DV>Mx?>y1jEXnCprmFa`OSm>$2G^=a$W>T1d#)H5;!fHZp$^Zl_Wi?_ipCzY zwu%+S+NvZu;o6=&|M-Gy>)1u^T7;QaC02&DE`2LwW?DG#A9yZNS90NepY}gON)e>) z74bkSa3e?z)GytxINaiX8E{vW)x!T7-{>89?mrOOT}Ji2!Lis7J^)h%@Z;j+pO&T) zJfHRyp;lp!Usax$T zGr^e3O_X==LCnCX`5*V$j+3nU@QGWdUt0ie7Q{VxZJVNENS26jDmD#Sl~2JEk!nln#&duuN$2KV)Hm4P8*Ov!%jvp2 z{nNMYEx^t27P3R54Vp;9)1<}e)LN@>;@>RvDm3uhtQ3|4+8>#q=RWIuf4t^{8hnb| z3yxBg`gQeo_9H>IEVs0j)m3acVwGD=iCOe{#rvj2ntMc?dVVEfi5GQbO0fqxibM7J zHK^4DS((nZ9`Dieot{?o%=Bwhz)q-Ss3c=@06kDN{d?<5;gjN85Ho}3C*gW7d4H4C zSPr$;@LFiS=qff8+87bS&YP* zUyf1jE#Ou%q5TUU;KusKNk_ZIHaw*{JfycH&WK#*R;RcYHV0~53*l4Yno=AUGi47K zQ^)AL?l9XBz2ucc(4Y?fZ10KsQO|<0zjneQNCR0j8%yBS zs(Qr=)h;WyZ>cf_B<5LQD(ddH2ykEqkJM0>f9ihhMMq@m^l zko!%NnuGbPj@eC0ZuHk|IZ2ImnKE4+-N}A09tG{Zf!ygovRW_GFV5bC(+1*V_o9zk z&t@=wu1U7ulnA_I<26gAm2>oc1^#%(J0G8NU-I#bnc&TPn371Le{uk~0KFvWbckw z#q$gWN`Z&?`y~2Xv~Avz70JpN`#ZP6;sIPcrx&got^V=Z$~Y^u6|hckb6qPm*YMQy zw;Ya1Dd^|Gsm!5ecbPZB5J&4qXV)M`U3!s6dmf`*6qNUdiE~gB#k%=8oL`rQV{F*i zATB`4c}=GlN3OLv_G3YHTP6kFa91nT#QYiEX~YP$&;6eE=J4&KlAUB$drY2zu1{4+ zEV28Fk-WlS;4?115F62wjt;A3qow@M)9ntx!+h3YyVb&DX~URfH(fkyb9NqW)M0`! 
z9^PP*lKqgS9^2R7n5p99tB%nNatx0{cDE*@USEN{;w0R!=}<%_w_`Z7)ls?or+2Q6 zO5(nYgVTL2IuYIr%QEfi2nVydFr!|?*#XU+e~;D;aK=g(BYb@-X*0fkzTGQZsK`=Q1c&CN{uaS6X!)6vb_hhd__XV*xZuA&RQkYKRIP}xOU^4{!T zo4@u0$lcr^%#Wr(q4X>lTA95mj>wz2AJTss3VI9JFvT*bMZ~u5XHSEBo6r_dxuCHv zLN#c=K=!U+KK=ZO+mJ%=uRVR7+Uuyo!QHq~nQ_nA>gzk|Sqt+{yXFdW64bM~K-Az% zRKW&A>DAbkiY|e^;ARoypBgWj6nislinPf&c7l82N2)achk>nzzdSi~5csMFpcGe1 zN}egbvZO>zoq3Va?IJ*{$kL1{U?p*v#6F4iol1-W#kQM1Xh`On?04Ody`4jiK(RN2 zWk99av-YpOLjpB}+sABLa)F};Vxtnbhio{lCdKpofbzeo|Jy+aCQPFZozqhVjg6eI zr|R(p*S~GcNg2XWsa481f_(BPt8*xFv=wH@24K4`_g~C~iLldYjIpc8%iOe1!9AJ( zfI`FKF4&|v5M9&tVTY@y6(pmJpSYPQ#WGrrUqMa^bP!Dy46eJo=JkS1ZI&ih$O~QV zSYxB%_lj{gxUsm}hzLVY8vp^5o|>9olAN zwNDh=xPA=_?t7q2u%AIoKURBY1f}oH<6{?}75~fWYwn}nv;fJga6vU%w3W5^ zslN)PQlUFvC)L(Q zQ+}Z9R{PEa7!XE``JabeOMEly)HVn;-=2ev29v?*coFL5mV}T|3doK#X~WYbR=8)m z9RPK+7-^&{vZwn|@G=}^E1;I{0`fi;Gj3aD1*WJ{jmj1X2)MXv-m5zobWpdNh%sR{D!^aC7dSMx-Ojfr85N7^ zY$&o-?>hDEd^CMubGDH84(=kfzEQ*jXLP=5L~M$2+!MWi?6h)L;1&Ur0Z>Q%`}|)f_D47c zCH_^^wyQxgNYcdwn!sCr`)x|xI@=NNTnL$;Wqtf)>|JevBgJadqD7eFS_`7YcL z62~U33QK|hbY6-%Y+J*_kbj43(RrZNogU|NX5rPdSC4;W=a>4WOy5LV6RPg&q%D$o zCy;+!ONa#lOK#kvj2`>QZf;w^*V!UCI^?|tA7FVuXO~_5c zL-{%|hg}U%Ob%HR{$xBV3Xynurx;)Cf(J*UJ*4>$KD}s$(eVR0Z`wF@KbAwG$T5oj zV` zCm8P}#oBB*4-_86$P_~RvJc&-k1m%PZUM(lBRbyw$ETgemX^avCEtdFt7ey%Se zCOdOfR>^F~du+|zSTY4`rx7gLOcUl&HEbH_E86%qIW&!4&WoZcL1tbMbhy|2Oax_d z%pBvBUY*u2ab6XtQa%L5R=-woo*U*KF^;vrIa;ts3WOLNUU5BmOGW}9{C5ft2G8QV zGVUX^=91GSU5$l^FHC%h@vY|HK}gZJpr*C0-sb;n(Xjrv>%&Efk9k78m%kN5m{){~ z6|H>X#G^MWMTLtyBAQeYUoWmNCp9-nFtaU?LkRl6tSnpG;+sHJaDt?W&G`(Wcxbc+ z*TA7;z^{Rq&*JI1$crhgBBu*J_|eC9w7N)G^z_s?1}thIxif1+mi+f7wiYO*=;_;b jtD$5>1pljngx`{;6E}WeayNbZ_YQSc9hE92YxsWveU7NP literal 0 HcmV?d00001 diff --git a/index.md b/index.md index 46cb7c8..f2df8aa 100644 --- a/index.md +++ b/index.md @@ -12,17 +12,16 @@ Welcome to the benchmark page where researchers and developers report performanc - [Environment setup](./UT/environment.md) - [Why do the results differ between 
whisper-timestamped and faster-whisper?](./UT/analysis.md) -The results in **bold** indicate the best performance for the specific subset(s) between all models. +The results in **bold** indicate the best performance for the specific subset(s) among all models. Lower values are better for both WER and time. -These results were achieved during the PDI-SSH **O**ral **H**istory - **S**tories at the **M**useum around **Art** ([OH-SMArt](https://www.uva.nl/en/discipline/conservation-and-restoration/research/research-projects/oh-smart/oh-smart.html)) project. +These results were achieved during the PDI-SSH **O**ral **H**istory - **S**tories at the **M**useum around **Art** ([OH-SMArt](https://www.uva.nl/en/discipline/conservation-and-restoration/research/research-projects/oh-smart/oh-smart.html)) project (2022-2025).

RU's Kaldi_NL vs. Whisper vs. Wav2vec2.0 evaluation

*RU = Radboud University* - [Results on four medical domain datasets](./RU/wer.md) -- [Hardware setup](./RU/hardware.md) - +- [Environment setup](./RU/environment.md) These results were achieved during the PDI-SSH **Ho**mo **Med**icinalis ([HoMed](https://homed.ruhosting.nl/)) project (2021-2024).