Releases: techiaith/docker-huggingface-stt-cy

22.10 (Hydref / October 2022)

26 Oct 19:26

Read this release note in English

Dyma ein modelau a sgriptiau ym mis Hydref 2022 (22.10) ar gyfer adnabod lleferydd Cymraeg effeithiol ar sail y ddull wav2vec2. Yn newydd yn y cyhoeddiad yma o'r gwaith yw:

  • sgriptiau cychwynnol i rhag-hyfforddi modelau gyda rhagor o sain leferydd Cymraeg ac yna i'w fireinio ar gyfer wireddu adnabod lleferydd Cymraeg gwell
  • o ganlyniad, model arbrofol newydd ('wav2vec2-base-cy') sydd wedi ei rhag-hyfforddi gyda dros 180 awr o leferydd Cymraeg o amrywiaeth o fideos YouTube.
  • modd i hyfforddi gydag is-setiau ein hunain a fwy defnyddiol o fersiwn 11 o Common Voice Cymraeg a Saesneg (gweler https://github.com/techiaith/docker-commonvoice-custom-splits-builder) a chyhoeddwyd ym mis Medi 2022.
  • o ganlyniad, model acwstig adnabod lleferydd dwyieithog Cymraeg a Saesneg newydd ('wav2vec2-xlsr-ft-en-cy') gyda WER o 17.07% ar set profi ddilys o Common Voice
  • a model adnabod lleferydd Cymraeg ('wav2vec2-xlsr-ft-cy') lawer mwy effeithiol a chywir gyda gostyngiad o 67% yn y WER o 12.38% i 4.05% ar gyfer adnabod lleferydd Cymraeg yn unig ar set profi ddilys o Common Voice.
  • seilwaith gweinydd API trawsgrifio newydd gyda'r modd i gysylltu ag API sy'n atgyweirio atalnodi a chyfalafu mewn testunau Cymraeg (gweler https://github.com/techiaith/docker-atalnodi-server)

D.S. er bod y WER wedi gwella i 4.05% ar set brofi o Common Voice bellach, ond promptiau wedi eu darllen yn bwyllog sydd yn y set brofi honno. Gyda sgyrsiau naturiol, digymell, mae’r WER yn agosach at 30% ac angen rhagor o waith hyfforddi a gwerthuso.

Ceir ffeiliau modelau ar wefan HuggingFace:


in English

These are our models and scripts in October 2022 (22.10) for effective Welsh speech recognition based on wav2vec2. New in this release of the work are:

  • initial scripts to pre-train models with more Welsh speech audio and then fine-tune them, as an experiment in improving Welsh speech recognition results.
  • as a result, a new experimental model ('wav2vec2-base-cy') which has been pre-trained with over 180 hours of Welsh speech collected from a variety of videos on YouTube.
  • a means to train with our own custom splits of version 11 of Common Voice Welsh and English (see https://github.com/techiaith/docker-commonvoice-custom-splits-builder), published in September 2022.
  • as a result, a new Welsh and English bilingual speech recognition acoustic model ('wav2vec2-xlsr-ft-en-cy') with a WER of 17.07% when evaluated on a test set from Common Voice.
  • and a much more accurate speech recognition model ('wav2vec2-xlsr-ft-cy'), with a 67% reduction in the WER from 12.38% to 4.05%, for Welsh only when evaluated with a test set from Common Voice.
  • new transcription API server infrastructure that supports connecting to an API for restoring punctuation and capitalization in Welsh texts (see https://github.com/techiaith/docker-atalnodi-server)

N.B. although the WER has now improved to 4.05% on a test set from Common Voice, that test set consists of prompts that have been read carefully and calmly. With natural, spontaneous or conversational speech, the WER is believed to be closer to 30%, so more training and evaluation work is needed.
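The "67% reduction" quoted above is a relative figure, not a difference in percentage points. The short sketch below reproduces that arithmetic from the two WER values given in these notes; the function name `relative_reduction` is only illustrative and is not part of the released scripts.

```python
def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Return the relative drop from old_wer to new_wer as a fraction of old_wer."""
    if old_wer <= 0:
        raise ValueError("old_wer must be positive")
    return (old_wer - new_wer) / old_wer


# WER values taken from the release notes: 12.38% (22.06) and 4.05% (22.10).
reduction = relative_reduction(12.38, 4.05)
print(f"{reduction:.1%}")  # prints 67.3%
```

In absolute terms the same change is a drop of 8.33 percentage points.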

Model files can be found on the HuggingFace website:

22.06 (Mehefin / June 2022)

16 Jun 09:47

Read this release note in English

Dyma ein sgriptiau ym mis Mehefin 2022 (22.06) ar gyfer hyfforddi, gwerthuso, defnyddio a chynnal API adnabod lleferydd Cymraeg eich hunain ar sail modelau wav2vec2 gan Facebook ac HuggingFace, a KenLM gan Kenneth Heafield ac eraill.

Rydym hefyd yn cyhoeddi modelau sydd wedi'u hyfforddi gyda data Mozilla CommonVoice Cymraeg fersiwn 9, a chyhoeddwyd ym mis Ebrill 2022, a data corpws testunau Cymraeg OSCAR o fis Ebrill 2022.

Ceir ffeiliau modelau ar wefan HuggingFace:

Mewn arbrofion syml gyda set profi Common Voice, pan ddefnyddir y model acwsteg wav2vec2-xlsr-ft-cy (~1GB mewn maint) a model iaith gyda'i gilydd, mae'r adnabod lleferydd o ganlyniad yn cam-adnabod tua 13.74% o eiriau mewn brawddeg.

Mewn arbrofion syml gyda set profi Common Voice, pan ddefnyddir y model acwsteg wav2vec2-xls-r-1b-ft-cy (~3GB mewn maint) a model iaith gyda'i gilydd, mae'r adnabod lleferydd o ganlyniad yn cam-adnabod tua 12.38% o eiriau mewn brawddeg.


in English

Here are our June 2022 (22.06) scripts for training, evaluating, using and hosting your own Welsh speech recognition models based on wav2vec2 by Facebook AI and HuggingFace, as well as KenLM by Kenneth Heafield and others.

This release also contains models trained with the Welsh dataset from Mozilla CommonVoice version 9 as published in April 2022 and the Welsh text corpus dataset from OSCAR from April 2022.

Models can be found on the HuggingFace website:

In simple evaluations on the Welsh Common Voice test set, the wav2vec2-xlsr-ft-cy acoustic model (~1GB in size), when used together with a language model, exhibits a word error rate of 13.74%.

In simple evaluations on the Welsh Common Voice test set, the wav2vec2-xls-r-1b-ft-cy acoustic model (~3GB in size), when used together with a language model, exhibits a word error rate of 12.38%.
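For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between the reference and the recognised text, divided by the number of reference words. The sketch below implements that standard definition. It is not the project's evaluation script, which these notes do not describe, and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of five reference words is dropped.
print(word_error_rate("mae hi yn braf heddiw", "mae hi braf heddiw"))  # prints 0.2
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.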

22.01 (Ionawr / January 2022)

31 Jan 11:56

Read this release note in English

Dyma ein sgriptiau ym mis Ionawr 2022 (22.01) ar gyfer hyfforddi, gwerthuso, defnyddio a chynnal API adnabod lleferydd Cymraeg eich hunain ar sail model wav2vec2-large-xlsr-53 gan Facebook ac HuggingFace, a KenLM gan Kenneth Heafield ac eraill.

Rydym hefyd yn cyhoeddi modelau sydd wedi'u hyfforddi gyda data Mozilla CommonVoice Cymraeg fersiwn 8, a chyhoeddwyd ym mis Ionawr 2022, a data corpws testunau Cymraeg OSCAR o fis Ionawr 2022.

Ceir ffeiliau modelau ar wefan HuggingFace: https://huggingface.co/techiaith/wav2vec2-xlsr-ft-cy/tree/22.01

Mewn arbrofion syml, pan ddefnyddir y model acwsteg ac iaith gyda'i gilydd, mae'r adnabod lleferydd o ganlyniad yn cam-adnabod tua 13.79% o eiriau mewn brawddeg.


in English

Here are our January 2022 (22.01) scripts for training, evaluating, using and hosting your own Welsh speech recognition models based on wav2vec2-large-xlsr-53 by Facebook AI and HuggingFace, as well as KenLM by Kenneth Heafield and others.

This release also contains models trained with the Welsh dataset from Mozilla CommonVoice version 8 as published in January 2022 and the Welsh text corpus dataset from OSCAR from January 2022.

Models can be found on the HuggingFace website: https://huggingface.co/techiaith/wav2vec2-xlsr-ft-cy/tree/22.01

In simple evaluations on the Welsh Common Voice test set, the models, when used together in inference, exhibit a word error rate of 13.79%.

21.08 (Awst / August 2021)

26 Aug 15:04

Read this release note in English

Dyma ein sgriptiau ym mis Awst 2021 (21.08) ar gyfer hyfforddi, gwerthuso, defnyddio a chynnal API adnabod lleferydd Cymraeg eich hunain ar sail wav2vec2 gan Facebook AI ac HuggingFace, a KenLM gan Kenneth Heafield ac eraill.

Rydym hefyd yn cyhoeddi modelau sydd wedi'u hyfforddi gyda data Mozilla CommonVoice Cymraeg fersiwn 7, a chyhoeddwyd ym mis Gorffennaf 2021, a data corpws testunau Cymraeg OSCAR o fis Awst 2021.

Ceir ffeiliau modelau ar wefan HuggingFace: https://huggingface.co/techiaith/wav2vec2-xlsr-ft-cy/tree/21.08

Mewn arbrofion syml, pan ddefnyddir y model acwsteg ac iaith gyda'i gilydd, mae'r adnabod lleferydd o ganlyniad yn cam-adnabod tua 14% o eiriau mewn brawddeg.


in English

Here are our August 2021 (21.08) scripts for training, evaluating, using and hosting your own Welsh speech recognition models based on wav2vec2 by Facebook AI and HuggingFace, and KenLM by Kenneth Heafield and others.

This release also contains models trained with the Welsh dataset from Mozilla CommonVoice version 7 as published in July 2021 and the Welsh text corpus dataset from OSCAR from August 2021.

Models can be found on the HuggingFace website: https://huggingface.co/techiaith/wav2vec2-xlsr-ft-cy/tree/21.08

In simple evaluations on the Welsh Common Voice test set, the models, when used together in inference, exhibit a word error rate of 14%.

21.05 (Mai / May 2021)

09 Jun 05:59

Read this release note in English

Dyma ein sgriptiau ym mis Mai 2021 (21.05) ar gyfer hyfforddi, gwerthuso a chynnal API adnabod lleferydd Cymraeg eich hunain ar sail wav2vec2 gan Facebook AI ac HuggingFace, a KenLM gan Kenneth Heafield ac eraill.

Rydym hefyd yn cyhoeddi modelau sydd wedi'u hyfforddi gyda data Mozilla CommonVoice Cymraeg, a chyhoeddwyd ym mis Rhagfyr 2020, a data corpws testunau Cymraeg OSCAR o fis Mai 2021.

Mewn arbrofion syml, pan ddefnyddir y model acwsteg ac iaith gyda'i gilydd, mae'r adnabod lleferydd o ganlyniad yn cam-adnabod tua 15% o eiriau mewn brawddeg.


in English

Here are our May 2021 (21.05) scripts for training, evaluating and hosting your own Welsh speech recognition models based on wav2vec2 by Facebook AI and HuggingFace, and KenLM by Kenneth Heafield and others.

This release also contains models trained with the Welsh dataset from Mozilla CommonVoice as published in December 2020 and the Welsh text corpus dataset from OSCAR from May 2021.

In simple evaluations on the Welsh Common Voice test set, the models, when used together in inference, exhibit a word error rate of 15%.