1. When creating a dataset, what is the best way to transcribe abbreviated words?
Here are a few examples:
CAN, which is pronounced 'see a n' rather than 'can': should it be 'C A N'?
BEE, which is pronounced 'be e e' rather than 'bee': should it be 'B E E'?
2. What is the best way to transcribe a year?
For example, 1985: should it be 'nineteen eighty five' or simply '1985'?
Your advice would be highly appreciated.
[This is an archived TTS discussion thread from discourse.mozilla.org/t/abbreviations-in-dataset-transcript]
For question 2, it's safest to write the year out in words exactly as it was said, because then the transcript will match the audio no matter how the text is processed later. This matters most where the reading is ambiguous: 2012, for example, is sometimes said 'two thousand and twelve' and sometimes 'twenty twelve'.
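If you write years out by hand, a small helper can draft the candidate spoken forms so you only have to pick the one that matches the recording. This is a minimal sketch, assuming four-digit English years; `year_readings` is a hypothetical helper, not part of any TTS library. It covers only the paired reading ('nineteen eighty five') and the 'thousand and' reading, so it misses variants such as 'two thousand twelve' without 'and', and it cannot tell you which form was actually spoken.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n: int) -> str:
    """Spell out 0..99 in words, e.g. 85 -> 'eighty five'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def year_readings(year: int) -> list[str]:
    """Return common English readings of a four-digit year (hypothetical helper)."""
    if not 1000 <= year <= 9999:
        raise ValueError("expected a four-digit year")
    readings = []
    century, rest = divmod(year, 100)
    # Paired reading: 1985 -> 'nineteen eighty five', 2012 -> 'twenty twelve'.
    # Skipped for round thousands, since nobody says 'twenty hundred' for 2000.
    if rest == 0 and century % 10 != 0:
        readings.append(f"{two_digits(century)} hundred")
    elif 0 < rest < 10:
        readings.append(f"{two_digits(century)} oh {ONES[rest]}")
    elif rest >= 10:
        readings.append(f"{two_digits(century)} {two_digits(rest)}")
    # 'Thousand' reading, only used in practice when the hundreds digit is 0:
    # 2000 -> 'two thousand', 2012 -> 'two thousand and twelve'.
    thousands, remainder = divmod(year, 1000)
    if remainder == 0:
        readings.append(f"{ONES[thousands]} thousand")
    elif remainder < 100:
        readings.append(f"{ONES[thousands]} thousand and {two_digits(remainder)}")
    return readings
```

For 1985 this returns only 'nineteen eighty five', while for 2012 it returns both readings mentioned in the answer, which is exactly the ambiguity that makes listening to the clip necessary before choosing.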
For question 1, it depends on how the phonemizer would pronounce it: you want your transcription to be turned into the same sounds/words by the phonemizer as your recording actually used.
I'm not at my computer to test it, but I think that for English abbreviations, espeak-ng (used through phonemizer) generally turns capitalised words into the names of their letters. So 'BBC' becomes the equivalent of 'be be see', and provided that was how it was said, it would be fine to transcribe it as BBC.
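If your phonemizer turns out to read an all-caps word such as CAN as the ordinary word 'can', one option is to rewrite the abbreviations the speaker spelled out into space-separated letters before phonemizing. This is a minimal sketch, assuming you keep your own hand-made list of spelled-out abbreviations (`SPELLED_OUT` and `spell_out_abbreviations` are hypothetical names) and assuming single capital letters are read as letter names. Verify that on your setup, for example by running phonemizer's `phonemize()` on both 'CAN' and 'C A N' and comparing the output.

```python
import re

# Hypothetical list: abbreviations that the speaker spelled out letter by
# letter in this recording session. Build it by listening to your own audio.
SPELLED_OUT = frozenset({"CAN", "BEE", "BBC"})


def spell_out_abbreviations(text: str, spelled: frozenset[str] = SPELLED_OUT) -> str:
    """Replace each listed abbreviation with its letters separated by spaces."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Only listed words are expanded; other all-caps words are kept as-is.
        return " ".join(word) if word in spelled else word

    # Match whole words made of two or more ASCII capital letters.
    return re.sub(r"\b[A-Z]{2,}\b", replace, text)
```

Words that are not in the list, or that are in lower case, are left untouched, so 'You can go' stays as it is while 'The CAN bus' becomes 'The C A N bus'.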
Question posted by jerry.matjila [October 16, 2020, 7:52am]