Audio of 300 core words of 42 Indo-European languages

The audio clips in this repository cover 300 core words in each of 42 Indo-European languages, synthesized with the text-to-speech engine in Microsoft Azure AI Speech Studio. Each word is synthesized in both a male and a female voice, giving a total of 25,200 audio clips (300 words × 42 languages × 2 genders). All clips are 16-bit, 16 kHz, mono WAV files with leading and trailing silence trimmed.
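The stated clip count and audio format can be checked after download with a short script. The following is a minimal sketch using only Python's standard `wave` module. It assumes the clips are stored as `.wav` files somewhere under a local root folder; the function names `matches_stated_format` and `check_tree` are illustrative and not part of this repository.

```python
import wave
from pathlib import Path

# Figures stated in this README.
N_WORDS, N_LANGUAGES, N_VOICES = 300, 42, 2
EXPECTED_CLIPS = N_WORDS * N_LANGUAGES * N_VOICES  # 25,200

def matches_stated_format(path):
    """Return True if the WAV file is 16-bit, 16 kHz, mono."""
    with wave.open(str(path), "rb") as wav:
        return (
            wav.getsampwidth() == 2          # 2 bytes per sample = 16-bit
            and wav.getframerate() == 16000  # 16 kHz
            and wav.getnchannels() == 1      # mono
        )

def check_tree(root):
    """Count .wav files under root (assumed layout) and list mismatches."""
    files = sorted(Path(root).rglob("*.wav"))
    bad = [p for p in files if not matches_stated_format(p)]
    return len(files), bad
```

For example, `count, bad = check_tree(".")` returns the number of clips found and any files that differ from the stated format; `count == EXPECTED_CLIPS` then confirms that the download is complete.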

See word_list.csv for the vocabulary of 300 core words in each language.

For further use of these audio clips, see the repository: https://github.com/EL-CL/acoustic-dist-ie.

42 languages:

Afrikaans
Albanian
Armenian
Bengali
Bosnian
Bulgarian
Catalan
Croatian
Czech
Danish
Dutch
English
French
Galician
German
Greek
Gujarati
Hindi
Icelandic
Irish
Italian
Latvian
Lithuanian
Macedonian
Marathi
Nepali
Norwegian
Pashto
Persian
Polish
Portuguese
Romanian
Russian
Serbian
Sinhala
Slovak
Slovenian
Spanish
Swedish
Ukrainian
Urdu
Welsh

