Synthesized audio of 300 core words of 42 Indo-European languages

EL-CL/acoustic-dist-ie-audio

Audio of 300 core words of 42 Indo-European languages

This repository contains synthesized speech for 300 core words in each of 42 Indo-European languages, generated with the text-to-speech engine in Microsoft Azure AI Speech Studio. Each word is synthesized in both a male and a female voice, giving 25,200 audio clips in total (300 words × 42 languages × 2 genders). All audio clips are 16-bit, 16 kHz, mono WAV files with leading and trailing silence trimmed.
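As a quick sanity check on a downloaded clip, the stated format can be verified with Python's standard `wave` module. This is a minimal sketch, not part of the repository: the function names `matches_readme_format` and `expected_clip_count` are hypothetical, and the only facts taken from the README are the format (16-bit, 16 kHz, mono) and the clip count arithmetic.

```python
import wave

# Format stated in the README: 16-bit, 16 kHz, mono WAV.
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample (16-bit)
EXPECTED_FRAME_RATE = 16000  # samples per second (16 kHz)
EXPECTED_CHANNELS = 1        # mono


def matches_readme_format(path):
    """Return True if the WAV file is 16-bit, 16 kHz, mono.

    `path` may be a file path or an open binary file object.
    (Hypothetical helper, not provided by the repository.)
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and wav.getframerate() == EXPECTED_FRAME_RATE
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def expected_clip_count(words=300, languages=42, voices=2):
    """Total number of clips: words x languages x voices."""
    return words * languages * voices
```

Calling `matches_readme_format` on each clip and comparing the number of files found against `expected_clip_count()` is one way to confirm that a download is complete.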

See word_list.csv for the vocabulary of 300 core words in each language.
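The word list can be loaded with Python's standard `csv` module. The column layout of word_list.csv is not described here, so the sketch below assumes only that the file is UTF-8 encoded and that its first row is a header; both are assumptions, and `load_word_list` is a hypothetical helper name.

```python
import csv


def load_word_list(path="word_list.csv"):
    """Read the CSV into a list of dicts keyed by the header row.

    Assumes a UTF-8 file whose first row is a header (an assumption,
    not confirmed by the README). "utf-8-sig" also accepts a file
    that starts with a byte order mark.
    """
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

Inspecting the keys of the first row, for example `load_word_list()[0].keys()`, shows which columns the file actually provides.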

For an example of how these audio clips are used, see the repository: https://github.com/EL-CL/acoustic-dist-ie.

42 languages:

  • Afrikaans
  • Albanian
  • Armenian
  • Bengali
  • Bosnian
  • Bulgarian
  • Catalan
  • Croatian
  • Czech
  • Danish
  • Dutch
  • English
  • French
  • Galician
  • German
  • Greek
  • Gujarati
  • Hindi
  • Icelandic
  • Irish
  • Italian
  • Latvian
  • Lithuanian
  • Macedonian
  • Marathi
  • Nepali
  • Norwegian
  • Pashto
  • Persian
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Serbian
  • Sinhala
  • Slovak
  • Slovenian
  • Spanish
  • Swedish
  • Ukrainian
  • Urdu
  • Welsh
