afrisauti/common-voice-bundler

CommonVoice Bundler

Script for bundling Common Voice (https://voice.mozilla.org) clips by language.

What it does

  1. Query the database for all clip data
  2. Download all of those clips from an S3 bucket
  3. Anonymize each clip's client_id and filename (stored in the path field)
  4. Upload a TSV file containing all of the anonymized clip data
  5. Package the clips into per-language archives and upload them to a different S3 bucket
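Steps 3 and 4 can be illustrated with a small JavaScript sketch. It is not the bundler's actual code. The HMAC-SHA-256 anonymization, the hypothetical SALT value, the column names in COLUMNS, the locale field and the sample rows are all assumptions made for illustration. The database query, the S3 transfers and the archiving (steps 1, 2 and 5) are left out.

```javascript
const crypto = require('crypto');

// Hypothetical secret: a real run would load this from configuration, never hard-code it.
const SALT = 'replace-with-a-secret-salt';

// Columns written to each per-language TSV (assumed names for this sketch).
const COLUMNS = ['client_id', 'path', 'sentence'];

// Map a value to an opaque 64-character hex token using a keyed hash (HMAC-SHA-256).
// The mapping is deterministic, so every clip from one client_id gets the same token.
function anonymize(value, salt = SALT) {
  return crypto.createHmac('sha256', salt).update(String(value)).digest('hex');
}

// Step 3: return a copy of a clip row with its identifying fields replaced.
function anonymizeClip(clip, salt = SALT) {
  return Object.assign({}, clip, {
    client_id: anonymize(clip.client_id, salt),
    path: anonymize(clip.path, salt) + '.mp3',
  });
}

// Group clip rows by their locale field, preserving first-seen order.
function groupByLocale(clips) {
  const groups = new Map();
  for (const clip of clips) {
    if (!groups.has(clip.locale)) groups.set(clip.locale, []);
    groups.get(clip.locale).push(clip);
  }
  return groups;
}

// Step 4: serialize rows as TSV text. Assumes no field contains a tab or newline.
function toTsv(rows, columns) {
  const lines = [columns.join('\t')];
  for (const row of rows) {
    lines.push(columns.map((c) => (row[c] === undefined ? '' : String(row[c]))).join('\t'));
  }
  return lines.join('\n') + '\n';
}

// Usage with made-up rows standing in for the database query result.
const clips = [
  { client_id: 'user-1', path: 'clip-a.mp3', sentence: 'Hello there', locale: 'en' },
  { client_id: 'user-1', path: 'clip-b.mp3', sentence: 'Good morning', locale: 'en' },
  { client_id: 'user-2', path: 'clip-c.mp3', sentence: 'Guten Morgen', locale: 'de' },
];

for (const [locale, rows] of groupByLocale(clips.map((c) => anonymizeClip(c)))) {
  console.log(`# ${locale}.tsv`);
  process.stdout.write(toTsv(rows, COLUMNS));
}
```

Because the hash is keyed and deterministic, clips recorded by the same client_id still share one token after anonymization. This keeps per-speaker grouping possible without exposing the original ID, as long as the salt stays secret.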

How to run it

  1. Install Node.js (>= 8.3.0)
  2. Install yarn
  3. Install CorporaCreator
  4. Install mp3-duration-sum
  5. git clone git@github.com:Common-Voice/common-voice-bundler.git
  6. Override the keys defined in config.js by creating a config.json in the same directory
  7. Run yarn to install the dependencies
  8. Run yarn start
