Script for bundling Common Voice (https://voice.mozilla.org) clips by language.
- Query the database for all clip data
- Download all of those clips from an S3 bucket
- Anonymize the clips' client_id and filename (called path)
- Upload a TSV file with all the anonymized clip data
- Put the clips into archives by language and upload them to a different S3 bucket
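The anonymization step can be sketched in Node.js as follows. This is a minimal illustration, not the bundler's own code: it assumes client_id is hashed with SHA-512 and that each clip's path is replaced by a per-locale counter in the hypothetical form common_voice_<locale>_<n>.mp3. The clip object shape ({ client_id, path, locale, ... }) and the helper names hashClientId and makeAnonymizer are also assumptions.

```javascript
const crypto = require('crypto');

// Assumption: client_id is hashed with SHA-512 and hex-encoded.
// The real bundler may use a different algorithm or add a salt.
function hashClientId(clientId) {
  return crypto.createHash('sha512').update(clientId).digest('hex');
}

// Hypothetical naming scheme: the original path is replaced by
// common_voice_<locale>_<n>.mp3, where n counts clips per locale.
function makeAnonymizer() {
  const counters = new Map(); // locale -> number of clips renamed so far
  return function anonymize(clip) {
    const n = (counters.get(clip.locale) || 0) + 1;
    counters.set(clip.locale, n);
    return {
      ...clip, // keep the other fields (sentence, votes, ...) unchanged
      client_id: hashClientId(clip.client_id),
      path: `common_voice_${clip.locale}_${n}.mp3`,
    };
  };
}

const anonymize = makeAnonymizer();
const example = anonymize({ client_id: 'abc', path: 'abc/123.mp3', locale: 'en' });
console.log(example.path); // common_voice_en_1.mp3
```

Hashing keeps all clips from one contributor linkable within a release without exposing the raw client_id; an unsalted hash is only as hard to reverse as the IDs themselves are hard to guess.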
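Writing the TSV of anonymized clip data could look roughly like this. The column list is hypothetical (the real export may carry other fields), and replacing embedded tabs and newlines with a space is an assumed policy: TSV has no quoting rule, so such characters would otherwise split a row.

```javascript
// Hypothetical column order for the exported TSV.
const COLUMNS = ['client_id', 'path', 'sentence', 'up_votes', 'down_votes'];

// Missing values become empty fields; runs of tabs/newlines become one space
// so every clip stays on exactly one line with the right number of columns.
function escapeTsvField(value) {
  return String(value == null ? '' : value).replace(/[\t\r\n]+/g, ' ');
}

// Serialize rows (plain objects) into a TSV string with a header line.
function toTsv(rows, columns = COLUMNS) {
  const lines = [columns.join('\t')];
  for (const row of rows) {
    lines.push(columns.map((c) => escapeTsvField(row[c])).join('\t'));
  }
  return lines.join('\n') + '\n';
}
```

The resulting string could then be written to disk with fs.writeFileSync or passed to whatever upload call the bundler uses.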
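Splitting clips into per-language archives starts with grouping them by locale. The sketch below only plans archive names and their members; the layout (<locale>.tar.gz containing <locale>/clips.tsv and <locale>/clips/<path>) and the function names are hypothetical, and actually creating the tarballs and uploading them to S3 is left out.

```javascript
// Group clips by their locale, preserving first-seen order of locales.
function groupByLocale(clips) {
  const groups = new Map();
  for (const clip of clips) {
    if (!groups.has(clip.locale)) groups.set(clip.locale, []);
    groups.get(clip.locale).push(clip);
  }
  return groups;
}

// Hypothetical archive layout: one archive per language holding its TSV
// and its (already anonymized) clip files.
function planArchives(clips) {
  const plan = [];
  for (const [locale, group] of groupByLocale(clips)) {
    plan.push({
      archive: `${locale}.tar.gz`,
      files: [`${locale}/clips.tsv`, ...group.map((c) => `${locale}/clips/${c.path}`)],
    });
  }
  return plan;
}
```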
Setup:
- Install Node.js (>= 8.3.0)
- Install Yarn
- Install CorporaCreator
- Install mp3-duration-sum
- Clone the repository: git clone git@github.com:Common-Voice/common-voice-bundler.git
- Override the keys defined in config.js with a config.json in the same directory
- Install the dependencies: yarn
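The Node.js requirement (>= 8.3.0) could be enforced with a small preflight check like the one below. This helper is illustrative and hypothetical, not part of the repository as far as this README shows; it compares dotted version numbers numerically and ignores pre-release tags.

```javascript
const MIN_NODE = '8.3.0';

// Compare two dotted version strings numerically: -1, 0 or 1.
// Missing components count as 0, so '8.3' equals '8.3.0'.
function compareVersions(a, b) {
  const pa = a.split('.').map((n) => parseInt(n, 10));
  const pb = b.split('.').map((n) => parseInt(n, 10));
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// True if the given (default: running) Node.js version meets the minimum.
function nodeIsSupported(version = process.versions.node) {
  return compareVersions(version, MIN_NODE) >= 0;
}
```

Comparing numerically matters: a plain string comparison would wrongly rank '8.10.0' below '8.3.0'.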
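One way to layer config.json over the defaults from config.js is sketched here. The default keys (db, clipBucket, bundleBucket) are hypothetical placeholders, and the deep merge is an assumed design: the README only says that keys are overridden, not how nested values are combined.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical defaults standing in for whatever keys config.js really exports.
const defaults = {
  db: { host: 'localhost', user: 'root', password: '', database: 'voice' },
  clipBucket: 'example-clips-bucket',
  bundleBucket: 'example-bundle-bucket',
};

function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Merge overrides into base without mutating either: nested plain objects are
// merged key by key; any other value (string, number, array) replaces the default.
function deepMerge(base, overrides) {
  const out = { ...base };
  for (const [key, value] of Object.entries(overrides)) {
    out[key] = isPlainObject(value) && isPlainObject(base[key])
      ? deepMerge(base[key], value)
      : value;
  }
  return out;
}

// Read config.json from dir if it exists; otherwise fall back to the defaults.
function loadConfig(dir = process.cwd()) {
  const file = path.join(dir, 'config.json');
  if (!fs.existsSync(file)) return defaults;
  return deepMerge(defaults, JSON.parse(fs.readFileSync(file, 'utf8')));
}
```

With a config.json containing {"db": {"password": "secret"}}, loadConfig() keeps db.host and the other db keys from the defaults and replaces only db.password. A shallow Object.assign would instead replace the whole db object.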
To run the bundler: yarn start