A minimalistic toolkit of recipes for preprocessing and handling audio and text data for Text-to-Speech modeling. Recipes are written in Python or Bash, whichever is more convenient for the task.
Note: This is a work in progress. I tried to make the recipes generic, but some parts are still specific to my project. I will improve these by adapting them to open source tools.
- Resample audio (SoX)
- PCM modification (changing the bit depth) (SoX)
- Audio split in milliseconds (Pydub)
- Audio split in seconds (Wave)
- Change text encoding
- Text cleanup
- Join text files into a single metadata.txt
- Text split at punctuation marks
- Transcribe metadata phonetically (save as numpy or regular phonemes)
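
As an illustration of the "Audio split in seconds (Wave)" recipe, here is a minimal sketch using only Python's standard `wave` module. It is not the repo's actual script: the function name `split_wav_by_seconds`, the output naming scheme, and the example paths are assumptions made for this example.

```python
import wave


def split_wav_by_seconds(src_path, out_prefix, chunk_seconds):
    """Split a WAV file into consecutive chunks of `chunk_seconds` each.

    Hypothetical helper for illustration; output files are named
    <out_prefix>_0000.wav, <out_prefix>_0001.wav, ... (an assumed scheme).
    Returns the list of paths written. The last chunk may be shorter.
    """
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the source file
            out_path = f"{out_prefix}_{index:04d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths


# Example call (paths are placeholders, adjust to your data):
# split_wav_by_seconds("audio/long_recording.wav", "audio/chunks/part", 10)
```

The `wave` module only handles uncompressed PCM WAV files, which is one reason a millisecond-level split with Pydub is listed as a separate recipe.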
To be added.
Clone this repo:
git clone https://github.com/annemnvz/TTS-toolkit.git
cd TTS-toolkit
Then run the recipe you need.
Remember to adjust the paths and file names, or adapt the recipe to other tools, if needed.
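
Since paths need adjusting per project, one option is to keep them as parameters rather than hard-coded values. The sketch below shows this for the SoX resampling and bit-depth recipes, called from Python. The helper names `build_sox_command` and `convert_audio`, the default values (22050 Hz, 16 bit), and the example paths are assumptions for illustration, not taken from the repo; it only relies on SoX's standard `-r` (sample rate) and `-b` (bits per sample) output options.

```python
import subprocess


def build_sox_command(src_path, dst_path, sample_rate=22050, bit_depth=16):
    """Build a SoX command that resamples and changes the bit depth.

    Hypothetical helper; the defaults are illustrative assumptions.
    Options placed before the output file apply to the output.
    """
    return [
        "sox", str(src_path),
        "-r", str(sample_rate),  # output sample rate in Hz
        "-b", str(bit_depth),    # output bits per sample
        str(dst_path),
    ]


def convert_audio(src_path, dst_path, sample_rate=22050, bit_depth=16):
    """Run SoX on one file; requires the `sox` binary on the PATH."""
    command = build_sox_command(src_path, dst_path, sample_rate, bit_depth)
    subprocess.run(command, check=True)  # raises if SoX exits with an error


# Example call (paths are placeholders, adjust to your data):
# convert_audio("raw/utt001.wav", "processed/utt001.wav", sample_rate=16000)
```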