This framework provides python scripts to train neural networks for speech recognition.
- Python 3.6+
Install the required python dependencies listed in the requirements.txt
:
pip install -r requirements.txt
To run a training, training data is required. A Training accepts a file, which contains metadata about the training data. The file itself is a JSON file consisting of an array. Each element has the following properties:
path
: Absolute path to the audio filetext
: Transcription of the audio file
A training data file could look like this:
[
{
"path": "path/to/audio/file1.wav",
"text": "hello world"
},
{
"path": "path/to/audio/file2.wav",
"text": "goodbye world"
}
]
You can find a downloader for the voxforge corpus at https://github.com/KevNetG/speech-to-text-voxforge.
This repo also includes a generator.py
file, which creates a training data file containing the required metadata for trainings.
Currently, only WAVE files are supported.
The idea is, that you don't write you training configuration into the command line,
but instead into a file, which you can modify and reuse for other trainings.
You can find a sample configuration under examples/training.config.json
and adjust it to your needs.
A configuration file has the following properties:
epochs
: Number of epochs to trainbatchSize
: Batch SizetrainingDataQuantity
: Amount of training data that is taken from the provided sourcesnet
: Name of the model. Models are specified in themodels.py
filetrainingData
: Absolute Path to a training data file. You can specify multiple sources to simply scale your amount of available training dataalphabetPath
: Path to an alphabet file
{
"epochs": 10,
"batchSize": 20,
"trainingDataQuantity": 50000,
"net": "graves",
"trainingData": [
"speech-to-text/training_data.json"
],
"alphabetPath": "speech-to-text/examples/english.json"
}
You can use the english alphabet available under examples/english.json
or create one yourself for any other language.
The alphabet file is a simple JSON file consisting of an array containing the characters from the alphabet:
To run a training execute the train.py
and provide two arguments:
path
: Where to store the training. You don't have to specify a file extensionplan
: The path to a training configuration
like this:
python train.py "trainings/graves" "training_data.json"
Trainings are saved after each epoch.
If you had to stop a training prematurely, you can continue it from the last checkpoint.
Simply execute the continuetraining.py
and pass the path to the training save file.
You don't have to specify the file extension:
python continuetraining.py "trainings/graves.json"
In order to create a transcription of an audio file, use the predict.py
script.
Pass the following arguments:
path
: Path to a training save fileweights
, Path to a weights matrixaudio
: Path to the audio file which shall be transcribed
For example:
python predict.py "trainings/graves.json" "trainings/graves.weights-20-65.68075.h5" "media/audio.wav"
If you want to display the training loss and the validation loss of a training, execute the statistics.py
script:
python statistics.py "trainings/graves.statistics.json"