Detect a set of predefined keywords in an audioclip.
The End-to-End framework for the KeyWordSpotting task is made of a sliding window of 1 second, a Voice Activity Detection module or a Silence Filter that select onfly the frames containig human voice, from those frames a feature extraction module will extract the Mel Spectogram or the Mel Cepstral Coefficients, this will be the input of the model. Finally a fusion rule aggregates all frames pedictions in a single one.
.......
If you just want to download and use the best model in your application you need to...
import librosa
import numpy as np
import Models #Our models
import LoadAndPreprocessDataset
from tensorflow.keras.models import load_model
.....
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Stefano Ivancich | Luca Masiero |
---|---|
github.com/ivaste |
github.com/TyllanDrake |