Repository for the Statistical Learning course project: Spoken Digit Recognition with Machine Learning Methods
The project uses the Free Spoken Digit Dataset (FSDD), a simple audio/speech dataset consisting of recordings of spoken digits in WAV
files at 8 kHz. The recordings are trimmed so that they have near-minimal silence at the beginning and end.
FSDD is an open dataset, which means it will grow over time as data is contributed. To enable reproducibility and accurate citation in scientific journals, the dataset is versioned using Git tags.
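As a rough illustration of what working with these files can look like, the sketch below reads one recording with Python's standard `wave` module and trims low-amplitude samples from both ends. It assumes 16-bit mono PCM files, which is not stated above. The helper names `read_wav_mono16` and `trim_silence` and the threshold value are hypothetical and are not taken from the notebook.

```python
import struct
import wave


def read_wav_mono16(path):
    """Return (sample_rate, samples) for a 16-bit mono PCM WAV file.

    Assumption: recordings are 16-bit mono; other formats are rejected.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # Decode little-endian signed 16-bit integers.
    samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return rate, samples


def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose magnitude is below threshold.

    The threshold is an illustrative value, not one used by the dataset.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]
```

A caller could check that `rate` equals 8000 after loading a file, to confirm it matches the sampling rate described above.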
Notebook.ipynb consists of:
- Phase_1: Preprocessing
- Phase_2a: Supervised Learning without extracting features
- Phase_2b: Supervised Learning with extracting features
- Phase_3: Unsupervised Learning
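
The phases above need a label for each recording and, when raw samples are used without feature extraction, inputs of equal length. The following is a minimal sketch of both steps, not the notebook's actual code. It assumes file names of the form `{digit}_{speaker}_{index}.wav` (for example `7_jackson_32.wav`), which this README does not state. The helpers `parse_label` and `fix_length` are hypothetical names.

```python
import os


def parse_label(filename):
    """Split a name like '7_jackson_32.wav' into (digit, speaker, index).

    Assumption: the name has exactly three underscore-separated fields.
    """
    stem = os.path.splitext(os.path.basename(filename))[0]
    digit, speaker, index = stem.split("_")
    return int(digit), speaker, int(index)


def fix_length(samples, target_len):
    """Truncate or zero-pad a sample list to exactly target_len items."""
    clipped = list(samples[:target_len])
    return clipped + [0] * (target_len - len(clipped))
```

For instance, `parse_label("7_jackson_32.wav")` yields the digit 7 as the class label, and `fix_length` gives every recording the same number of samples so that they can be stacked into one input matrix.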