
imperial-pronunciation-app/model-experiments


Model

Methodology

We are fine-tuning Whisper on the TIMIT dataset to transcribe audio into phonemes.
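TIMIT ships a `.PHN` file with each utterance, listing one phone segment per line as `<start_sample> <end_sample> <phone>` at 16 kHz. The sketch below shows one way to turn that into a phoneme target sequence. It is not the repository's code: the function names are hypothetical, and dropping the `h#`, `pau` and `epi` silence/pause markers is an assumption about how targets might be prepared. The sample lines are illustrative, not taken from a real TIMIT file.

```python
from typing import Iterable, List, Tuple

# TIMIT sample offsets are given at 16 kHz.
SAMPLE_RATE = 16_000


def parse_phn(text: str) -> List[Tuple[float, float, str]]:
    """Parse the contents of a TIMIT .PHN file into (start_s, end_s, phone) tuples."""
    segments = []
    for line in text.strip().splitlines():
        start, end, phone = line.split()
        segments.append((int(start) / SAMPLE_RATE, int(end) / SAMPLE_RATE, phone))
    return segments


def phone_sequence(text: str, drop: Iterable[str] = ("h#", "pau", "epi")) -> List[str]:
    """Return phone labels in order, skipping silence/pause markers (assumed choice)."""
    skip = set(drop)
    return [phone for _, _, phone in parse_phn(text) if phone not in skip]


# Illustrative .PHN content (made up for this example).
SAMPLE_PHN = """0 3050 h#
3050 4559 sh
4559 5723 ix
5723 6720 hv
6720 9400 eh
9400 11600 h#"""

print(phone_sequence(SAMPLE_PHN))  # ['sh', 'ix', 'hv', 'eh']
```

Keeping the timestamps in `parse_phn` costs nothing and leaves room for alignment-based evaluation later, while `phone_sequence` produces the plain label sequence a sequence-to-sequence decoder is trained on.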

Because we want the model to output phonemes rather than text, we need to construct a custom tokeniser for the phoneme vocabulary. The pre-trained feature extractor can be reused unchanged, since only the output side of the model changes. We start from the pre-trained tiny.en model, freeze the encoder layers, fine-tune the decoder layers, and change the output layer dimension to match the number of phonemes.
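A custom tokeniser for this setup can be very small: each phone symbol maps to one token id, and the vocabulary size fixes the output layer dimension. The class below is a minimal, hypothetical sketch and is not the repository's implementation. The special tokens `<pad>`, `<bos>`, `<eos>` and `<unk>` are assumptions chosen for illustration; Whisper's own special tokens are different.

```python
from typing import Dict, Iterable, List


class PhonemeTokeniser:
    """Hypothetical minimal phoneme tokeniser: one token id per phone symbol."""

    SPECIAL = ["<pad>", "<bos>", "<eos>", "<unk>"]
    # Tokens stripped on decode; <unk> is kept so unknown phones stay visible.
    SKIP_ON_DECODE = {"<pad>", "<bos>", "<eos>"}

    def __init__(self, phones: Iterable[str]):
        # Special tokens first, then phones in sorted order for a stable vocabulary.
        symbols = self.SPECIAL + sorted(set(phones) - set(self.SPECIAL))
        self.token_to_id: Dict[str, int] = {s: i for i, s in enumerate(symbols)}
        self.id_to_token: Dict[int, str] = {i: s for s, i in self.token_to_id.items()}

    @property
    def vocab_size(self) -> int:
        # This is the size the replaced output layer would need.
        return len(self.token_to_id)

    def encode(self, phones: List[str]) -> List[int]:
        """Map phones to ids, wrapped in <bos> ... <eos>; unseen phones become <unk>."""
        unk = self.token_to_id["<unk>"]
        ids = [self.token_to_id.get(p, unk) for p in phones]
        return [self.token_to_id["<bos>"]] + ids + [self.token_to_id["<eos>"]]

    def decode(self, ids: List[int]) -> List[str]:
        """Map ids back to phones, dropping padding and sequence markers."""
        tokens = (self.id_to_token[i] for i in ids)
        return [t for t in tokens if t not in self.SKIP_ON_DECODE]


tok = PhonemeTokeniser(["sh", "ix", "hv", "eh", "sh"])
print(tok.vocab_size)              # 8
print(tok.encode(["sh", "eh"]))    # [1, 7, 4, 2]
print(tok.decode([1, 7, 4, 2]))    # ['sh', 'eh']
```

In practice the phone list would come from the whole TIMIT training set rather than a single utterance. If the model is loaded through Hugging Face `transformers` (an assumption; this README does not name the library), the decoder's token embedding and output projection would both need to be resized to `vocab_size`, for example with `resize_token_embeddings`.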
