Code used alongside an implementation of a Seq2Seq LSTM TTS frontend to process and evaluate Google Research's Wikipedia Homograph Dataset (WHD) and LibriSpeech data, with the aim of improving the frontend's homograph disambiguation.
The data was processed to add supplementary POS tags (from Festival and spaCy) as input to the model on a per-character basis, and also as targets in a multi-task learning paradigm. For this, the WHD was also cleaned so that it could be passed to Festival without out-of-dictionary words.
Data was in the form (each token's POS tag repeated once per character, with # marking word boundaries):
with added POS tags: VBD VBD VBD VBD VBD VBD # VBP VBP VBP # DT DT DT # JJ JJ JJ JJ JJ JJ # NN NN NN NN NN NN #
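For illustration, a minimal sketch of how such per-character tags can be produced with spaCy (this is not the repo's exact script; the model name "en_core_web_sm" and the helper name are assumptions):

```python
import spacy

# Hypothetical example; the repo may use a different spaCy model/pipeline.
nlp = spacy.load("en_core_web_sm")

def per_char_pos_tags(sentence: str) -> str:
    """Repeat each token's Penn Treebank tag once per character,
    appending a '#' word-boundary marker after every token."""
    doc = nlp(sentence)
    tags = []
    for token in doc:
        if token.is_space:
            continue
        tags.extend([token.tag_] * len(token.text))  # one tag per character
        tags.append("#")  # word-boundary marker, as in the format above
    return " ".join(tags)

print(per_char_pos_tags("He walked away"))
# e.g. "PRP PRP # VBD VBD VBD VBD VBD VBD # RB RB RB RB #"
```

Aligning one tag per input character keeps the POS feature sequence the same length as the character sequence fed to the Seq2Seq model, so the two can be concatenated per timestep.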