This is a tutorial on training and evaluating a Transformer wait-k simultaneous translation model on the MuST-C English-German dataset, following the recipe from *SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation* (Ma et al., 2020).
MuST-C is a multilingual speech-to-text translation corpus with translations of English TED talks into 8 languages.
See the data preparation instructions first.
The training script for ASR is in exp/1a-pretrain_asr.sh. The ASR model uses an Emformer encoder and a Transformer decoder, pre-trained with a joint CTC and cross-entropy loss.
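The joint objective interpolates the CTC loss on the encoder output with the cross-entropy loss on the decoder output; as a sketch, with a mixing weight λ (the actual hyperparameter name and value live in the script):

$$\mathcal{L} = \lambda\,\mathcal{L}_{\mathrm{CTC}} + (1-\lambda)\,\mathcal{L}_{\mathrm{CE}}$$

To launch pre-training, run the script the same way as the wait-k one below (assuming you are in the exp/ directory):

```bash
bash 1a-pretrain_asr.sh
```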
| MuST-C (WER) | en-de (V2) | en-es |
|---|---|---|
| dev | 9.65 | 14.44 |
| tst-COMMON | 12.85 | 14.02 |
| model | download | download |
| vocab | download | download |
The training script for the offline wait-k model is in exp/4-offline_waitk.sh. The wait-k model is trained as an offline (wait-1024) model and tested as a wait-1 model.
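For context, a wait-k policy (Ma et al., 2019) first reads k source units, then alternates one write with one read, so before emitting target token t the model has read

$$g(t) = \min\{\,k + t - 1,\ |\mathbf{x}|\,\}$$

source units, where |x| is the source length. With k = 1024 virtually every source unit is read before the first write, so training is effectively offline; decoding with k = 1 then gives the lowest-latency schedule. To train, run: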
```bash
bash 4-offline_waitk.sh
```
The evaluation instructions are in simuleval_instruction.md. The wait-k model uses default_agent.py as its SimulEval agent.
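A minimal SimulEval invocation looks roughly like the sketch below; the source/target file names are placeholders, and the exact options used here are in simuleval_instruction.md:

```bash
simuleval \
    --agent default_agent.py \
    --source source_segments.txt \
    --target target_translations.txt \
    --output eval_results
```

A successful run prints a quality/latency report such as: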
```json
{
    "Quality": {
        "BLEU": 20.258749351223564
    },
    "Latency": {
        "AL": 1782.001343711587,
        "AL_CA": 1935.7023338036943,
        "AP": 0.7822591501150944,
        "AP_CA": 0.8479015672001843,
        "DAL": 2244.2804247360823,
        "DAL_CA": 2492.808483191793
    }
}
```
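For reference: AL (Average Lagging), AP (Average Proportion), and DAL (Differentiable Average Lagging) are the standard SimulEval latency metrics, and the `*_CA` variants are their computation-aware versions, which include the model's actual inference time. With speech input the delays are measured in milliseconds, so AL ≈ 1782 means the output trails the speaker by about 1.8 s on average. Roughly, AL (Ma et al., 2019) measures how far the read schedule g(t) lags behind an ideal fully-streaming one:

$$\mathrm{AL} = \frac{1}{\tau}\sum_{t=1}^{\tau}\Bigl(g(t) - \frac{t-1}{|\mathbf{y}|/|\mathbf{x}|}\Bigr), \qquad \tau = \min\{\,t : g(t) = |\mathbf{x}|\,\}$$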