Japanese-Vietnamese parallel data is collected from TED talks extracted from the WIT3 corpus. After removing blank and duplicate lines, 106,758 sentence pairs remain. All experiments use dev2010 as the validation set and tst2010 as the test set.
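The cleaning step described above (dropping blank and duplicate lines) can be sketched roughly as follows. This is a minimal illustration, not the authors' actual preprocessing script: the function name `clean_parallel` is hypothetical, and it assumes that a pair is dropped when either side is blank and that "duplicate" means an exact repeat of the whole (source, target) pair.

```python
def clean_parallel(src_lines, tgt_lines):
    """Drop pairs with a blank side and exact duplicate pairs.

    Assumption: duplicates are detected on the full (source, target)
    pair, and the first occurrence is kept.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")

    seen = set()
    pairs = []
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # blank line on at least one side
        if (src, tgt) in seen:
            continue  # exact duplicate of an earlier pair
        seen.add((src, tgt))
        pairs.append((src, tgt))
    return pairs


# Toy data for illustration only (not taken from the corpus).
ja = ["こんにちは", "", "ありがとう", "こんにちは"]
vi = ["Xin chào", "Tạm biệt", "Cảm ơn", "Xin chào"]
pairs = clean_parallel(ja, vi)
print(len(pairs), "pairs kept")
```

Keeping the two sides in lockstep matters here: filtering each file independently would misalign the remaining sentence pairs.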
Vietnamese - Japanese
Model | BLEU | Method | Reference | Code |
---|---|---|---|---|
NMT + JPBPE + VNBPE | 11.13 | | Ngo et al. KSE'18 | |
NMT Baseline | 9.39 | | Ngo et al. KSE'18 | |
SMT Baseline | 8.73 | | Ngo et al. KSE'18 | |
Japanese - Vietnamese
Model | BLEU | Method | Reference | Code |
---|---|---|---|---|
NMT + JPBPE + VNBPE + Back Translation + Mix-Source | 9.64 | | Ngo et al. KSE'18 | |
NMT Baseline | 8.18 | | Ngo et al. KSE'18 | |
SMT Baseline | 7.73 | | Ngo et al. KSE'18 | |
IWSLT 2015: The IWSLT 2015 Evaluation Campaign featured three tracks: automatic speech recognition (ASR), spoken language translation (SLT), and machine translation (MT). The ASR track offered two tasks, on English and German, while the SLT and MT tracks offered a number of tasks involving English, German, French, Chinese, Czech, Thai, and Vietnamese.
TED Data En-Vi: 131k sentences (train), 1080 sentences (tst2015)
TED: MT English-Vietnamese
Method | External Training Data | BLEU | NIST | TER | Paper/Source | Code |
---|---|---|---|---|---|---|
Tall Transformer with Style-Augmented Training | ✓ | 43.37 | | | Chinh et al. '21 | vietai/SAT |
PJAIT | ✕ | 28.39 | 6.6650 | 56.01 | Wolk et al. IWSLT'15 | |
JAIST | ✕ | 28.17 | 6.7092 | 55.84 | Trieu et al. IWSLT'15 | |
KIT | ✕ | 26.60 | 6.4014 | 58.26 | Ha et al. IWSLT'15 | |
SU | ✕ | 26.41 | 6.5986 | 55.60 | Luong et al. IWSLT'15 | |
UNETI | ✕ | 22.93 | 6.0218 | 60.33 | Tran et al. IWSLT'15 | |
BASELINE | ✕ | 27.01 | 6.4716 | 58.42 | Cettolo et al. IWSLT'15 | |
TED: MT Vietnamese-English
Method | BLEU | NIST | TER | Year |
---|---|---|---|---|
PJAIT | 23.46 | 5.7314 | 62.20 | 2015 |
UMD | 21.57 | 5.7831 | 59.19 | 2015 |
JAIST | 21.53 | 5.6413 | 62.35 | 2015 |
UNETI | 20.18 | 5.1443 | 66.33 | 2015 |
TUT | 19.78 | 5.4559 | 62.69 | 2015 |
BASELINE | 24.61 | 5.9259 | 59.32 | 2015 |
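The BLEU columns in the tables above report corpus-level n-gram precision against reference translations. As a rough illustration of how such a score is computed, the sketch below implements plain corpus BLEU with clipped 1- to 4-gram precision and a brevity penalty. It is not the scoring script used by the IWSLT campaign or the cited papers: the function names are hypothetical, it assumes one reference per hypothesis and whitespace tokenization, and real evaluations depend heavily on tokenization and casing choices, so its numbers are not comparable to the tables.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Assumptions: one reference per hypothesis, whitespace tokenization,
    no smoothing (the score is 0 if any n-gram order has no match).
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


# Toy example for illustration only.
hyps = ["tôi rất thích dịch máy"]
refs = ["tôi rất thích dịch thuật"]
score = corpus_bleu(hyps, refs)
print(f"BLEU = {score:.2f}")
```

For reported results, an established scorer with a documented tokenization is preferable to a hand-rolled implementation like this one.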
References
- Task Description: The IWSLT 2015 Evaluation Campaign (2015), M. Cettolo et al. [pdf]
- UNETI '15: The English-Vietnamese Machine Translation System for IWSLT 2015 (2015), H. Tran et al. [link]
- PJAIT '15: PJAIT Systems for the IWSLT 2015 Evaluation Campaign Enhanced by Comparable Corpora (2015), K. Wolk et al. [pdf]
- TUT '15: Improvement of Word Alignment Models for Vietnamese-to-English Translation (2015), T. Nomura et al. [pdf]
- UMD '15: The UMD Machine Translation Systems at IWSLT 2015 (2015), A. Axelrod et al. [pdf]
- KIT '15: The KIT Translation Systems for IWSLT 2015 (2015), T. Ha et al. [pdf]
- JAIST '15, UET '15: The JAIST-UET-MITI Machine Translation Systems for IWSLT 2015 (2015), H. Trieu et al. [pdf]
- SU '15: Stanford Neural Machine Translation Systems for Spoken Language Domains (2015), M. Luong et al. [pdf]
📁 Open sources
- duyvuleo/Transformer-DyNet (2018-)
dynet,python
- polyglot (2014-2017)
c++,java,python
- EVBCorpus (2016)
data