These data come from OPUS (http://opus.nlpl.eu/).
The data cover five domains:
Law (JRC-Acquis), Medical (EMEA), IT (GNOME, KDE, PHP, Ubuntu, and OpenOffice), Koran (Tanzil), and Subtitles (OpenSubtitles).
@InProceedings{TIEDEMANN12.463,
author = {J\"org Tiedemann},
title = {Parallel Data, Tools and Interfaces in OPUS},
booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
year = {2012},
month = {may},
date = {23-25},
address = {Istanbul, Turkey},
editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Ugur Dogan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis},
publisher = {European Language Resources Association (ELRA)},
isbn = {978-2-9517408-7-7},
language = {english}
}
@InCollection{Tiedemann:RANLP5,
author = {J\"org Tiedemann},
title = {News from {OPUS} - {A} Collection of Multilingual
Parallel Corpora with Tools and Interfaces},
booktitle = {Recent Advances in Natural Language Processing},
publisher = {John Benjamins, Amsterdam/Philadelphia},
year = 2009,
editor = {N. Nicolov and K. Bontcheva and G. Angelova and
R. Mitkov},
volume = {V},
address = {Borovets, Bulgaria},
isbn = {978 90 272 4825 1},
pdf = {http://stp.lingfil.uu.se/~joerg/published/ranlp-V.pdf},
topic = {Parallel corpora}
}
Source: https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis Downloaded from: http://opus.nlpl.eu/JRC-Acquis.php
Source: http://www.emea.europa.eu/ Downloaded from: http://opus.nlpl.eu/EMEA.php
Source: https://l10n.gnome.org Downloaded from: http://opus.nlpl.eu/GNOME.php
Downloaded from: http://opus.nlpl.eu/KDE4.php
Source: http://se.php.net/download-docs.php Downloaded from: http://opus.nlpl.eu/PHP.php
Source: https://translations.launchpad.net Downloaded from: http://opus.nlpl.eu/Ubuntu.php
Source: http://www.openoffice.org/ Downloaded from: http://opus.nlpl.eu/OpenOffice.php
Source: http://tanzil.net/ Downloaded from: http://opus.nlpl.eu/Tanzil.php
Source: http://www.opensubtitles.org/ Downloaded from: http://opus.nlpl.eu/OpenSubtitles2016.php
All five domains were used in:
Six Challenges for Neural Machine Translation
@InProceedings{koehn-knowles:2017:NMT,
author = {Koehn, Philipp and Knowles, Rebecca},
title = {Six Challenges for Neural Machine Translation},
booktitle = {Proceedings of the First Workshop on Neural Machine Translation},
month = {August},
year = {2017},
address = {Vancouver},
publisher = {Association for Computational Linguistics},
pages = {28--39},
url = {http://www.aclweb.org/anthology/W17-3204}
}
Medical (EMEA), IT (GNOME, KDE, PHP, Ubuntu, and OpenOffice), Koran (Tanzil), and Subtitles (OpenSubtitles) were used in:
Neural Lattice Search for Domain Adaptation in Machine Translation
@InProceedings{I17-2004,
author = {Khayrallah, Huda
and Kumar, Gaurav
and Duh, Kevin
and Post, Matt
and Koehn, Philipp},
title = {Neural Lattice Search for Domain Adaptation in Machine Translation},
booktitle = {Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
year = {2017},
publisher = {Asian Federation of Natural Language Processing},
pages = {20--25},
location = {Taipei, Taiwan},
url = {http://aclweb.org/anthology/I17-2004}
}