Releases: chozelinek/europarl
Initial release
This is an implementation of the following pipeline:
- download of MEPs information in HTML
- MEPs information extraction
- download of EP proceedings in HTML
- transformation of HTML into XML
- filtering out untranslated text
- addition of MEPs' metadata
- sentence splitting
- tokenization, lemmatization, PoS tagging
- extraction of originals, all translations, translations from a particular source language
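The last steps of the pipeline can be illustrated with a minimal sketch. It is not the code shipped in this release: the `Intervention` record, its field names, and the functions `split_sentences`, `originals`, and `translations` are hypothetical names chosen for illustration, and the regex-based splitter is a naive stand-in for a real sentence splitter. The sketch assumes each intervention records both the language it was delivered in and the language of the text version at hand.

```python
import re
from dataclasses import dataclass


@dataclass
class Intervention:
    """Hypothetical record for one speech in the proceedings."""
    speaker_id: str   # identifier of the MEP who spoke (assumed field)
    source_lang: str  # language the speech was delivered in (assumed field)
    text_lang: str    # language of this text version (assumed field)
    text: str


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def originals(interventions):
    """Keep interventions whose text is in the language they were delivered in."""
    return [iv for iv in interventions if iv.source_lang == iv.text_lang]


def translations(interventions, source_lang=None):
    """Keep translated interventions, optionally from one source language only."""
    out = [iv for iv in interventions if iv.source_lang != iv.text_lang]
    if source_lang is not None:
        out = [iv for iv in out if iv.source_lang == source_lang]
    return out


# Toy data standing in for the English version of the proceedings.
speeches = [
    Intervention("m1", "en", "en", "Thank you. I agree."),
    Intervention("m2", "de", "en", "This is important!"),
    Intervention("m3", "fr", "en", "We must act now."),
]

english_originals = originals(speeches)
all_translations = translations(speeches)
from_german = translations(speeches, source_lang="de")
```

Here `english_originals` holds the speech delivered in English, `all_translations` holds the two translated speeches, and `from_german` narrows the translations to those with German as the source language.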