Initial add (with Git LFS for big data files)
0 parents
commit 914bd4e
Showing 12 changed files with 722,463 additions and 0 deletions.
@@ -0,0 +1,41 @@
Oersetter Models
=================

Maarten van Gompel
Centre for Language and Speech Technology
Radboud University Nijmegen
Licensed under [OPEN LICENCE TO BE DETERMINED BY FRYSKE AKADEMY]

This repository contains models for Oersetter, the Frisian-Dutch Machine
Translation system developed by Radboud University Nijmegen in close
collaboration with Fryske Akademy. This repository does not contain the
literary sources for the models, as supplied by the Fryske Akademy, because
those are copyrighted. It only contains derivative data from which the sources
cannot be reconstructed.

It contains the following:

* ``nl-fy`` - *Dutch to Frisian*
    * ``moses.ini`` - Configuration for [Moses](http://www.statmt.org/moses/) with parameters optimized on a held-out development set using MERT. This file references all the others; please read the notices inside.
    * ``fy.lm`` - Language model (ARPA-style, generated with SRILM, should also run with the KenLM supplied with Moses)
    * ``phrase-table.gz`` - The phrase-translation table
    * ``reordering-table.wbe-msd-bidirectional-fe.gz`` - Reordering table
* ``fy-nl`` - *Frisian to Dutch*
    * ``moses.ini`` - Configuration for [Moses](http://www.statmt.org/moses/) with parameters optimized on a held-out development set using MERT. This file references all the others; please read the notices inside.
    * ``nl.lm.gz`` - Language model (ARPA-style, generated with SRILM, should also run with the KenLM supplied with Moses). This is a big one, trained on the Frisian parallel corpora, OpenSubtitles and Europarl.
    * ``nl.tiny.lm`` - A small language model trained only on the Frisian parallel corpora and used during testing
    * ``phrase-table.gz`` - The phrase-translation table
    * ``reordering-table.wbe-msd-bidirectional-fe.gz`` - Reordering table

This system is to be used with [Moses](http://www.statmt.org/moses/). A moses2
server can then be started as follows:

```
moses2 -f moses.ini --server --server-port 2002 --mark-unknown --unknown-word-prefix "<em>" --unknown-word-suffix "</em>"
```
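
As an illustration of how a client might talk to a server started this way, here is a minimal Python sketch. It assumes the server listens on `localhost:2002` and exposes the usual Moses XML-RPC endpoint `/RPC2` with a `translate` method that takes a `text` field; both the URL and the helper names (`translate`, `unknown_words`, `strip_markers`) are assumptions for this example and not part of the repository. The input is assumed to be tokenized already.

```python
import re
import xmlrpc.client
from typing import List

# Assumption: server started with --server-port 2002 on the local machine,
# serving XML-RPC at /RPC2 as the stock Moses server does.
SERVER_URL = "http://localhost:2002/RPC2"

# Matches the markers set via --unknown-word-prefix / --unknown-word-suffix.
UNKNOWN_RE = re.compile(r"<em>(.*?)</em>")


def unknown_words(translation: str) -> List[str]:
    """Return the words that the decoder marked as unknown."""
    return [word.strip() for word in UNKNOWN_RE.findall(translation)]


def strip_markers(translation: str) -> str:
    """Remove the <em>...</em> markers but keep the words themselves."""
    return UNKNOWN_RE.sub(lambda match: match.group(1), translation)


def translate(text: str, url: str = SERVER_URL) -> str:
    """Send one tokenized sentence to the server and return its output."""
    proxy = xmlrpc.client.ServerProxy(url)
    result = proxy.translate({"text": text})
    return result["text"]
```

With a server running, `translate("dit is in test")` should return the translated sentence; `unknown_words` can then be used to report the words that had no entry in the phrase table.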

A RESTful webservice wrapper that communicates with such a Moses server (and
also provides a web interface for users) is provided
[separately](https://github.com/proycon/oersetter-webservice) and is powered by
[CLAM](https://proycon.github.io/clam).
@@ -0,0 +1,3 @@
phrase-table.gz filter=lfs diff=lfs merge=lfs -text
reordering-table.wbe-msd-bidirectional-fe.gz filter=lfs diff=lfs merge=lfs -text
nl.lm.gz filter=lfs diff=lfs merge=lfs -text
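
Because these three files are tracked with Git LFS, a clone made without LFS support contains small text pointer files in their place, and Moses will fail to load them. The following Python sketch shows one way to check for that before starting the decoder. It relies on the fact that LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`; the function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# First line of every Git LFS pointer file.
LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """True if the file is an LFS pointer rather than the real data."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_SIGNATURE)) == LFS_SIGNATURE


def unfetched_lfs_files(directory, names: Iterable[str]) -> List[str]:
    """Return the names that exist in `directory` but are still pointers."""
    found = []
    for name in names:
        candidate = Path(directory) / name
        if candidate.is_file() and is_lfs_pointer(candidate):
            found.append(name)
    return found
```

If `unfetched_lfs_files` returns a non-empty list for a model directory, running `git lfs pull` in the clone should replace the pointers with the actual files.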
@@ -0,0 +1,44 @@
# MERT optimized configuration
# decoder moses
# BLEU 0.576313 on dev dev.fy
# We were before running iteration 5
# finished Wed Aug 31 15:27:29 CEST 2016
### MOSES CONFIG FILE ###
#########################

# input factors
[input-factors]
0

# mapping steps
[mapping]
0 T 0

[distortion-limit]
6

# feature functions
# **NOTICE**: path= statements are relative, you may need to turn them into absolute paths in your environment!
[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=phrase-table.gz input-factor=0 output-factor=0
LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=reordering-table.wbe-msd-bidirectional-fe.gz
Distortion
KENLM name=LM0 factor=0 path=nl.lm.gz order=2
#NOTICE: ^-- If the above language model gives any trouble, gunzip it and adapt the path

# dense weights for feature functions

[threads]
10
[weight]

LexicalReordering0= 0.0934955 0.00140425 0.0376116 0.115159 0.0329388 0.0612791
Distortion0= 0.0701483
LM0= 0.108341
WordPenalty0= 0.128481
PhrasePenalty0= 0.102767
TranslationModel0= 0.149808 0.0126496 0.050318 0.035599
UnknownWordPenalty0= 1
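
The notice in this configuration says that the relative `path=` statements may need to be turned into absolute paths. A small Python sketch of that rewrite follows. It is only an illustration: the function name `absolutize_paths` and the example directory are hypothetical, and it assumes every `path=` value is a single token without spaces, as in the file above.

```python
import os
import re

# A path= key followed by a value without whitespace.
PATH_RE = re.compile(r"\bpath=(\S+)")


def absolutize_paths(config_text: str, model_dir: str) -> str:
    """Prefix relative path= values with model_dir; leave comments alone."""

    def fix(match):
        path = match.group(1)
        if not os.path.isabs(path):
            path = os.path.join(model_dir, path)
        return "path=" + path

    rewritten = []
    for line in config_text.splitlines(keepends=True):
        if line.lstrip().startswith("#"):
            rewritten.append(line)  # comment lines are kept verbatim
        else:
            rewritten.append(PATH_RE.sub(fix, line))
    return "".join(rewritten)
```

One possible use is to read `moses.ini`, pass its text through `absolutize_paths` together with the directory holding the model files, and write the result to a new configuration file that is then given to the decoder with `-f`.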
Git LFS file not shown