This source code is a demo that extracts phrase pairs from a given parallel corpus. The corpora are taken from Tatoeba, "https://tatoeba.org/eng/". Tokenization uses the tokenizer from mosesdecoder, "https://github.com/moses-smt/mosesdecoder/tree/master/scripts/tokenizer", and word alignment uses Chris Dyer's fast_align, "https://github.com/clab/fast_align".
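The repository's own code is not reproduced here. The following is a minimal sketch of the textbook consistency-based phrase extraction (as described in Koehn's "Statistical Machine Translation"), assuming tokenized sentence pairs and alignments in the "i-j" format that fast_align prints (i indexes the source side, j the target side). The helper names `to_fast_align_line`, `parse_alignment` and `extract_phrases`, and the default `max_len=4`, are hypothetical and are not taken from this repo.

```python
from typing import List, Set, Tuple

Alignment = Set[Tuple[int, int]]


def to_fast_align_line(src: List[str], tgt: List[str]) -> str:
    """Build one fast_align input line: 'source tokens ||| target tokens'."""
    return f"{' '.join(src)} ||| {' '.join(tgt)}"


def parse_alignment(line: str) -> Alignment:
    """Parse an alignment line such as '0-0 1-2 2-1' into (src, tgt) index pairs."""
    pairs = set()
    for token in line.split():
        i, j = token.split("-")
        pairs.add((int(i), int(j)))
    return pairs


def extract_phrases(src: List[str], tgt: List[str],
                    alignment: Alignment, max_len: int = 4) -> Set[Tuple[str, str]]:
    """Return all phrase pairs consistent with the word alignment (hypothetical helper)."""
    aligned_tgt = {j for _, j in alignment}
    phrases = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(len(src), s_start + max_len)):
            # Target positions linked to any word in the source span.
            linked = [j for i, j in alignment if s_start <= i <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            # Consistency: no target word in the span may link outside the source span.
            if any(t_start <= j <= t_end and not (s_start <= i <= s_end)
                   for i, j in alignment):
                continue
            # Extend the target span over unaligned words on both sides.
            ts = t_start
            while True:
                te = t_end
                while True:
                    if te - ts + 1 > max_len:
                        break  # extending further only makes the phrase longer
                    phrases.add((" ".join(src[s_start:s_end + 1]),
                                 " ".join(tgt[ts:te + 1])))
                    te += 1
                    if te >= len(tgt) or te in aligned_tgt:
                        break
                ts -= 1
                if ts < 0 or ts in aligned_tgt:
                    break
    return phrases
```

On a toy pair such as `["the", "house"]` / `["das", "Haus"]` with alignment `"0-0 1-1"`, this yields the three pairs ("the", "das"), ("house", "Haus") and ("the house", "das Haus"). Unaligned target words are attached to neighbouring phrases, which is why one source phrase can map to several target phrases.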
saitarslanboun/demo_phrase_extraction