Skip to content

Latest commit

 

History

History
17 lines (13 loc) · 591 Bytes

README.md

File metadata and controls

17 lines (13 loc) · 591 Bytes

fastbpe

Java implementation of Neural Machine Translation of Rare Words with Subword Units

fastbpe implements byte pair encoding tokenization in java. The library is supposed to be

  • simple,
  • fast,
  • handle large volumes of text - minimum 1 gb,
  • flawless with tests,
  • ready for alpha testing soon.

References,