Central repository with pretrained models for transfer learning, BPE subword-tokenization, mono/multilingual embeddings, and everything in between.
Updated Oct 15, 2022 - Python
The concept of DAWGs is based on: Blumer, A. et al. (1985). The smallest automaton recognizing the subwords of a text. Theoretical Computer Science, 40, 31–55.
Repository for the experiments in my paper: "A Systematic Analysis of Vocabulary and BPE Settings for Optimal Fine-tuning of NMT: A Case Study of In-domain Translation"
ICEBERT: Interlingual-Clusters Enhanced BERT. A BERT-like model trained on clusters of monolingual subwords.
Morfessor is a tool for unsupervised and semi-supervised morphological segmentation.
Morfessor EM+Prune
Cognate-aware morphological segmentation
Morfessor FlatCat
Parsing and subword segmentation code for the VML-HD Dataset
Morfessor demonstration