Releases: vngrs-ai/vnlp
Python 3.10 support and small fixes
- cyhunspell is replaced by spylls. As a result, VNLP now supports Python 3.10; Python 3.6 support is dropped.
- Newer versions of TensorFlow no longer rely on Keras-Preprocessing. This caused issues because our tokenizers were saved via pickle. They are now stored as JSON and loaded in a way that does not depend on the TensorFlow version.
- TensorFlow warnings are suppressed.
- Read the Docs build and files are updated due to tensorboard, protobuf and grpcio dependency issues.
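The move from pickle to JSON for tokenizers can be illustrated with a minimal sketch. This is not VNLP's actual code: the function names (`save_vocab`, `load_vocab`, `tokens_to_ids`), the `word_index` layout and the out-of-vocabulary id are all hypothetical and chosen only for illustration. It shows why a plain JSON vocabulary can be loaded without importing any particular TensorFlow or Keras version.

```python
import json


def save_vocab(word_index, path):
    """Write a token -> id mapping as plain JSON (hypothetical file layout)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"word_index": word_index}, f, ensure_ascii=False)


def load_vocab(path):
    """Read the mapping back using only the standard library."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["word_index"]


def tokens_to_ids(tokens, word_index, oov_id=1):
    """Map tokens to ids; unknown tokens get the assumed out-of-vocabulary id."""
    return [word_index.get(token, oov_id) for token in tokens]
```

Unlike a pickled object, the JSON file carries no reference to the class that produced it, so loading it cannot fail when a preprocessing package is renamed or removed.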
SPUContext Models
- SentencePiece Unigram Context (SPUContext) models are added for Named Entity Recognition, Dependency Parsing, Part of Speech Tagging and Sentiment Analysis. These are the default models now.
- SPUContext models are even more compact, up to 4x faster, and perform significantly better. See the metrics table on the main page for a comparison.
- SPUContext models use SentencePiece Unigram tokenization.
- The wheel file is now 80% smaller, and each model downloads its weights the first time it is initialized.
- To evaluate a deep learning based model, pass the "evaluate = True" flag when initializing it, e.g., NamedEntityRecognizer(model = 'CharNER', evaluate = True). This loads the weights that were NOT trained on the test sets.
- The former Python API has become a generic user API that provides an abstraction over the implemented methods. The desired model can be selected using the "model" argument, e.g., NamedEntityRecognizer(model = 'CharNER').
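The "model" and "evaluate" arguments described above can be combined as in the following sketch. The helper `load_ner` is hypothetical and exists only for illustration. The `predict` call and the example sentence are assumptions based on typical usage of the library rather than something stated in these notes. Running the example requires VNLP to be installed and, as noted above, downloads the model weights on first initialization.

```python
def load_ner(model_name="CharNER", evaluate=False):
    """Return a NamedEntityRecognizer for the given model name.

    `load_ner` is a hypothetical convenience wrapper, not part of VNLP.
    """
    # Imported lazily so this module can be imported without VNLP installed.
    from vnlp import NamedEntityRecognizer

    return NamedEntityRecognizer(model=model_name, evaluate=evaluate)


if __name__ == "__main__":
    # evaluate=True selects the weights that were not trained on the test sets.
    ner = load_ner(model_name="CharNER", evaluate=True)
    # `predict` is assumed here; check the project documentation for the exact API.
    print(ner.predict("Ben Melikşah, İstanbul'da ikamet ediyorum."))
```

Leaving "evaluate" at its default is the natural choice for regular use, while "evaluate = True" is intended for reproducing reported metrics.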