Releases · vngrs-ai/vnlp · GitHub

02 Mar 13:55

Python 3.10 support and small fixes

Latest

cyhunspell has been replaced by spylls, so VNLP now supports Python 3.10. Python 3.6 support has been dropped.
Newer versions of TensorFlow no longer rely on Keras-Preprocessing. This caused issues because our tokenizers were saved via pickle. They are now stored as JSON and loaded in a way that does not depend on the TensorFlow version.
TensorFlow warnings are suppressed.
The Read the Docs build and files are updated due to tensorboard, protobuf and grpcio dependency issues.
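
The move from pickle to JSON for tokenizers can be illustrated with a minimal sketch. This is not VNLP's actual implementation: the helper names (save_vocab, load_vocab, tokens_to_ids), the "word_index" key, and the out-of-vocabulary id are all hypothetical. The point is only that a plain JSON vocabulary can be read back with the standard library, without unpickling a class that may no longer exist in a newer TensorFlow.

```python
import json
import os
import tempfile


def save_vocab(word_index, path):
    """Write a token -> id mapping as plain JSON (hypothetical file layout)."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Turkish characters readable in the file.
        json.dump({"word_index": word_index}, f, ensure_ascii=False)


def load_vocab(path):
    """Read the mapping back using only the standard library."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["word_index"]


def tokens_to_ids(tokens, word_index, oov_id=1):
    """Map tokens to ids, falling back to oov_id for unknown tokens."""
    return [word_index.get(token, oov_id) for token in tokens]


# Round trip through a temporary file.
vocab = {"<pad>": 0, "<oov>": 1, "şehir": 2, "güzel": 3}
vocab_path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")
save_vocab(vocab, vocab_path)
restored = load_vocab(vocab_path)
```

Because the file holds only strings and integers, loading it does not depend on any particular class layout, which is the property the release note describes.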


15 Jun 17:53

SPUContext Models

SentencePiece Unigram Context (SPUContext) models have been added for Named Entity Recognition, Dependency Parsing, Part of Speech Tagging and Sentiment Analysis. These are now the default models.
SPUContext models are even more compact, up to 4x faster, and perform significantly better. See the metrics table on the main page for a comparison.
SPUContext models use SentencePiece Unigram tokenization.
The wheel file is now 80% smaller, and each model downloads its weights the first time it is initialized.
To evaluate a DL-based model, pass the "evaluate = True" flag when initializing, e.g., NamedEntityRecognizer(model = 'CharNER', evaluate = True). This loads the weights that were NOT trained on the test sets.
The former Python API has become a generic user API that abstracts over the implemented methods. The desired model can be selected with the "model" argument, e.g., NamedEntityRecognizer(model = 'CharNER').
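
The "model" and "evaluate" arguments described above can be wrapped in a small helper. This is a sketch under assumptions: the helper name load_ner is hypothetical, and the code assumes that NamedEntityRecognizer is importable from the top-level vnlp package and that the returned object exposes a predict method. Only the constructor call itself is taken from these notes.

```python
def load_ner(model="CharNER", evaluate=False):
    """Build a named entity recognizer for the requested model (hypothetical helper)."""
    # Imported lazily so this helper can be defined without VNLP installed.
    # Assumption: the class is exposed at the top level of the vnlp package.
    from vnlp import NamedEntityRecognizer

    # evaluate=True requests the weights that were not trained on the test sets.
    return NamedEntityRecognizer(model=model, evaluate=evaluate)


# Example usage (requires VNLP; weights are downloaded on first initialization):
#   ner = load_ner(model="CharNER", evaluate=True)
#   result = ner.predict("Ankara Türkiye'nin başkentidir.")  # assumed method name
```

Keeping evaluate=False as the default mirrors the notes, where the evaluation weights are something a caller opts into explicitly.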
