All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Fixed bug causing incorrect normalization when replacement tokens can themselves be tokenized
- Improved logic for resolving conflicting normalization instructions
- Updated spelling-correction logic
- Support for transitivity in tokenization rules
- Option to identify tokens without adding word separators to the resulting string
- Tokenization rules can be added to a compiled model
- Implicit instantiation of core classes
- Classes and functions for ad hoc creation of a tokenization config
- Methods to save (pickle) and load (unpickle) a compiled Normalizer instance (see the sketch after this list)
- Wheel for Python 3.9
- Fixed bug when replacing a substring that is not a token
- Normalizer.data is now exposed as a property
- Updated documentation, added performance benchmarks
- Installable package is either pure Python or a wheel with precompiled Cython extensions
- Normalizer.result['r_map'] attribute
- Scripts to build wheels
- Normalizer.data attribute is now exposed and can be accessed directly
- Added README.md to the released package
- Module is cythonized at the time of installation
- Configurable string normalization module
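
The entries above mention pickling a compiled Normalizer instance and accessing its `data` property. The exact import path, constructor signature, and save/load method names are not given in this changelog, so the sketch below is only illustrative: it relies on the standard-library `pickle` module, and every name other than `Normalizer` and `data` is a placeholder.

```python
import pickle

# Assumptions: the import path, constructor arguments, and config shape below
# are placeholders -- only the Normalizer class name and its `data` property
# are taken from the changelog entries above.
from normalizer import Normalizer  # hypothetical import path

norm = Normalizer({"rules": []})   # hypothetical constructor arguments
print(norm.data)                   # `data` is exposed as a property (see above)

# Persist the compiled instance so it can be reused later ...
with open("normalizer.pkl", "wb") as fh:
    pickle.dump(norm, fh)

# ... and load it back without rebuilding it from the config.
with open("normalizer.pkl", "rb") as fh:
    restored = pickle.load(fh)
```

Reloading a pickled instance avoids recompiling the normalization rules on every start, which appears to be the motivation for the save/load support noted above.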