Version 2.1.0
This version includes the following improvements:
- New Annotator: MongoStemming uses a gazetteer and stemming to perform a pseudo-fuzzy match and find gazetteer terms in different tenses and plurals
- New Cleaner: MergeAdjacent will merge adjacent entities of the same type
- New Content Extractor: CsvContentExtractor splits CSV fields into content and metadata
- New Collection Reader: LineReader will read a single file into multiple documents, one document per line
- New REST API to get configuration parameters for components (e.g. annotators)
- Significant changes to the way gazetteer annotators work, including changing from RadixTrees to MultiMaps and implementing the Aho-Corasick algorithm, resulting in performance improvements for large gazetteers in the order of 100s
- Lots of bug fixes and minor improvements
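For readers unfamiliar with Aho-Corasick, the following is a minimal, hypothetical sketch of the textbook algorithm applied to gazetteer lookup. It is not the gazetteer annotator code shipped in this release. The class name `AhoCorasickSketch` and its methods are illustrative assumptions, and the per-node `HashMap` trie stands in for whatever data structures (including the MultiMaps mentioned above) the real implementation uses. The sketch shows why the approach scales to large gazetteers. All terms are compiled once into a trie with failure links, so a document is scanned in a single pass regardless of how many terms the gazetteer holds.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical Aho-Corasick sketch for gazetteer matching (not the project's actual class). */
public class AhoCorasickSketch {

    /** Trie node: goto edges, a failure link, and the gazetteer terms that end here. */
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;
        final List<String> outputs = new ArrayList<>();
    }

    private final Node root = new Node();

    public AhoCorasickSketch(List<String> terms) {
        // 1. Build a trie containing every gazetteer term.
        for (String term : terms) {
            Node node = root;
            for (char c : term.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(term);
        }
        // 2. Breadth-first pass computing failure links (longest proper suffix present in the trie).
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node current = queue.poll();
            for (Map.Entry<Character, Node> e : current.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = current.fail;
                while (f != null && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                child.fail = (f == null) ? root : f.next.get(c);
                // Inherit terms ending at the failure target, e.g. "she" also yields "he".
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    /** Scans the text once and returns each match as "term@startIndex", ordered by end position. */
    public List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Follow failure links until an edge for c exists or we are back at the root.
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            node = node.next.getOrDefault(c, root);
            for (String term : node.outputs) {
                found.add(term + "@" + (i - term.length() + 1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        AhoCorasickSketch ac = new AhoCorasickSketch(List.of("he", "she", "his", "hers"));
        System.out.println(ac.matches("ushers")); // prints [she@1, he@2, hers@2]
    }
}
```

Note that overlapping terms ("she", "he", "hers") are all reported from one left-to-right pass. Repeatedly searching the text for each gazetteer term separately would cost time proportional to the number of terms.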