- Fix signature of
tasks.tokenize_and_postag
function - Update
tox.ini
to include newer python version, as well as older parameters and flags - Reformat und Lint
- Fix tokenization / tokenization + POS tagging: return words instead of subwords
- Add
--escape-special
and--subwords
parameter to CLI script for tokenization. Allows tokenization to further tokenize unknown words (e. g. names) as well as escape special characters with angle bracket entities.
- Add demo webapp with sentence segmentation. (NOTE: Running both the webapp and (batch) sentence segmentation at the same time from the same installation is not recommeded. It can have unexpected side-effects.)
- Some reformat of
README.rst
- Fix duplicate names (class/method for
sentence_segment
), rename class tosentence_segmenter
(.py
).
- Add
twine
to extras dependencies. - Publish module on PyPI. (Only
sdist
,bdist_wheel
can't be built currently.) - Fix some TravisCI warnings.
- Add tasks to
__init__.py
for easier access.
- Refactor tasks into
tasks.py
to enable better import in case of embedding thai-segmenter into other projects. - Have it almost release ready. :-)
- Add some more parameters to functions (optional header detection function)
- Flesh out
README.rst
with examples and descriptions. - Add Changelog items.
- Many changes,
bumpversion
needs to run where.bumpversion.cfg
is located else it silently fails ... - Strip Typehints and add support for Python3.5 again.
- Add CLI tasks for cleaning, sentseg, tokenize, pos-tagging.
- Add various params, e. g. for selecting columns, skipping headers.
- Fix many bugs for TravisCI (isort, flake8)
- Use iterators / streaming approach for file input/output.
- Remove support of Python 2.7 and lower equal to Python 3.5 because of Typehints.
- Added CLI skeleton.
- Add really good
setup.py
. (withblack
,flake8
)
- First release version as package.