Releases: taishi-i/toiro
toiro 0.0.9
toiro 0.0.8
toiro 0.0.8 incorporates the following changes:
- add chABSA_dataset to download_corpus method
datadownloader.download_corpus('chABSA_dataset')
train_df, dev_df, test_df = datadownloader.load_corpus('chABSA_dataset')
- add Python3.8 to travis and GitHub Actions
- fix preprocess.py and test_datadownloader.py
toiro 0.0.7
toiro 0.0.7 incorporates the following changes:
- add three tokenizers (fugashi-ipadic, fugashi-unidic, tinysegmenter) to toiro
- add additional_tokenizers to compare
tokenizers.compare(filename, additional_tokenizers)
- add sample code and slides from PyCon JP 2020
- add python-package.yml to .github/workflows
- fix toiro._version_
toiro 0.0.6
toiro 0.0.6 incorporates the following changes:
- fix a generator error in tokenizer_janome.py caused by the janome v0.4.0 update (e2b3e73)
- fix a failure in "Build and publish" for v0.0.5
- add 05_svm_vs_bert_benchmarking_application_tasks_ja.ipynb to examples
toiro 0.0.4
toiro 0.0.4 incorporates the following changes:
- add disable_tokenizers function to tokenizers.compare
- fix a bug in the initial release.
- fix an error on long input text in Jumanpp
- add 01_getting_started_ja.ipynb to README.md
toiro 0.0.3
toiro 0.0.3 incorporates the following changes:
- fix a bug in the initial release.
- fix typo: SVMClassifitionModel to SVMClassificationModel
- fix docker example in README.md
toiro 0.0.2
This is the first release of this library.
Toiro is a tool for comparing Japanese tokenizers.
- Compare the processing speed of tokenizers
- Compare how each tokenizer segments words
- Compare the performance of tokenizers by benchmarking application tasks (e.g., text classification)
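The first two kinds of comparison above can be illustrated with a minimal, library-independent sketch. It does not use toiro's API: the two toy tokenizers, the `compare_tokenizers` helper, and the report layout are all hypothetical and exist only to show the idea of timing several tokenizers on the same texts and looking at their segmentations side by side.

```python
import time
from typing import Callable, Dict, List

# Hypothetical type alias: a tokenizer maps one text to a list of tokens.
Tokenizer = Callable[[str], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Toy tokenizer: split on whitespace (assumes pre-segmented input)."""
    return text.split()


def character_tokenizer(text: str) -> List[str]:
    """Toy tokenizer: treat every non-space character as a token."""
    return [ch for ch in text if not ch.isspace()]


def compare_tokenizers(
    texts: List[str], tokenizers: Dict[str, Tokenizer]
) -> Dict[str, dict]:
    """Run each tokenizer over all texts, recording elapsed time and tokens."""
    report = {}
    for name, tokenize in tokenizers.items():
        start = time.perf_counter()
        tokenized = [tokenize(text) for text in texts]
        elapsed = time.perf_counter() - start
        report[name] = {"elapsed_time": elapsed, "tokens": tokenized}
    return report


texts = ["すもも も もも も もも の うち", "都庁 所在地 は 新宿区"]
report = compare_tokenizers(
    texts,
    {"whitespace": whitespace_tokenizer, "character": character_tokenizer},
)

# Print the time taken and the segmentation of the first text per tokenizer.
for name, result in report.items():
    print(name, f"{result['elapsed_time']:.6f}s", result["tokens"][0])
```

Real Japanese tokenizers would replace the toy functions here. The dictionary-of-callables layout is just one convenient way to keep per-tokenizer timings and segmentations together.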
It also provides useful functions for natural language processing in Japanese.
- Data downloader for Japanese text corpora
- Preprocessor of these corpora
- Text classifier for Japanese text (e.g., SVM, BERT)