Releases · taishi-i/toiro

31 Jul 15:07

taishi-i

0.0.9

98c8715

toiro 0.0.9 Latest

toiro 0.0.9 incorporates the following changes:

fix a scikit-learn installation error
fix wheels in PyPI
update README.md

02 Nov 18:14

taishi-i

0.0.8

9d37c39

toiro 0.0.8

toiro 0.0.8 incorporates the following changes:

add chABSA_dataset to download_corpus method
    datadownloader.download_corpus('chABSA_dataset')
    train_df, dev_df, test_df = datadownloader.load_corpus('chABSA_dataset')
add Python 3.8 to Travis CI and GitHub Actions
fix preprocess.py and test_datadownloader.py

08 Sep 08:32

taishi-i

0.0.7

30327c4

toiro 0.0.7

toiro 0.0.7 incorporates the following changes:

add three tokenizers (fugashi-ipadic, fugashi-unidic, tinysegmenter) to toiro
add an additional_tokenizers argument to tokenizers.compare
    tokenizers.compare(filename, additional_tokenizers)
add sample code and slides from PyCon JP 2020
add python-package.yml to .github/workflows
fix toiro.__version__
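
The idea of passing extra tokenizers into a comparison can be sketched without toiro itself. The snippet below is a minimal illustration, not toiro's implementation: compare_words, character_tokenizer, and bigram_tokenizer are hypothetical names, and it assumes that additional tokenizers are supplied as a mapping from a display name to a callable returning a list of tokens. Check the toiro README for the format that tokenizers.compare actually expects.

```python
def character_tokenizer(text):
    # Stand-in "built-in" tokenizer: one token per character.
    return list(text)


def bigram_tokenizer(text):
    # Stand-in "additional" tokenizer: non-overlapping two-character chunks.
    return [text[i:i + 2] for i in range(0, len(text), 2)]


def compare_words(text, base_tokenizers, additional_tokenizers=None):
    """Segment `text` with every tokenizer and return {name: tokens}.

    Hypothetical helper: both arguments are assumed to be dicts that
    map a tokenizer name to a callable taking a string.
    """
    tokenizers = dict(base_tokenizers)
    tokenizers.update(additional_tokenizers or {})
    return {name: tokenize(text) for name, tokenize in tokenizers.items()}


segmented = compare_words(
    "すもももももも",
    {"character": character_tokenizer},
    additional_tokenizers={"bigram": bigram_tokenizer},
)
for name, words in segmented.items():
    print(f"{name}: {'|'.join(words)}")
```

Keeping the extra tokenizers in a separate mapping means the built-in set stays untouched, and a caller can add or drop a tokenizer without editing the comparison code.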

23 Aug 10:31

taishi-i

0.0.6

285c4eb

toiro 0.0.6

toiro 0.0.6 incorporates the following changes:

fix a generator error in tokenizer_janome.py caused by the janome v0.4.0 update (e2b3e73)
fix the failed "Build and publish" run for v0.0.5
add 05_svm_vs_bert_benchmarking_application_tasks_ja.ipynb to examples

16 Aug 16:04

taishi-i

0.0.4

e66d997

toiro 0.0.4

toiro 0.0.4 incorporates the following changes:

add disable_tokenizers function to tokenizers.compare
fix a bug in the initial release.
fix an error on long input text in Jumanpp
add 01_getting_started_ja.ipynb to README.md

14 Aug 13:15

taishi-i

0.0.3

abdfa6e

toiro 0.0.3

toiro 0.0.3 incorporates the following changes:

fix a bug in the initial release.
fix typo: SVMClassifitionModel to SVMClassificationModel
fix docker example in README.md

13 Aug 14:22

taishi-i

0.0.2

1129d2d

toiro 0.0.2

This is the first release of this library.

toiro is a tool for comparing Japanese tokenizers.

Compare the processing speed of tokenizers
Compare how each tokenizer segments words
Compare the performance of tokenizers by benchmarking application tasks (e.g., text classification)
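
As a rough illustration of the first point, a speed comparison amounts to timing each tokenizer over the same texts. The sketch below uses only the standard library and two stand-in tokenizers; compare_speed, whitespace_tokenizer, and character_tokenizer are hypothetical names and not part of toiro's API.

```python
import time


def whitespace_tokenizer(text):
    # Stand-in tokenizer: split on whitespace.
    return text.split()


def character_tokenizer(text):
    # Stand-in tokenizer: one token per non-space character.
    return [ch for ch in text if not ch.isspace()]


def compare_speed(tokenizers, texts):
    """Return {name: elapsed_seconds} for tokenizing all `texts`."""
    results = {}
    for name, tokenize in tokenizers.items():
        start = time.perf_counter()
        for text in texts:
            tokenize(text)
        results[name] = time.perf_counter() - start
    return results


texts = ["すもも も もも も もも の うち"] * 1000
elapsed = compare_speed(
    {"whitespace": whitespace_tokenizer, "character": character_tokenizer},
    texts,
)
# Print the fastest tokenizer first.
for name, seconds in sorted(elapsed.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f} s")
```

Timings vary between machines and runs, so a result like this is only meaningful as a relative comparison on the same input.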

It also provides useful functions for natural language processing in Japanese.

Data downloader for Japanese text corpora
Preprocessor for these corpora
Text classifier for Japanese text (e.g., SVM, BERT)
