Yet another sentence-level tokenizer for Japanese text
Updated Sep 27, 2022 - Python
This project aims to classify Japanese sentences by how closely they resemble the styles of classical Japanese writers such as Soseki Natsume, Ogai Mori, and Ryunosuke Akutagawa.
A small experiment using both MeCab and TinySegmenter to create a tokenized list of Japanese sentences in JSON, taken from the Tatoeba corpus.