A Python port of Jiten's Parser library (commit 2e3588f8) for Japanese text segmentation.
Requires Python 3.8+.
pip install git+https://github.com/lmg-anon/jiten-parser-py.git
python -m jiten.setup_deps

from jiten.parser import Parser
from jiten.jmdict.jmdict import JmDict
jmdict = JmDict()
text = "美少女がアニメを見ている。"
parsed_words = Parser.parse_text(text)
for word in parsed_words:
    entry = jmdict.get_word_by_id(word.word_id)
    if entry:
        dictionary_form = entry.readings[word.reading_index]
        meanings = entry.definitions[0].english_meanings if entry.definitions else []
        print(f"'{word.original_text}' -> {dictionary_form} | {meanings[:2]}")

Output:
'美少女' -> 美少女 | ['beautiful girl']
'が' -> が | ['indicates the subject of a sentence']
'アニメ' -> アニメ | ['animation', 'animated film']
'を' -> を | ['indicates direct object of action']
'見ている' -> 見る | ['to see', 'to look']
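
The lookup loop above can be wrapped in a reusable helper. The following is a minimal sketch, not part of the library: `gloss_words`, the `Stub*` classes, and the numeric IDs are hypothetical names and values invented for illustration. The stubs mirror only the attributes used in the example above (`word_id`, `reading_index`, `original_text`, `readings`, `definitions`, `english_meanings`), so the block runs without the package or its dictionary data installed.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple


# Hypothetical stand-ins: they copy only the attribute names that the
# example above reads, not the library's real classes.
@dataclass
class StubDefinition:
    english_meanings: List[str]


@dataclass
class StubEntry:
    readings: List[str]
    definitions: List[StubDefinition] = field(default_factory=list)


@dataclass
class StubWord:
    original_text: str
    word_id: int
    reading_index: int = 0


def gloss_words(
    words: Iterable,
    lookup: Callable[[int], Optional[StubEntry]],
    max_meanings: int = 2,
) -> List[Tuple[str, str, List[str]]]:
    """Return (surface text, dictionary form, meanings) per word with an entry."""
    rows = []
    for word in words:
        entry = lookup(word.word_id)
        if not entry:
            continue  # no dictionary entry for this token, skip it
        dictionary_form = entry.readings[word.reading_index]
        # Use the first definition only, as in the example above.
        meanings = entry.definitions[0].english_meanings if entry.definitions else []
        rows.append((word.original_text, dictionary_form, meanings[:max_meanings]))
    return rows


# Made-up IDs and entries, for demonstration only.
entries = {
    1: StubEntry(["見る"], [StubDefinition(["to see", "to look", "to watch"])]),
    2: StubEntry(["アニメ"]),  # entry with no definitions
}
words = [StubWord("見ている", 1), StubWord("アニメ", 2), StubWord("。", 99)]

rows = gloss_words(words, entries.get)
for surface, dictionary_form, meanings in rows:
    print(f"'{surface}' -> {dictionary_form} | {meanings}")
```

With the real library, the same helper should work as `gloss_words(Parser.parse_text(text), jmdict.get_word_by_id)`, assuming the returned objects expose the attributes shown in the example above. Passing the lookup in as a callable keeps the helper independent of how the dictionary is loaded.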
The repository includes a simple GUI that mimics Jiten's website.
To run it, first install the additional dependencies:
pip install "jiten-parser[gui] @ git+https://github.com/lmg-anon/jiten-parser-py.git"

Then run it with this command:
python -m jiten.app.gui

This project is licensed under the Apache-2.0 License.
This project is a port of Jiten's Parser and uses resources from the following projects:
- Jiten's Parser by Sirush: The original C# library, deconjugation rules, and data resources.
- Sudachi: The Japanese morphological analyzer.
- EDRDG: The JMDict and JMnedict dictionary files, used in conformance with the Group's licence.