lmg-anon/jiten-parser-py

Jiten-Parser-Py

A Python port of Jiten's Parser library (commit: 2e3588f8) for Japanese text segmentation.

Installation

Requires Python 3.8+.

pip install git+https://github.com/lmg-anon/jiten-parser-py.git
python -m jiten.setup_deps

Usage

from jiten.parser import Parser
from jiten.jmdict.jmdict import JmDict

jmdict = JmDict()

text = "美少女がアニメを見ている。"
parsed_words = Parser.parse_text(text)

for word in parsed_words:
    entry = jmdict.get_word_by_id(word.word_id)
    if entry:
        dictionary_form = entry.readings[word.reading_index]
        meanings = entry.definitions[0].english_meanings if entry.definitions else []
        print(f"'{word.original_text}' -> {dictionary_form} | {meanings[:2]}")

Output:

'美少女' -> 美少女 | ['beautiful girl']
'が' -> が | ['indicates the subject of a sentence']
'アニメ' -> アニメ | ['animation', 'animated film']
'を' -> を | ['indicates direct object of action']
'見ている' -> 見る | ['to see', 'to look']
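
Building on the usage example above, the following is a minimal sketch of counting how often each dictionary form occurs in a text. It relies only on the attributes shown in that example (`word_id`, `reading_index`, `readings`). The helper name `count_dictionary_forms` and the stand-in entries and IDs are hypothetical; they are not part of the library and exist only so the sketch runs without the dictionary data installed.

```python
from collections import Counter
from types import SimpleNamespace


def count_dictionary_forms(parsed_words, lookup):
    """Count how often each dictionary form occurs among parsed words.

    parsed_words: iterable of objects exposing word_id and reading_index,
                  as in the usage example.
    lookup:       callable mapping a word_id to an entry or None,
                  for example jmdict.get_word_by_id.
    """
    counts = Counter()
    for word in parsed_words:
        entry = lookup(word.word_id)
        if entry is None:
            continue  # skip tokens that have no dictionary entry
        counts[entry.readings[word.reading_index]] += 1
    return counts


# Stand-in data with made-up IDs (assumption: real entries have this shape).
entries = {
    1: SimpleNamespace(readings=["見る"]),
    2: SimpleNamespace(readings=["アニメ"]),
}
words = [
    SimpleNamespace(word_id=1, reading_index=0, original_text="見ている"),
    SimpleNamespace(word_id=2, reading_index=0, original_text="アニメ"),
    SimpleNamespace(word_id=1, reading_index=0, original_text="見た"),
    SimpleNamespace(word_id=99, reading_index=0, original_text="。"),
]

counts = count_dictionary_forms(words, entries.get)
print(counts.most_common())
```

With the library installed, the same helper should work by passing `Parser.parse_text(text)` and `jmdict.get_word_by_id` in place of the stand-in data.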

Interactive GUI Example

The repository includes a simple GUI that mimics Jiten's website.

To run it, first install the additional dependencies:

pip install "jiten-parser[gui] @ git+https://github.com/lmg-anon/jiten-parser-py.git"

Then start the GUI:

python -m jiten.app.gui

License & Acknowledgements

This project is licensed under the Apache-2.0 License.

This project is a port and utilizes resources from the following projects:

  • Jiten's Parser by Sirush: The original C# library, deconjugation rules, and data resources.
  • Sudachi: The Japanese morphological analyzer.
  • EDRDG: The JMdict and JMnedict dictionary files, used in conformance with the Group's licence.
