pykakasi
is a Python Natural Language Processing (NLP) library to transliterate hiragana, katakana and kanji (Japanese text) into rōmaji (Latin/Roman alphabet).
It is based on the kakasi library, which is written in C.
- Install (from PyPI):
pip install pykakasi
- Documentation available on readthedocs
Transliterate Japanese text to rōmaji:
>>> import pykakasi
>>>
>>> text = u"かな漢字交じり文"
>>> kakasi = pykakasi.kakasi()
>>> kakasi.setMode("H","a") # Hiragana to ascii, default: no conversion
>>> kakasi.setMode("K","a") # Katakana to ascii, default: no conversion
>>> kakasi.setMode("J","a") # Japanese to ascii, default: no conversion
>>> kakasi.setMode("r","Hepburn") # default: use Hepburn Roman table
>>> kakasi.setMode("s", True) # add space, default: no separator
>>> kakasi.setMode("C", True) # capitalize, default: no capitalize
>>> conv = kakasi.getConverter()
>>> result = conv.do(text)
>>> print(result)
kana Kanji Majiri Bun
Tokenize Japanese text (split by word boundaries), equivalent to kakasi
's wakati gaki option:
>>> wakati = pykakasi.wakati()
>>> conv = wakati.getConverter()
>>> result = conv.do(text)
>>> print(result)
かな 漢字 交じり 文
Add furigana (pronounciation aid) in rōmaji to text:
>>> kakasi = pykakasi.kakasi()
>>> kakasi.setMode("J","aF") # Japanese to furigana
>>> kakasi.setMode("H","aF") # Japanese to furigana
>>> conv = kakasi.getConverter()
>>> result = conv.do(text)
>>> print(result)
かな[kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
Input mode values: "J" (Japanese: kanji, hiragana and katakana), "H" (hiragana), "K" (katakana).
Output mode values: "H" (hiragana), "K" (katakana), "a" (alphabet / rōmaji), "aF" (furigana in rōmaji).
There are other setMode
switches which control output:
- "r": Romanisation table: Hepburn (default), Kunrei or
Passport
- "s": Separator:
False
adds no spaces between words (default),True
adds spaces between words - "C": Capitalize:
False
adds no capital letters (default),True
makes each word start with a capital letter
Copyright 2010-2019 Hiroshi Miura <miurahr@linux.com>
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.