Algorithms

Bunkai has two algorithms: tsunoda and bunkai.

tsunoda

FaceMarkDetector
- detects character spans of face marks. This is rule-based. The detected spans are excluded from SB (sentence boundary) candidates.
EmotionExpressionAnnotator
- detects character spans of emotional expressions such as （笑）. This is rule-based. The detected spans are excluded from SB candidates.
BasicRule
- detects SB candidates based on rules.
MorphAnnotatorJanome
- runs a morphological analyzer on the input text.
EmojiAnnotator
- detects character spans of Emoji characters. This module is rule-based. With the default rule, Emoji in the Smileys & Emotion and Symbols categories are SB candidates. For the list of Emoji categories, see this page.
IndirectQuoteExceptionAnnotator
- detects spans of indirect quotations that lack explicit quotation marks 「」. SB candidates within indirect quotations are treated as exceptions. This process is rule-based and uses morphological information.
DotExpressionExceptionAnnotator
- detects . characters between numbers, as in 1.2畳. The detected characters are exceptions and are not treated as SBs.
NumberExceptionAnnotator
- detects . characters inside idiomatic expressions such as おすすめ度No.1. The detected characters are exceptions and are not treated as SBs.
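
The interplay between a candidate rule and the dot exceptions can be illustrated with a small sketch. This is not Bunkai's source code: the regular expressions and the function name `sb_candidates` are hypothetical, and the sketch assumes that a `.` between digits or right after `No` is dropped from the candidate set.

```python
import re
from typing import List

# Hypothetical patterns for illustration only; not taken from Bunkai.
# Punctuation that a simple rule might treat as an SB candidate.
CANDIDATE = re.compile(r"[。．.！？!?]")
# A dot with a digit on both sides, as in "1.2畳".
DOT_BETWEEN_DIGITS = re.compile(r"(?<=[0-9０-９])[.．](?=[0-9０-９])")
# A dot inside an idiom such as "No.1".
NUMBER_IDIOM = re.compile(r"(?<=No)[.．](?=[0-9０-９])", re.IGNORECASE)


def sb_candidates(text: str) -> List[int]:
    """Return character offsets of SB candidates after removing dot exceptions."""
    candidates = {m.start() for m in CANDIDATE.finditer(text)}
    exceptions = {m.start() for m in DOT_BETWEEN_DIGITS.finditer(text)}
    exceptions |= {m.start() for m in NUMBER_IDIOM.finditer(text)}
    return sorted(candidates - exceptions)


if __name__ == "__main__":
    # The dots in "1.2" and "No.1" are removed; "。" and "！" remain.
    print(sb_candidates("広さは1.2畳です。おすすめ度No.1！"))
```

Keeping candidate detection and exception detection as separate passes mirrors the module layout above, where each annotator only adds or removes spans.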
For line breaks
- LinebreakForceAnnotator (when the -m option is not given)
  - detects an SB at every line break
- LinebreakExceptionAnnotator (when the -m option is given)
  - classifies each line break as an SB or not
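
The difference between the two line-break modes can be sketched as follows. The function `linebreak_sbs` and the toy classifier are hypothetical names used only for illustration; the classifier is a stand-in for whatever classification the -m option enables and does not reflect how Bunkai decides.

```python
from typing import Callable, List, Optional

# Hypothetical signature: (text, offset of the line break) -> is it an SB?
Classifier = Callable[[str, int], bool]


def linebreak_sbs(text: str, classifier: Optional[Classifier] = None) -> List[int]:
    """Return offsets of line breaks that are treated as SBs."""
    positions = [i for i, ch in enumerate(text) if ch == "\n"]
    if classifier is None:
        # "Force" mode: every line break is an SB.
        return positions
    # "Exception" mode: keep only line breaks the classifier accepts.
    return [i for i in positions if classifier(text, i)]


def toy_classifier(text: str, pos: int) -> bool:
    """Toy rule: a line break is an SB only if it follows end punctuation."""
    return pos > 0 and text[pos - 1] in "。！？!?"


if __name__ == "__main__":
    sample = "今日は晴れ\nです。\n明日は雨"
    print(linebreak_sbs(sample))                  # force mode
    print(linebreak_sbs(sample, toy_classifier))  # exception mode
```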