Bunkai has two algorithms: `tsunoda` and `bunkai`.
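The `tsunoda` algorithm is an implementation of the method described in 角田孝昭, "顧客レビューテキスト解析に基づく文書作成支援に関する研究" (a study on document-writing support based on the analysis of customer review text), doctoral dissertation, University of Tsukuba (2016).
The `bunkai` algorithm uses the following annotators.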
- `FaceMarkDetector`
    - detects character spans of face marks. This is rule-based. The detected spans are excluded from sentence boundary (SB) candidates.
- `EmotionExpressionAnnotator`
    - detects character spans of emotional expressions such as `(笑)`. This is rule-based. The detected spans are excluded from SB candidates.
- `BasicRule`
    - detects SB candidates based on rules.
- `MorphAnnotatorJanome`
    - runs a morphological analyzer on the input text.
- `EmojiAnnotator`
    - detects character spans of Emoji characters. This module is rule-based. With the default rule, Emoji in the `Smileys & Emotion` and `Symbols` categories are SB candidates. For the Emoji categories, see this page.
- `IndirectQuoteExceptionAnnotator`
    - detects spans of indirect quotations that are not enclosed in explicit quotation marks `「」`. SB candidates inside indirect quotations are treated as exceptions. This process is rule-based and uses morphological information.
- `DotExpressionExceptionAnnotator`
    - detects SB-candidate `.` characters between numbers, such as in `1.2畳`. The detected characters are treated as exceptions, not as SBs.
- `NumberExceptionAnnotator`
    - detects SB-candidate `.` characters inside idiomatic expressions, such as `おすすめ度No.1`. The detected characters are treated as exceptions, not as SBs.
- For line breaks
    - `LinebreakForceAnnotator` (when the `-m` option is not given)
        - detects an SB at every line break.
    - `LinebreakExceptionAnnotator` (when the `-m` option is given)
        - classifies each line break as an SB or not.
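
The way these annotators combine (propose SB candidates, then drop the ones that fall under an exception) can be illustrated with a small standalone sketch. This is not Bunkai's implementation: the function names (`is_dot_exception`, `find_boundaries`, `split_sentences`), the `linebreak_is_sb` parameter, the regular expressions, and the set of sentence-final characters are all assumptions made for the example. The sketch also assumes that a `.` between digits or in a `No.1`-style idiom is not a boundary, and it leaves out morphological analysis, Emoji handling, and indirect quotations.

```python
import re

# All rules below are hypothetical stand-ins, not Bunkai's actual rules.
SB_CHARS = "。！？!?."  # assumed sentence-final characters (the BasicRule role)
# Crude stand-in for face marks and emotional expressions: short parenthesised spans.
PROTECTED = re.compile(r"[（(][^（()）\n]{1,8}[）)]")
# Stand-in for idiomatic expressions such as "No.1".
NUMBER_IDIOM = re.compile(r"No\.\d")


def is_dot_exception(text, i):
    """Return True if the '.' at index i lies between digits or inside 'No.<digit>'."""
    if text[i] != ".":
        return False
    between_digits = (
        0 < i < len(text) - 1 and text[i - 1].isdigit() and text[i + 1].isdigit()
    )
    in_idiom = any(m.start() <= i < m.end() for m in NUMBER_IDIOM.finditer(text))
    return between_digits or in_idiom


def find_boundaries(text, linebreak_is_sb=None):
    """Return the index just after each accepted SB character.

    linebreak_is_sb=None treats every line break as an SB (the forced mode);
    otherwise it is a callable (text, index) -> bool deciding each line break.
    """
    protected = [m.span() for m in PROTECTED.finditer(text)]
    boundaries = []
    for i, ch in enumerate(text):
        if ch == "\n":
            if linebreak_is_sb is None or linebreak_is_sb(text, i):
                boundaries.append(i + 1)
            continue
        if ch not in SB_CHARS:
            continue
        if i + 1 < len(text) and text[i + 1] in SB_CHARS:
            continue  # keep runs such as "!?" together
        if any(start <= i < end for start, end in protected):
            continue  # inside a face mark or emotional expression
        if is_dot_exception(text, i):
            continue  # "1.2" or "No.1"
        boundaries.append(i + 1)
    return boundaries


def split_sentences(text, linebreak_is_sb=None):
    """Cut the text at the accepted boundaries, keeping every character."""
    sentences, prev = [], 0
    for b in find_boundaries(text, linebreak_is_sb):
        sentences.append(text[prev:b])
        prev = b
    if prev < len(text):
        sentences.append(text[prev:])
    return sentences


if __name__ == "__main__":
    sample = "広さは1.2畳です。おすすめ度No.1！(^.^)また来ます\nありがとう"
    for sentence in split_sentences(sample):
        print(repr(sentence))
```

In this sketch the `.` characters in `1.2畳`, `No.1`, and `(^.^)` do not split the text, while `。`, `！`, and the line break do. Passing a callable as `linebreak_is_sb` stands in for the classification that `LinebreakExceptionAnnotator` performs when `-m` is given; the callable itself is a placeholder, not the real classifier.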