Evaluation & Comparison:
Corpus: National Institute of Korean Language (ROK) - NER Corpus / 국립국어원 - 개체명 인식용 말뭉치 (Link)
P = Precision, R = Recall, F = F-Score.
Category | KoNER/코너 (2016) P | KoNER/코너 (2016) R | KoNER/코너 (2016) F | Annie (2016) P | Annie (2016) R | Annie (2016) F | KoreaNER P | KoreaNER R | KoreaNER F
DT | 0.894 | 0.880 | 0.887 | 0.6373 | 0.7785 | 0.7009 | 0.94 | 0.94 | 0.94
LC | 0.793 | 0.853 | 0.822 | 0.5822 | 0.8782 | 0.7002 | 0.71 | 0.76 | 0.73
OG | 0.824 | 0.772 | 0.797 | 0.7624 | 0.7087 | 0.7346 | 0.73 | 0.63 | 0.68
PS | 0.915 | 0.885 | 0.899 | 0.8834 | 0.6127 | 0.7236 | 0.80 | 0.75 | 0.78
TI | 0.872 | 0.810 | 0.840 | 0.5441 | 0.8810 | 0.6727 | 0.98 | 0.89 | 0.93
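
For readers who want to check the numbers, the F-Score columns appear to be the usual harmonic mean of precision and recall (F1). The minimal sketch below assumes that definition and recomputes it from the KoNER precision/recall values in the table. The helper name `f_score` is only illustrative and is not part of any of the compared projects.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); assumed definition."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the KoNER columns of the table.
koner = {
    "DT": (0.894, 0.880),
    "LC": (0.793, 0.853),
    "OG": (0.824, 0.772),
    "PS": (0.915, 0.885),
    "TI": (0.872, 0.810),
}

for tag, (p, r) in koner.items():
    # Recomputed values can differ from the reported ones in the last
    # digit, because the table's precision and recall are already rounded.
    print(tag, round(f_score(p, r), 3))
```

Small last-digit differences from the reported F-Scores are expected, since the inputs here are the rounded table values rather than the raw counts.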
Future improvements:
- Add a gazetteer
- Add specific features for PS/LC
- Web API
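
As a rough illustration of the first item, a gazetteer can be exposed to a tagger as per-token lookup features. This is only a sketch under stated assumptions: the `GAZETTEER` entries and the `gazetteer_features` helper are hypothetical and do not exist in this project, and exact string matching is a simplification, since Korean tokens often carry attached particles that a real lookup would need to strip first.

```python
from typing import Dict, List, Set

# Hypothetical toy gazetteer: entity tag -> known surface forms.
GAZETTEER: Dict[str, Set[str]] = {
    "LC": {"서울", "부산"},
    "OG": {"국립국어원"},
}


def gazetteer_features(tokens: List[str]) -> List[Dict[str, bool]]:
    """Return one feature dict per token, with one boolean per gazetteer tag."""
    return [
        {f"in_gazetteer_{tag}": tok in entries for tag, entries in GAZETTEER.items()}
        for tok in tokens
    ]


features = gazetteer_features(["서울", "국립국어원", "오늘"])
for tok_features in features:
    print(tok_features)
```

These booleans could then be concatenated with whatever character or word features the model already uses.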
References:
- Character-Aware Neural Language Models
- Boosting Named Entity Recognition with Neural Character Embeddings
- Attending To Characters In Neural Sequence Labeling Models
- Neural Architectures for Named Entity Recognition
- Bidirectional LSTM-CRF Models for Sequence Tagging
- Character-level Convolutional Networks for Text Classification
- A Syllable-based Technique for Word Embeddings of Korean Words
Open source projects (GitHub):