bert-event-extraction

A PyTorch solution to the event extraction task, using BERT on the ACE 2005 corpus.

Prerequisites

  1. Prepare the ACE 2005 dataset.

  2. Use nlpcl-lab/ace2005-preprocessing to preprocess the ACE 2005 dataset into the same format as data/sample.json. Then place it in the data directory as follows:

    ├── data
    │     └── test.json
    │     └── dev.json
    │     └── train.json
    │...
    

    This is the expected layout of the data directory.

  3. Switching to the ERE dataset uses the same layout:

     ├── data
     │     └── test.json
     │     └── dev.json
     │     └── train.json
     │...
    

    tokens: a one-dimensional list obtained by splitting the sentence, e.g. "tokens": ["Con", "respecto", "a", "la", "pregunta", "que", "se", "deben", "estar", "haciendo", "..."]

  4. The sentence field corresponds to the tokens list: "sentence": "Con respecto a la pregunta que se deben estar haciendo..."

  5. entity_mentions are the words in the text that refer to entities. The BIOES tagging scheme used for them:

     • B (Begin): first token of a multi-token mention
     • I (Intermediate): interior token of a mention
     • E (End): last token of a multi-token mention
     • S (Single): single-token mention
     • O (Other): token that is not part of any mention

     PER denotes a person name, LOC a location, and ORG an organization. B-PER/I-PER mark the first/non-first token of a person name, B-LOC/I-LOC of a location, B-ORG/I-ORG of an organization, and O marks a token outside any named entity. An example record:

     [{"id": "c93832992e8ca0020c806137834bdd38-0-42-303", "start": 6, "end": 7, "entity_type": "PER", "mention_type": "PRO", "text": "se"}]

     Compare with ACE: "golden-entity-mentions": [ { "text": "we", "entity-type": "ORG:Media", "head": { "text": "we", "start": 2, "end": 3 },

  6. Install the packages (note the pip package is torch, not pytorch):

    pip install torch==1.0.0 pytorch_pretrained_bert==0.6.1 numpy
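The tokens/entity_mentions format described above can be exercised with a small sketch that converts mention spans into BIOES tags. The field names (tokens, start, end, entity_type) follow the sample records shown above; the function and variable names are illustrative, not from the repository, and end is assumed to be exclusive, as in the sample record where start 6, end 7 covers the single token "se".

```python
def bioes_tags(tokens, mentions):
    """Label each token with a BIOES tag; `end` is exclusive."""
    tags = ["O"] * len(tokens)  # default: token outside any mention
    for m in mentions:
        start, end, etype = m["start"], m["end"], m["entity_type"]
        if end - start == 1:
            tags[start] = f"S-{etype}"        # single-token mention
        else:
            tags[start] = f"B-{etype}"        # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"        # interior tokens
            tags[end - 1] = f"E-{etype}"      # last token
    return tags

tokens = ["Con", "respecto", "a", "la", "pregunta", "que", "se", "deben"]
mentions = [{"start": 6, "end": 7, "entity_type": "PER"}]
print(bioes_tags(tokens, mentions))
# ['O', 'O', 'O', 'O', 'O', 'O', 'S-PER', 'O']
```

The same helper tags a multi-token mention with B-/I-/E- prefixes, which is what distinguishes BIOES from plain BIO.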

Usage

Train

python train.py

Evaluation

python eval.py --model_path=latest_model.pt

Result

Performance

| Method | Trigger P (%) | Trigger R (%) | Trigger F1 (%) | Argument P (%) | Argument R (%) | Argument F1 (%) |
|---|---|---|---|---|---|---|
| JRNN | 66.0 | 73.0 | 69.3 | 54.2 | 56.7 | 55.5 |
| JMEE | 76.3 | 71.3 | 73.7 | 66.8 | 54.9 | 60.3 |
| This model (BERT base) | 63.4 | 71.1 | 67.7 | 48.5 | 34.1 | 40.0 |
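The F1 columns are the harmonic mean of precision and recall; for instance, JRNN's trigger F1 can be recomputed from its precision and recall:

```python
def f1(p, r):
    # F1 is the harmonic mean of precision and recall: 2PR / (P + R)
    return 2 * p * r / (p + r)

print(round(f1(66.0, 73.0), 1))  # JRNN trigger F1 -> 69.3
```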

The model's argument-classification performance is low even though a pretrained BERT model was used. The model is currently being updated to improve this.

Reference

  • Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation (EMNLP 2018), Liu et al. [paper]
  • lx865712528's EMNLP2018-JMEE repository [github]
  • Kyubyong's bert_ner repository [github]

train.py

train(model, train_iter, optimizer, criterion)
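The repository's actual train.py is not reproduced here; a minimal loop matching the train(model, train_iter, optimizer, criterion) signature might look like the following pure-Python stand-in (all names and callables below are illustrative; the real implementation uses PyTorch modules and optimizer.step()):

```python
def train(model, train_iter, optimizer, criterion):
    losses = []
    for batch_x, batch_y in train_iter:   # iterate over mini-batches
        preds = model(batch_x)            # forward pass
        loss = criterion(preds, batch_y)  # compute the loss
        optimizer(model, loss)            # stand-in for backward + update
        losses.append(loss)
    return sum(losses) / len(losses)      # mean training loss

# Toy stand-ins so the loop runs end to end:
model = lambda xs: [v * 2 for v in xs]                        # "predict" 2x
criterion = lambda preds, ys: sum(abs(p - y) for p, y in zip(preds, ys))
optimizer = lambda model, loss: None                          # no-op update
data = [([1, 2], [2, 4]), ([3], [6])]
print(train(model, data, optimizer, criterion))               # -> 0.0
```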
