Free the Plural: Unrestricted Split-Antecedent Anaphora Resolution

Introduction

This repository contains code introduced in the following paper:

Free the Plural: Unrestricted Split-Antecedent Anaphora Resolution
Juntao Yu, Nafise Sadat Moosavi, Silviu Paun and Massimo Poesio
In Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020

The code is written in Python 2; compatibility with Python 3 is not guaranteed.
Before starting, install all the required packages listed in `requirements.txt` using `pip install -r requirements.txt`.
After that, modify and run `extract_bert_features/extract_bert_features.sh` to compute the BERT embeddings for your training or test data.
You also need to download the context-independent word embeddings (such as fastText or GloVe) that are required by the system.

Pre-trained models can be downloaded from this link. We provide the best model reported in our paper.
Choose the model you want to use and copy it to the `logs/` folder.
Modify the `test_path` accordingly in `experiments.conf`:
- `test_path` is the path to a `.jsonlines` file. Each line of the `.jsonlines` file is a batch of sentences and must be in the following format:
```
{
"clusters": [[0, 4],[1],[2],[3]], # coreference clusters, given as mention indices
"mentions": [[0,0],[2,3],[5,5],[7,8],[10,10],[12,13]], # mentions as [start_index, end_index]
"plurals": [[5,1],[5,3]], # plural [anaphor, antecedent] pairs, "both cars" --> "a car", "another car"
"doc_key": "nw",
"sentences": [["John", "has", "a", "car", "."], ["Mary", "has", "another", "car", "."], ["John", "washed", "both", "cars", "yesterday", "."]]
}
```
- Each mention has only two properties, `[start_index, end_index]`. The indices are counted at the document level and both are inclusive.
- Coreference clusters (including singleton clusters) are represented by their mention indices.
- Each plural pair contains two mention indices: the first is the anaphor and the second is the antecedent.

Then use `python evaluate.py config_name` to start your evaluation.
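
The input format described above can be sketched in a few lines of Python. This is only an illustration for checking your own data, not part of the repository: the helper names `flatten`, `mention_text` and `check_document` are hypothetical, and the snippet was written as a standalone example rather than against the repository's own loader.

```python
import json


def flatten(sentences):
    """Concatenate tokenised sentences into one document-level token list."""
    return [tok for sent in sentences for tok in sent]


def mention_text(doc, mention_idx):
    """Return the surface string of a mention, given its index in doc["mentions"]."""
    tokens = flatten(doc["sentences"])
    start, end = doc["mentions"][mention_idx]
    return " ".join(tokens[start:end + 1])  # end_index is inclusive


def check_document(doc):
    """Raise ValueError if a span or mention index is out of range."""
    n_tokens = len(flatten(doc["sentences"]))
    n_mentions = len(doc["mentions"])
    for start, end in doc["mentions"]:
        if not (0 <= start <= end < n_tokens):
            raise ValueError("bad mention span: %r" % ([start, end],))
    # Clusters and plural pairs both refer to mentions by index.
    referenced = [m for cluster in doc["clusters"] for m in cluster]
    referenced += [m for pair in doc["plurals"] for m in pair]
    for m in referenced:
        if not (0 <= m < n_mentions):
            raise ValueError("bad mention index: %r" % m)


doc = {
    "clusters": [[0, 4], [1], [2], [3]],
    "mentions": [[0, 0], [2, 3], [5, 5], [7, 8], [10, 10], [12, 13]],
    "plurals": [[5, 1], [5, 3]],
    "doc_key": "nw",
    "sentences": [
        ["John", "has", "a", "car", "."],
        ["Mary", "has", "another", "car", "."],
        ["John", "washed", "both", "cars", "yesterday", "."],
    ],
}

check_document(doc)

# One document per line: json.dumps emits no newlines by default.
line = json.dumps(doc)

# Resolve each [anaphor, antecedent] pair to text.
pairs = [(mention_text(doc, a), mention_text(doc, b)) for a, b in doc["plurals"]]
```

Here `pairs` holds the anaphor and antecedent strings for each plural pair, which is a quick way to spot off-by-one errors in the mention spans before running the evaluation. Note that the `#` comments in the format example are explanatory only and must not appear in the actual `.jsonlines` file.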

To train your own model, you additionally need to create the character vocabulary using `python get_char_vocab.py train.jsonlines dev.jsonlines`.
Then you can start training using `python train.py config_name`.
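
As a rough illustration of what a character vocabulary is, the sketch below collects every character that occurs in the tokens of a set of `.jsonlines` documents. It is an assumption about the general idea only: the function name `collect_chars` is hypothetical, and the actual `get_char_vocab.py` script may differ in ordering, output format and file handling.

```python
import json


def collect_chars(jsonlines_texts):
    """Return the sorted set of characters seen in the tokens of all documents.

    Each element of jsonlines_texts is the full text of one .jsonlines file,
    i.e. one JSON document per line.
    """
    chars = set()
    for text in jsonlines_texts:
        for line in text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            doc = json.loads(line)
            for sentence in doc["sentences"]:
                for token in sentence:
                    chars.update(token)  # adds each character of the token
    return sorted(chars)


# Two tiny in-memory "files" standing in for train.jsonlines and dev.jsonlines.
train = json.dumps({"sentences": [["John", "has", "a", "car", "."]]})
dev = json.dumps({"sentences": [["Mary", "has", "another", "car", "."]]})

vocab = collect_chars([train, dev])
```

Building the vocabulary over both the training and development files, as the command above does, means that characters appearing only in the development data are also covered.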
