COFIE: COVID-19 Open Functional Information Extraction

This repository contains models, datasets and experiments described in Extracting a Knowledge Base of Mechanisms from COVID-19 Papers.

Please cite our paper if you use our datasets or models in your project (see the BibTeX entry in the Citation section).
Feel free to email us.

COFIE / COFIE-G datasets

We provide two annotated datasets:

COFIE: Coarse-grained mechanism relations (Direct and Indirect)
COFIE-G: Granular mechanism relations (Subject-Predicate-Object)

From project root, run scripts/data/get_cofie.sh to download both datasets to the data directory.

COFIE will be downloaded to data/cofie/[train,dev,test].json. The development and test sets are also available in tabular format: data/cofie-gold/[dev,test]-gold.tsv
COFIE-G will be downloaded to data/cofie-g/split/[train,dev,test].json. Tabular format: data/cofie-g-gold/[dev,test]-gold.tsv
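
After the download finishes, it can be useful to confirm that every file listed above is in place. The following is a minimal sketch of such a check; the helper names `expected_files` and `missing_files` are hypothetical and not part of this repository, and the paths are taken from the layout described above.

```python
from pathlib import Path
from typing import List

SPLITS = ("train", "dev", "test")
GOLD_SPLITS = ("dev", "test")


def expected_files(root: str = "data") -> List[Path]:
    """List the dataset files the download script is described as producing."""
    base = Path(root)
    files = [base / "cofie" / f"{split}.json" for split in SPLITS]
    files += [base / "cofie-gold" / f"{split}-gold.tsv" for split in GOLD_SPLITS]
    files += [base / "cofie-g" / "split" / f"{split}.json" for split in SPLITS]
    files += [base / "cofie-g-gold" / f"{split}-gold.tsv" for split in GOLD_SPLITS]
    return files


def missing_files(root: str = "data") -> List[Path]:
    """Return the expected files that do not exist under root."""
    return [path for path in expected_files(root) if not path.is_file()]


if __name__ == "__main__":
    # Run from the project root; prints nothing if all files are present.
    for path in missing_files():
        print(f"missing: {path}")
```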

Pretrained models

We provide models pre-trained on COFIE and COFIE-G.

Downloads

From project root, run scripts/pretrained/get_cofie_pretrained.sh to download all the available pretrained models to the pretrained directory. If you only want one model, here are the download links.

Dependencies

This code repository is forked from DyGIE++ (Wadden et al., 2019).

This code was developed using Python 3.7. To create a new Conda environment using Python 3.7, do conda create --name cofie python=3.7.

This library relies on AllenNLP and uses AllenNLP shell commands to kick off training, evaluation, and testing.

We use Allentune for hyperparameter search. To install a compatible version of the Allentune library, clone the allentune git repo outside of the dygiepp directory using:

git clone https://github.com/allenai/allentune.git

Then overwrite the corresponding files in the cloned repo with the ones provided in this repository, using the command

cp -r allentune_files/ [location of downloaded allentune]

Then you can proceed with installing allentune by running

pip install --editable .

in the downloaded allentune folder.

After installing allentune, install the libraries required for DyGIE++. The necessary dependencies can be installed with

pip install -r requirements.txt

Making predictions on existing datasets

To make a prediction, you can use allennlp predict. For example, to make a prediction with a pretrained granular relation model:

allennlp predict pretrained/ternary-model.tar.gz \
    data/cofie-g/split/test.json \
    --predictor dygie \
    --include-package dygie \
    --use-dataset-reader \
    --output-file predictions/cofie-g-test.jsonl \
    --cuda-device 0 \
    --silent

For predicting coarse relations using a pretrained model:

allennlp predict pretrained/binary.tar.gz \
    data/cofie/test.json \
    --predictor dygie \
    --include-package dygie \
    --use-dataset-reader \
    --output-file predictions/cofie-test.jsonl \
    --cuda-device 0 \
    --silent

Running these commands will provide json-formatted predictions.
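
To inspect the output, a minimal sketch like the one below may help. It assumes the .jsonl file holds one JSON object per line; the record schema is not described here, so the example only prints the top-level keys of the first few records. The helper name `read_jsonl` is hypothetical and not part of this repository.

```python
import json
import os
from typing import Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line of the file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Output path taken from the coarse-relation command above.
    path = "predictions/cofie-test.jsonl"
    if os.path.exists(path):
        for index, record in enumerate(read_jsonl(path)):
            # Assumption: each record is a JSON object (a dict).
            print(index, sorted(record.keys()))
            if index == 2:
                break
```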

Alternatively, you can use the predict scripts provided by this library to generate both .tsv and .json files. Use:

python predict_binary.py --data_dir data/cofie --device 0 --serial_dir pretrained/binary-model.tar.gz  --pred_dir predictions/cofie-test/

for coarse relation predictions and

python predict_ternary.py --data_dir data/cofie-g/collated --device 0 --serial_dir pretrained/ternary-model.tar.gz  --pred_dir predictions/cofie-t-test/

for granular relation predictions.

Relation extraction evaluation metric

We report Precision/Recall/F1 measured by using exact and partial span-matching functions. Full details are described in our paper.
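
As a rough illustration of how the choice of span-matching function affects the scores, here is a minimal sketch. It is not the evaluation code in this repository: the partial-match rule used here (any token overlap) is an assumption, the paper defines the actual functions, and the relation tuples and label strings are made up for the example.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]             # inclusive (start, end) token indices
Relation = Tuple[Span, Span, str]  # (first argument, second argument, label)
SpanMatch = Callable[[Span, Span], bool]


def exact_match(a: Span, b: Span) -> bool:
    """Spans match only if their boundaries are identical."""
    return a == b


def partial_match(a: Span, b: Span) -> bool:
    """Assumption for this sketch: spans match if they share any token."""
    return a[0] <= b[1] and b[0] <= a[1]


def relation_matches(pred: Relation, gold: Relation, span_match: SpanMatch) -> bool:
    """A relation matches if the labels agree and both arguments match."""
    return (
        pred[2] == gold[2]
        and span_match(pred[0], gold[0])
        and span_match(pred[1], gold[1])
    )


def precision_recall_f1(
    preds: List[Relation], golds: List[Relation], span_match: SpanMatch
) -> Tuple[float, float, float]:
    """Precision over predictions, recall over gold relations, and their F1."""
    hit_preds = sum(any(relation_matches(p, g, span_match) for g in golds) for p in preds)
    hit_golds = sum(any(relation_matches(p, g, span_match) for p in preds) for g in golds)
    precision = hit_preds / len(preds) if preds else 0.0
    recall = hit_golds / len(golds) if golds else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    golds = [((0, 2), (5, 7), "DIRECT"), ((10, 11), (14, 15), "INDIRECT")]
    preds = [
        ((0, 2), (5, 7), "DIRECT"),        # identical to the first gold relation
        ((10, 12), (14, 15), "INDIRECT"),  # first argument is one token too long
        ((20, 21), (25, 26), "DIRECT"),    # spurious prediction
    ]
    print("exact:  ", precision_recall_f1(preds, golds, exact_match))
    print("partial:", precision_recall_f1(preds, golds, partial_match))
```

In this toy example the second prediction counts only under partial matching, so precision and recall are both higher with `partial_match` than with `exact_match`.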

Training with Allentune

We use Allentune for hyperparameter tuning. To train a model for coarse relation extraction using Allentune, you can run the script below.

python scripts/train/train_allentune.py --data_dir data/cofie --device 0,1,2,3 --serial_dir models/cofie/ --gpu_count 4 --cpu_count 12

To train the model for granular relations:

python scripts/train/train_event_allentune.py --data_dir data/processed/collated_events/ --serial_dir ./models/events --gpu_count 4 --cpu_count 12 --device 0,1,2,3

The default number of training samples is set to 30. For more training options, use the --h flag.
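
If the 30 samples correspond to sampled hyperparameter assignments, as is usual in random search (an assumption about how the training scripts use Allentune), the idea can be sketched as follows. The search space below is entirely hypothetical and is not the one defined in this repository; the function names are also made up for illustration.

```python
import random
from typing import Dict, List


def sample_assignment(rng: random.Random) -> Dict[str, float]:
    """Draw one hyperparameter assignment from a hypothetical search space."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -3),  # log-uniform in [1e-5, 1e-3]
        "dropout": rng.uniform(0.0, 0.5),
        "seed": rng.randint(0, 100000),
    }


def sample_assignments(num_samples: int = 30, seed: int = 0) -> List[Dict[str, float]]:
    """Draw num_samples independent assignments; one model is trained per assignment."""
    rng = random.Random(seed)
    return [sample_assignment(rng) for _ in range(num_samples)]


if __name__ == "__main__":
    assignments = sample_assignments()
    print(len(assignments), "assignments; first:", assignments[0])
```

Each sampled assignment corresponds to one training run, which is why the prediction scripts below refer to runs by index.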

To obtain predictions for the development set over all Allentune runs:

python predict.py --data_dir data/cofie --device 0 --serial_dir models/cofie/

for the coarse relation model and

python predict_event_allentune.py --serial_dir ./models/cofie-t --data_dir ./data/cofie-t/ --pred_dir ./predictions/cofie-t

for the granular relation model.

You can get test set predictions by indicating only the run index you want to use for inference:

python predict.py --data_dir data/cofie --device 0,1,2,3 --serial_dir models/cofie/  --pred_dir predictions/cofie

for coarse relations and

python predict_event_allentune.py --serial_dir ./models/cofie-t --data_dir ./data/cofie-t/ --pred_dir ./predictions/cofie-t --test_data --test_index 17

for granular relations.

Citation

If using our dataset and models, please cite:

@inproceedings{amini-hope-2020-cofie,
    title={{Extracting a Knowledge Base of Mechanisms from COVID-19 Papers}},
    author={Tom Hope and Aida Amini and David Wadden and Madeleine van Zuylen and E. Horvitz and Roy Schwartz and Hannaneh Hajishirzi},
    year={2020},
    url={https://arxiv.org/pdf/2010.03824.pdf}
}

Contact us

Please don't hesitate to reach out.

Email: tomh@allenai.org, amini91@cs.washington.edu
