
Learning Dense Representations for Entity Retrieval

Code supporting the publication Learning Dense Representations for Entity Retrieval by Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, and Diego Garcia-Olano.

Paper available at: https://arxiv.org/abs/1909.10506

If you use this code in your research, please cite the paper as follows:

@misc{gillick2019learning,
    title={Learning Dense Representations for Entity Retrieval},
    author={Daniel Gillick and Sayali Kulkarni and Larry Lansing and Alessandro Presta and Jason Baldridge and Eugene Ie and Diego Garcia-Olano},
    year={2019},
    eprint={1909.10506},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Wikinews

The parse_wikinews.py script generates the 2018 Wikinews dataset described in Learning Dense Representations for Entity Retrieval by parsing the wikitext from the Jan 1, 2019 dump of Wikinews found on archive.org.

To generate the dataset yourself, download the Wikinews dump from https://archive.org/download/enwikinews-20190101/enwikinews-20190101-pages-articles.xml.bz2. Then run parse_wikinews.py with the wikinews_archive and output_dir flags set appropriately. See parse_wikinews.sh for an example of correct usage.
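
As a rough illustration of the step above, the invocation could be assembled from Python as sketched below. This is only a sketch under stated assumptions: it assumes the two flags are passed in the common `--name=value` form, that wikinews_archive takes the path of the downloaded dump, and that parse_wikinews.py sits in the current directory. The helper names archive_name, build_command, and run_parser are hypothetical and not part of this repository; parse_wikinews.sh remains the authoritative usage example.

```python
import subprocess
import sys

# Dump URL quoted in this README.
DUMP_URL = (
    "https://archive.org/download/enwikinews-20190101/"
    "enwikinews-20190101-pages-articles.xml.bz2"
)


def archive_name(url):
    """Return the file name component of the dump URL."""
    return url.rsplit("/", 1)[-1]


def build_command(archive_path, output_dir, script="parse_wikinews.py"):
    """Build the argument list for one parser run.

    Assumption: flags use the --name=value form; check parse_wikinews.sh
    for the exact syntax the script expects.
    """
    return [
        sys.executable,
        script,
        f"--wikinews_archive={archive_path}",
        f"--output_dir={output_dir}",
    ]


def run_parser(archive_path, output_dir):
    """Run the parser; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(archive_path, output_dir), check=True)
```

With the dump already downloaded into the working directory, `run_parser(archive_name(DUMP_URL), "wikinews_out")` would launch the script, where the output directory name is just a placeholder. Keeping the command construction separate from the call that executes it makes it possible to inspect the arguments before running anything.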

Disclaimer

This is not an official Google product.