Contains a dataset of scientific articles published at ACL and EMNLP up to 2018.
Article contents are stored in XML format, as extracted from the corresponding PDFs using Grobid. The data is organized into three separate parts for each article in the dataset: article contents, references, and headers.
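As a rough illustration of how the extracted XML might be read, the sketch below pulls a title and body paragraphs out of a Grobid-style TEI document using only the Python standard library. It assumes the files follow Grobid's usual TEI output, with elements in the `http://www.tei-c.org/ns/1.0` namespace. The `parse_tei` helper and the inline `SAMPLE` string are hypothetical and are not taken from the dataset, so element paths may need adjusting for the actual files.

```python
import xml.etree.ElementTree as ET

# Grobid normally emits TEI XML; its elements live in this namespace.
TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_URI}


def parse_tei(xml_text):
    """Return the title and body paragraphs of a Grobid-style TEI string."""
    root = ET.fromstring(xml_text)

    # The article title is expected under teiHeader/fileDesc/titleStmt.
    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""

    # Collect every <p> under <body>, flattening inline markup such as <ref>
    # and collapsing runs of whitespace.
    paragraphs = []
    body = root.find(".//tei:body", TEI_NS)
    if body is not None:
        for p in body.iter("{%s}p" % TEI_URI):
            paragraphs.append(" ".join("".join(p.itertext()).split()))

    return {"title": title, "paragraphs": paragraphs}


# Hypothetical minimal sample (assumption), not an excerpt from the dataset.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title level="a" type="main">An Example Title</title>
  </titleStmt></fileDesc></teiHeader>
  <text><body><div>
    <head>Introduction</head>
    <p>First paragraph <ref type="bibr">[1]</ref>.</p>
    <p>Second paragraph.</p>
  </div></body></text>
</TEI>"""

result = parse_tei(SAMPLE)
print(result["title"])
print(result["paragraphs"])
```

To read a file from disk, the same function can be given the file's text; the actual directory layout and file names depend on how the archive is unpacked and are not assumed here.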
This data is distributed under a Creative Commons License. When using this data in your research, please cite the following publication:
Caragea, Cornelia, Ana Uban, and Liviu P. Dinu. "The Myth of Double-Blind Review Revisited: ACL vs. EMNLP." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2317-2327. 2019.