Touché 2020 V3

Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR

💻 GitHub 🤗 Dataset 📝 SIGIR 2024 Paper

Welcome to the reproduction study of the Touché 2020 dataset in the BEIR benchmark, where previous studies have found that neural retrieval models are considerably less effective than BM25.

We further investigate what makes argument retrieval so "special" and why neural retrievers underperform BM25 on Touché 2020.

[Figure: comparison]

From the figure above, we observe that most neural retrievers on Touché 2020 retrieve short arguments as their top-10 results, and these are often non-argumentative. In addition, all retrievers (including BM25) retrieve a large share of holes (documents without relevance judgements), which leads to rather low nDCG@10 scores.

TL;DR: We denoise the Touché 2020 document collection by removing noisy arguments, and conduct post-hoc judgements to release a cleaner Touché 2020 v3 collection. This repository uses code from existing well-known repositories such as BEIR, Pyserini, and SPRINT for reproduction, and provides baseline retrieval model scores on the Touché 2020 v3 dataset.

To learn more about our reproduction study, please refer to our SIGIR 2024 paper (see the Citation section).

Getting Started

Installation

You will need to install three toolkits: Pyserini (BM25), SPRINT (SPLADEv2), and BEIR (dense models). To install the necessary packages, run:

conda create -n python_env python=3.10
conda activate python_env

# Install JDK 21 via conda
conda install -c conda-forge openjdk=21

# PyPI installations: BEIR, Pyserini, SPRINT
pip install -r requirements.txt

Dataset

The Touché 2020 v3 dataset (denoised + post-hoc judged) can be found here: castorini/webis-touche2020-v3.

  • corpus.jsonl contains 303,372 arguments, with the argument premise as the body (the result of filtering the argument corpus).
  • queries.jsonl contains 49 controversial queries (all test queries).
  • qrels/test.tsv contains 2,849 relevance judgements in total (including additional post-hoc relevance judgements).
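The three files listed above can be read with the Python standard library alone. The sketch below is a minimal illustration, not the loader used in this repository. It assumes the usual BEIR layout: each JSON line has an `_id` field, and the qrels file is tab-separated with a header row `query-id`, `corpus-id`, `score`. The function names (`load_jsonl`, `load_qrels`, `load_dataset`) are hypothetical.

```python
import csv
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a dict keyed by the `_id` field (assumed BEIR convention)."""
    records = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            records[record["_id"]] = record
    return records


def load_qrels(path):
    """Read a tab-separated qrels file into {query_id: {doc_id: score}}.

    Assumes a header row with the columns query-id, corpus-id, score.
    """
    qrels = {}
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            qrels.setdefault(row["query-id"], {})[row["corpus-id"]] = int(row["score"])
    return qrels


def load_dataset(data_dir):
    """Load corpus.jsonl, queries.jsonl and qrels/test.tsv from a local directory."""
    data_dir = Path(data_dir)
    corpus = load_jsonl(data_dir / "corpus.jsonl")
    queries = load_jsonl(data_dir / "queries.jsonl")
    qrels = load_qrels(data_dir / "qrels" / "test.tsv")
    return corpus, queries, qrels
```

After downloading the dataset to a local folder, `corpus, queries, qrels = load_dataset("path/to/folder")` should return three dictionaries. If the assumed layout holds, `len(corpus)` and `len(queries)` should match the counts given above, and summing `len(v)` over `qrels.values()` should give the number of judgements.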

Examples
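As a small illustration of how holes interact with nDCG@10, the sketch below computes both for a single query. It is not the evaluation code used in the paper. It assumes that an unjudged document counts as non-relevant (gain 0) and uses the linear-gain form of DCG; the function names and the toy data are hypothetical.

```python
import math


def hole_rate_at_k(ranked_doc_ids, judged, k=10):
    """Fraction of the top-k results that have no relevance judgement."""
    top_k = ranked_doc_ids[:k]
    if not top_k:
        return 0.0
    holes = sum(1 for doc_id in top_k if doc_id not in judged)
    return holes / len(top_k)


def dcg(gains):
    """Discounted cumulative gain with linear gains and a log2 rank discount."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(ranked_doc_ids, judged, k=10):
    """nDCG@k for one query; unjudged documents are treated as gain 0 (assumption)."""
    gains = [judged.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted(judged.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal_gains)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy example: "x" and "y" are holes, "a" and "b" are judged relevant.
judged = {"a": 2, "b": 1}
ranking = ["x", "a", "y", "b"]
print(hole_rate_at_k(ranking, judged))  # half of the returned results are unjudged
print(ndcg_at_k(ranking, judged))
```

Under this assumption, every hole in the top 10 occupies a rank without contributing gain, so a retriever that returns many unjudged documents scores lower even if some of those documents would have been judged relevant. This is the motivation for the additional post-hoc judgements.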

Citation

If you use this code or dataset in your research, please cite our SIGIR 2024 paper.

@INPROCEEDINGS{Thakur_etal_SIGIR2024,
   author = "Nandan Thakur and Luiz Bonifacio and Maik {Fr\"{o}be} and Alexander Bondarenko and Ehsan Kamalloo and Martin Potthast and Matthias Hagen and Jimmy Lin",
   title = "Systematic Evaluation of Neural Retrieval Models on the {Touch\'{e}} 2020 Argument Retrieval Subset of {BEIR}",
   booktitle = "Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval",
   year = 2024,
   address_ = "Washington, D.C."
}

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Authors

  • Nandan Thakur (University of Waterloo, Waterloo, Canada)
  • Luiz Bonifacio (UNICAMP and University of Waterloo, Campinas, Brazil)
  • Maik Fröbe (Friedrich-Schiller-Universität Jena, Jena, Germany)
  • Alexander Bondarenko (Leipzig University and Friedrich-Schiller-Universität Jena, Leipzig, Germany)
  • Ehsan Kamalloo (University of Waterloo, Waterloo, Canada)
  • Martin Potthast (University of Kassel, hessian.AI, and ScaDS.AI, Kassel, Germany)
  • Matthias Hagen (Friedrich-Schiller-Universität Jena, Jena, Germany)
  • Jimmy Lin (University of Waterloo, Waterloo, Canada)

Acknowledgments

We would like to thank all contributors and the institutions involved in this research. Special thanks to the BEIR benchmark and Touché 2020 authors.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.