This repository contains the code and data for the paper LitSearch: A Retrieval Benchmark for Scientific Literature Search. In this paper, we introduce a benchmark of 597 realistic literature search queries about recent ML and NLP papers. We provide the code we used to benchmark state-of-the-art retrieval models and two LLM-based reranking pipelines.
Please install the latest versions of PyTorch (torch), NumPy (numpy), HuggingFace Transformers (transformers), HuggingFace Datasets (datasets), SentenceTransformers (sentence-transformers), InstructorEmbedding (InstructorEmbedding), Rank-BM25 (rank-bm25), GritLM (gritlm) and the OpenAI API package (openai). This codebase is tested on torch==1.13.1, numpy==1.23.5, transformers==4.30.2, datasets==2.20.0, sentence-transformers==2.2.2, InstructorEmbedding==1.0.1, rank-bm25==0.2.2, gritlm==1.0.0 and openai==1.33.0 with Python 3.10.14.
Note: We used a standalone environment for GritLM since its dependencies were incompatible with other packages.
We provide the LitSearch query set and retrieval corpus as separate HuggingFace datasets configurations under princeton-nlp/LitSearch. We also provide the retrieval corpus in the Semantic Scholar Open Research Corpus (S2ORC) format along with all available metadata to facilitate exploration of retrieval strategies more advanced than the ones we implement in this codebase. The data can be loaded with the datasets package:
from datasets import load_dataset
query_data = load_dataset("princeton-nlp/LitSearch", "query", split="full")
corpus_clean_data = load_dataset("princeton-nlp/LitSearch", "corpus_clean", split="full")
corpus_s2orc_data = load_dataset("princeton-nlp/LitSearch", "corpus_s2orc", split="full")
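Before writing retrieval code, it can help to check what each configuration actually contains. The snippet below is a minimal sketch: summarize_split and summarize_litsearch are hypothetical helpers (not part of this repository), and they read the column names from the dataset object rather than assuming any particular field names.

```python
def summarize_split(ds):
    """Return the row count and column names of a dataset-like object.

    Works with anything that supports len() and exposes `column_names`,
    such as a `datasets.Dataset`.
    """
    return {"num_rows": len(ds), "columns": list(ds.column_names)}


def summarize_litsearch():
    """Load the three LitSearch configurations and summarize each one."""
    # Imported here so summarize_split stays usable without `datasets` installed.
    from datasets import load_dataset

    summaries = {}
    for config in ("query", "corpus_clean", "corpus_s2orc"):
        ds = load_dataset("princeton-nlp/LitSearch", config, split="full")
        summaries[config] = summarize_split(ds)
    return summaries
```

Calling summarize_litsearch() downloads the data on first use, so it needs network access; printing its return value shows the size and schema of each configuration.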
eval/retrieval/
- Contains a parent class for retrievers in kv_store.py and implementations of 5 retrieval pipelines: BM25 (bm25.py), GTR (gtr.py), Instructor (instructor.py), E5 (e5.py) and GRIT (grit.py).
- Contains build_index.py for building a retrieval index of the required type using a given retrieval corpus.
- Contains evaluate_index.py for evaluating a retriever using the associated retrieval index and a query set.
eval/reranking/rerank.py contains code for reranking a provided set of retrieval results using GPT-4. This code is adapted from Rank-GPT.
eval/onehop/get_onehop_union.py contains code that implements the first stage of the one-hop reranking operation described in section 3.2 of our paper. Once the union is computed using this script, GPT-4-based reranking is applied as before using eval/reranking/rerank.py.
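To make the index-then-query pattern concrete, here is a minimal, self-contained sketch of a BM25-style retriever. It is not the code in kv_store.py or bm25.py (which rely on the rank-bm25 package). The class name ToyBM25, its add and query methods, the whitespace tokenizer, and the default values of k1 and b are all assumptions chosen for illustration.

```python
import math
from collections import Counter


class ToyBM25:
    """Hypothetical in-memory Okapi BM25 index; not the repository's implementation."""

    def __init__(self, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.keys = []       # document identifiers, in insertion order
        self.doc_tfs = []    # per-document term frequencies
        self.doc_lens = []   # per-document token counts
        self.df = Counter()  # number of documents containing each term

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase whitespace tokenization is enough for a demo.
        return text.lower().split()

    def add(self, key, text):
        tokens = self.tokenize(text)
        tf = Counter(tokens)
        self.keys.append(key)
        self.doc_tfs.append(tf)
        self.doc_lens.append(len(tokens))
        self.df.update(tf.keys())

    def query(self, text, n=10):
        """Return up to n (key, score) pairs, best first."""
        if not self.keys:
            return []
        num_docs = len(self.keys)
        avg_len = sum(self.doc_lens) / num_docs
        query_terms = self.tokenize(text)
        scores = []
        for key, tf, doc_len in zip(self.keys, self.doc_tfs, self.doc_lens):
            score = 0.0
            for term in query_terms:
                if term not in tf:
                    continue
                # Smoothed IDF (the "+ 1" variant) keeps every weight positive.
                idf = math.log(1 + (num_docs - self.df[term] + 0.5) / (self.df[term] + 0.5))
                norm = tf[term] + self.k1 * (1 - self.b + self.b * doc_len / avg_len)
                score += idf * tf[term] * (self.k1 + 1) / norm
            scores.append((key, score))
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:n]


index = ToyBM25()
index.add("p1", "dense retrieval with contrastive learning")
index.add("p2", "image classification with convolutional networks")
print(index.query("contrastive dense retrieval", n=1))
```

The dense retrievers follow the same two-step shape, with the term statistics replaced by embedding vectors and the scoring loop by a similarity search.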
This repository provides support for running evaluations using the BM25, GTR, Instructor, E5 and GRIT retrievers, reranking using GPT-4, and executing a one-hop reranking strategy. We provide sample commands for running the corresponding scripts:
python3 -m eval.retrieval.build_index --index_type bm25 --key title_abstract
python3 -m eval.retrieval.evaluate_index --index_name LitSearch.title_abstract.bm25
python3 -m eval.reranking.rerank --retrieval_results_file results/retrieval/LitSearch.title_abstract.bm25.jsonl
python3 -m eval.onehop.get_onehop_union --input_path results/retrieval/LitSearch.title_abstract.bm25.jsonl
python3 -m eval.reranking.rerank --retrieval_results_file results/onehop/prereranking/LitSearch.title_abstract.bm25.union.jsonl --output_dir results/onehop/postreranking --max_k 200
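Retrieval output of this kind is commonly scored with recall@k, the fraction of a query's relevant papers that appear among the top k results. The following is a minimal sketch of that metric, not the logic of evaluate_index.py. The function names, and the assumption that both the ranked results and the gold labels are lists of paper identifiers, are made up for illustration.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant_ids found among the first k retrieved_ids."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    hits = relevant.intersection(retrieved_ids[:k])
    return len(hits) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved_ids, relevant_ids) pairs."""
    if not runs:
        raise ValueError("runs must not be empty")
    return sum(recall_at_k(r, g, k) for r, g in runs) / len(runs)


runs = [
    (["p3", "p1", "p7"], ["p1"]),        # relevant paper at rank 2
    (["p2", "p9", "p4"], ["p4", "p5"]),  # one of two relevant papers at rank 3
]
print(mean_recall_at_k(runs, k=2))  # 0.5
print(mean_recall_at_k(runs, k=3))  # 0.75
```

Averaging per query, rather than pooling all relevant papers, gives every query equal weight regardless of how many relevant papers it has.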
If you have any questions related to the code or the paper, feel free to email Anirudh (anirudh.ajith@princeton.edu). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to specify the problem with details so we can help you better and quicker!
Please cite our paper if you use LitSearch in your work:
@article{ajith2024litsearch,
title={LitSearch: A Retrieval Benchmark for Scientific Literature Search},
author={Ajith, Anirudh and Xia, Mengzhou and Chevalier, Alexis and Goyal, Tanya and Chen, Danqi and Gao, Tianyu},
year={2024}
}