This repo includes the implementation of our paper RAG-LER: Ranking Adapted Generation with Language-Model Enabled Regulation.
We introduce RAG-LER, a novel framework that enhances an LM’s context understanding and improves the quality and accuracy of provided passages through an LM-supervised re-ranker. RAG-LER fine-tunes a pre-trained LM to follow instructions and discriminately use provided information. It then leverages this fine-tuned LM to generate ranking scores, which serve as supervised labels for training the re-ranker.
- We released the weights of our trained Mistral-7B model on HuggingFace to encourage and advance further research.
- We have updated our re-ranker training method, which now incorporates a reference model during training.
- We migrated our experiment tracking from mlflow to wandb. Both are excellent tools for experiment tracking.
- We addressed some dependency issues, mainly in retrieval.
Installation can be done by running the following commands:
# Clone the repo
git clone https://github.com/notoookay/rag-ler.git
cd rag-ler
source setup.sh
We use wandb for our experiment recording.
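If you have not set up wandb yet, log in once so the training scripts can record runs (this assumes you already have a wandb account and API key):
import wandb

# Reads the API key interactively or from the WANDB_API_KEY environment variable
wandb.login()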
Our training data can be downloaded from HuggingFace; the datasets are located in the data directory.
The training data includes datasets for training both the LLM and the re-ranker. See our paper for details of the data processing.
The LM training data can be found in llm_train.jsonl. We include a set of instruction-tuning datasets and open-domain QA datasets to improve instruction-following and reading-comprehension capabilities.
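The file is standard JSON Lines; below is a minimal sketch for inspecting it with the datasets library (the data/llm_train.jsonl path is an assumption, adjust it to where you placed the file):
from datasets import load_dataset

# Load the LM training data and inspect its fields
ds = load_dataset("json", data_files="data/llm_train.jsonl", split="train")
print(len(ds), ds.column_names)
print(ds[0])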
To fine-tune an LLM under the same configuration described in the paper, you can directly run the fine-tuning script.
bash ./scripts/finetune_llm.sh
Note: please check and modify the data path in the script (the same applies to the scripts below).
Feel free to modify the settings in this script for custom experiments.
Our 7B and 13B models are available on HuggingFace in case you want to use them directly.
For a quick start, you can use the trained LM:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("notoookay/ragler-llama2-7b")
model = AutoModelForCausalLM.from_pretrained("notoookay/ragler-llama2-7b", torch_dtype=torch.bfloat16, device_map="auto")
# Example usage
input_text = "### Instruction:\nAnswer the following question.\n\n### Input:\nQuestion:\nWhat is the capital of France?\n\n### Response:\n"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)  # move inputs to the model's device
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
The re-ranker training data can be found in reranker_train.jsonl. It includes Natural Questions and HotpotQA for single-hop and multi-hop question answering, along with passages retrieved from the Dec 2018 Wikipedia dump using Contriever MS-MARCO. In case you need to run retrieval yourself, you can download the corpus from Contriever by running:
# Download Dec 2018 corpus
wget https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
# Download corresponding embeddings
wget https://dl.fbaipublicfiles.com/contriever/embeddings/contriever-msmarco/wikipedia_embeddings.tar
As re-ranker training is supervised by the LM, we need to obtain training labels (probabilities) from the LM. You can get the labels by running:
bash ./scripts/prepare_reranker_train_data.sh
In addition to the Llama2 family models, we also trained a Mistral-7B model, which uses less memory (about 40GB), so you may reduce inference cost by testing with the Mistral model.
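For intuition, here is a minimal, hypothetical sketch of the idea behind label preparation: score each retrieved passage by the fine-tuned LM's likelihood of the gold answer when conditioned on that passage, then normalize the scores into soft labels for the re-ranker. The prompt layout and scoring below are illustrative assumptions; see the paper and prepare_reranker_train_data.sh for the actual procedure.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("notoookay/ragler-llama2-7b")
model = AutoModelForCausalLM.from_pretrained("notoookay/ragler-llama2-7b", torch_dtype=torch.bfloat16, device_map="auto")

def answer_logprob(passage, question, answer):
    # Hypothetical prompt layout; the real template is defined in the training scripts
    prompt = (f"### Instruction:\nAnswer the following question.\n\n"
              f"### Input:\nPassage:\n{passage}\n\nQuestion:\n{question}\n\n### Response:\n")
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    answer_ids = tokenizer(answer, add_special_tokens=False, return_tensors="pt").input_ids.to(model.device)
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Sum the log-probabilities of the answer tokens given the passage-conditioned prompt
    log_probs = torch.log_softmax(logits[0, prompt_ids.shape[1] - 1:-1].float(), dim=-1)
    return log_probs.gather(1, answer_ids[0].unsqueeze(1)).sum().item()

passages = ["Paris is the capital and largest city of France.",
            "Lyon is a major city in France."]
scores = [answer_logprob(p, "What is the capital of France?", "Paris") for p in passages]
labels = torch.softmax(torch.tensor(scores), dim=0)  # soft labels for re-ranker training
print(labels)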
We run inference with our fine-tuned LLMs using the same configuration as in the script above. After you obtain the labels, the re-ranker can be trained by running:
bash ./scripts/finetune_reranker.sh
By default, we use Contriever-MS MARCO as our retriever. We use the Dec 2018 wiki dump mentioned above for our evaluation, and the Dec 2020 dump for PopQA. For corpus downloading, please refer to the Atlas corpus download guide.
You can retrieve with a sparse retriever (e.g., BM25). We use pyserini for our BM25 retrieval.
Before retrieval, you need to build a BM25 index for the corpus; please check this for instructions on building the index. After building the index, you can retrieve by running:
bash ./scripts/passage_retrieval_bm25.sh
We split the sparse and dense retrieval for clarity.
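For reference, here is a minimal sketch of BM25 retrieval with pyserini (assuming a recent pyserini version with the Lucene backend; the index path is a placeholder for the index you built):
from pyserini.search.lucene import LuceneSearcher

# Placeholder path: point this at the BM25 index you built for the corpus
searcher = LuceneSearcher("indexes/enwiki-dec2018-bm25")
hits = searcher.search("What is the capital of France?", k=10)
for hit in hits:
    print(hit.docid, round(hit.score, 2))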
For dense retrieval (e.g., Contriever), you need to generate embeddings for both your input data and the corpus. To build the dense embeddings, you can run:
python retrieval/generate_passage_embeddings.py \
--model_name_or_path facebook/contriever-msmarco \
--output_dir embeddings/enwiki-dec2021 \
--passages corpora/wiki/enwiki-dec2021/text-list-100-sec.jsonl \
--shard_id 0 --num_shards 1
# Or using script directly
bash ./scripts/generate_passage_embeddings.sh
After generating the embeddings, you can retrieve by running:
bash ./scripts/passage_retrieval.sh
We use FAISS for similarity search over the dense vectors. We recommend faiss-gpu for fast search, which costs about 110GB of GPU memory for the Dec 2020 wikidump.
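For reference, a minimal sketch of the dense search that the script performs: encode the query with Contriever (mean pooling over token embeddings) and run inner-product search with FAISS. The random passage_embeddings array below is a stand-in for the embeddings you generated above; loading the actual shards is omitted.
import faiss
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever-msmarco")
encoder = AutoModel.from_pretrained("facebook/contriever-msmarco")

def embed(texts):
    # Contriever uses mean pooling over token embeddings
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy().astype("float32")

# Stand-in for embeddings loaded from the generated shards (768-dim for Contriever)
passage_embeddings = np.random.rand(1000, 768).astype("float32")

index = faiss.IndexFlatIP(passage_embeddings.shape[1])  # inner-product (MIPS) index
index.add(passage_embeddings)

query_emb = embed(["What is the capital of France?"])
scores, ids = index.search(query_emb, 5)  # top-5 passage indices and scores
print(ids[0], scores[0])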
We evaluate on a set of knowledge-intensive tasks including open-domain QA and fact checking.
In addition to knowledge-intensive tasks, we also include several commonsense-reasoning evaluations, which typically do not need retrieval.
You can evaluate by running:
bash ./scripts/run_llm.sh
Note: remember to modify the arguments as needed; for more details on the arguments, please refer to our paper.
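As a reference point, the standard exact-match metric for open-domain QA looks like the sketch below (a generic implementation, not necessarily the exact scoring used in run_llm.sh):
import re
import string

def normalize(text):
    # Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style normalization)
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))

print(exact_match("Paris.", ["Paris", "Paris, France"]))  # 1.0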
If you find our work helpful, please consider citing our paper:
@article{ZHAI2025131514,
title = {RAG-LER: Ranking adapted generation with language-model enabled regulation},
journal = {Neurocomputing},
volume = {656},
pages = {131514},
year = {2025},
issn = {0925-2312},
doi = {https://doi.org/10.1016/j.neucom.2025.131514},
url = {https://www.sciencedirect.com/science/article/pii/S0925231225021861},
author = {Fengwen Zhai and Wenyang Tang and Jing Jin},
keywords = {Language modeling, Retrieval augmented generation, Information retrieval, Re-ranking}
}
We welcome contributions from the community! Whether it's fixing bugs, adding new features, improving documentation, or providing feedback, your help is invaluable.
