Wentao Jiang1 *, Xiang Feng1 *, Zengmao Wang1 †, Yong Luo1, Pingbo Xu2,3, Zhe Chen4, Bo Du1, Jing Zhang1 †
1 School of Computer Science, Wuhan University, China,
2 Department of Anesthesiology, Zhejiang Cancer Hospital, China,
3 Institute of Medicine, Chinese Academy of Sciences, Hangzhou, Zhejiang, China
4 Department of Computer Science and Information Technology, La Trobe University, Australia
- 2025.08.11: We released the code on GitHub!
REX-RAG is a reinforcement learning framework for Retrieval-Augmented Generation that escapes reasoning dead ends through a mixed sampling strategy and maintains stable policy learning via a principled policy correction mechanism. It delivers significant performance boosts on multi-hop reasoning and general QA tasks, with strong out-of-domain generalization and compatibility with various RL training algorithms.
Note: Please ensure that you have configured the paths within the bash scripts to match your local environment.
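For reference, the variables below illustrate the kinds of paths these scripts typically expect; the names and values are placeholders, so check each script under scripts/ for the actual settings.
# Illustrative placeholders only -- substitute the actual variable names used in the scripts
export DATA_DIR=/path/to/processed/datasets      # training and evaluation data
export CORPUS_PATH=/path/to/wiki_corpus.jsonl    # Wikipedia corpus for the retriever
export INDEX_PATH=/path/to/wiki_index            # search index built in a later step
export BASE_MODEL=Qwen/Qwen2.5-7B                # backbone model used in the paper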
First, install the required dependencies. We recommend using uv for faster installation.
# Upgrade pip and install uv
pip install --upgrade pip
pip install uv
# Install sglang (version 0.4.6.post4 is required)
uv pip install "sglang[all]==0.4.6.post4"
# Install PyTorch (replace cu12x with your CUDA version, e.g., cu126)
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu12x
# Install flash-attn
pip3 install flash-attn --no-build-isolation
# Install other dependencies
pip install wandb
# Install the project in editable mode
pip install -e .
For more details on sglang installation, please refer to the official documentation.
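As an optional sanity check, you can verify that the core packages import cleanly and that CUDA is visible; the exact versions printed will depend on your setup.
# Verify that torch, sglang, and flash-attn import cleanly and CUDA is available
python -c "import torch, sglang, flash_attn; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"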
The retriever requires a Wikipedia corpus. You can either process it manually or download a pre-processed version.
- Option A: Process Wikipedia manually. Follow the instructions at FlashRAG Wiki Processing.
- Option B: Download the pre-processed data from Hugging Face Datasets (see the example below).
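If you choose Option B, the download can be scripted with the Hugging Face CLI. The repository id below is a placeholder; use the dataset linked above and point --local-dir to wherever your scripts expect the corpus.
# Placeholder repo id -- replace with the dataset repository linked above
huggingface-cli download <dataset-repo-id> --repo-type dataset --local-dir ./data/wiki_corpus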
After obtaining the data, build the search index:
bash scripts/search_engine/build_index.sh
You can fetch the required datasets using git lfs:
git lfs pull
Alternatively, you can use your own custom dataset. Please refer to the preprocessing methods described in Search-R1.
First, start the retriever server:
bash scripts/search_engine/retrieval_server.sh
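Before launching training, it may be worth confirming that the retriever responds. The request below assumes a Search-R1-style server exposing a /retrieve endpoint on port 8000; the actual host, port, route, and payload fields are defined in retrieval_server.sh, so adjust as needed.
# Hypothetical smoke test -- endpoint and payload depend on your retrieval_server.sh settings
curl -X POST http://127.0.0.1:8000/retrieve \
  -H "Content-Type: application/json" \
  -d '{"queries": ["What is retrieval-augmented generation?"], "topk": 3, "return_scores": true}'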
Then, you can proceed to run the main application:
bash scripts/search-r1-sgl/run_grpo_sglang_fsdp.sh
Figure 3 presents a visualization analysis comparing the reasoning trajectories of the original Qwen2.5-7B model against the same model enhanced with REX-RAG. This analysis uses the uncertainty quantification method from LogTokU (GitHub).
Following their framework, we analyze two types of uncertainty:
- Aleatoric Uncertainty (AU): Represents inherent data randomness.
- Epistemic Uncertainty (EU): Captures gaps in the model's knowledge.
These are measured through token-level confidence scoring. The visualization demonstrates that REX-RAG achieves significantly higher reliability scores for its reasoning tokens (typically in the 0.6-0.8 range), whereas the baseline model exhibits lower reliability (generally in the 0.2-0.4 range).
We would like to express our gratitude to the open-source projects that were instrumental in our work.
Special thanks to LogTokU for their excellent work on uncertainty visualization, which we adapted for our analysis.
If you find our work useful, please consider giving a ⭐ and citing our paper:
@article{jiang2025rex,
  title={REX-RAG: Reasoning Exploration with Policy Correction in Retrieval-Augmented Generation},
  author={Jiang, Wentao and Feng, Xiang and Wang, Zengmao and Luo, Yong and Xu, Pingbo and Chen, Zhe and Du, Bo and Zhang, Jing},
  journal={arXiv preprint arXiv:2508.08149},
  year={2025}
}


