TL;DR: SePer is an accurate / fast / API-free metric to measure retrieval utility via information gain.
This repository contains the official implementation of the ICLR 2025 Spotlight paper:
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Authors: Lu Dai, Yijie Xu, Jinhui Ye, Hao Liu, Hui Xiong
- Paper (arXiv): https://arxiv.org/abs/2503.01478
- Paper (OpenReview): https://openreview.net/forum?id=ixMBnOhFGd
- Project page: https://sepermetric.github.io/
- Code: https://github.com/sepermetric/seper
SePer evaluates retrieval utility by measuring how much retrieved evidence reduces a model's semantic perplexity about the answer. This yields a finer-grained utility signal than relying on ranking metrics or downstream answer quality alone.
Below is an illustration of SePer's fine-grained evaluation ability:
SePer is especially useful when:
- Two retrievers have similar ranking metrics, but downstream quality differs.
- You want to measure whether retrieved evidence truly reduces model uncertainty.
- You need a utility-centric signal to complement framework-level RAG evaluation.
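The core idea above can be sketched numerically. The toy code below is **not** the repository's API: the actual method aggregates over semantically clustered answers, whereas this simplified sketch treats utility as the drop in answer perplexity once retrieved context is added, using hypothetical per-token log-probabilities.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability) over the answer tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def seper_gain(logprobs_no_ctx, logprobs_with_ctx):
    """Retrieval utility as perplexity reduction: positive when the
    retrieved context makes the gold answer more likely."""
    return perplexity(logprobs_no_ctx) - perplexity(logprobs_with_ctx)

# Hypothetical per-token log-probs of the gold answer under an LLM.
no_ctx = [-2.3, -1.9, -2.1]    # answer is uncertain without evidence
with_ctx = [-0.4, -0.2, -0.3]  # evidence sharpens the distribution
gain = seper_gain(no_ctx, with_ctx)
```

A positive `gain` indicates the retrieved passage reduced the model's uncertainty; a near-zero or negative value flags retrieval that did not help, even if its ranking score looked good.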
```bash
conda create -n seper python=3.11
conda activate seper
pip install torch
pip install -r requirements.txt
```

A minimal walkthrough is provided in `example.ipynb`.
The retriever benchmark is available at: https://sepermetric.github.io/
If you find our work useful, please cite:
```bibtex
@inproceedings{dai2025seper,
  title={SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction},
  author={Dai, Lu and Xu, Yijie and Ye, Jinhui and Liu, Hao and Xiong, Hui},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025},
  doi={10.48550/arXiv.2503.01478},
  url={https://openreview.net/forum?id=ixMBnOhFGd}
}
```

This repo includes machine-readable metadata for discoverability:
- `CITATION.cff` (GitHub citation support)
- `codemeta.json` (CodeMeta metadata)
- `llms.txt` and `docs/llms-full.txt` (LLM-oriented summaries)
