Official Artifact for the paper "Balancing the Blend: An Experimental Analysis of Trade-offs in Hybrid Search" submitted to VLDB 2026
Paper PDF | Project Website | License: Apache 2.0
Hybrid search, which combines lexical and semantic retrieval, is now a foundational technology for modern information retrieval. The architectural design space for these systems is vast and complex, yet a systematic, empirical understanding of the trade-offs among their core components (retrieval paradigms, combination schemes, and re-ranking methods) is still lacking.
This work presents the first systematic benchmark of advanced hybrid search architectures, informed by our experience building the Infinity open-source database. We evaluate four retrieval paradigms (full-text search, FTS; sparse vector search, SVS; dense vector search, DVS; tensor search, TenS) and their 15 combinations across 11 real-world datasets to provide a data-driven map of the performance landscape. This repository provides all the tools necessary to reproduce our findings and to extend this research.
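To illustrate what a combination scheme does, here is a minimal sketch of reciprocal rank fusion (RRF), one widely used way to blend ranked lists from different retrieval paradigms. This is a generic illustration, not the paper's exact implementation; the function name `rrf_fuse`, the constant `k=60`, and the example document IDs are our own choices.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one RRF-scored ranking.

    Each document scores 1 / (k + rank) per list it appears in; k=60 is
    the constant suggested in the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a lexical (FTS) ranking with a dense-vector (DVS) ranking
fts_hits = ["d3", "d1", "d7"]
dvs_hits = ["d1", "d9", "d3"]
print(rrf_fuse([fts_hits, dvs_hits]))  # documents in both lists rise to the top
```

Documents retrieved by both paradigms accumulate score from each list, so agreement between the lexical and semantic rankers is rewarded without any score normalization.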
- Infinity: Please follow the instructions on the Project Website to install the Infinity database.
- Python Dependencies:
pip install -r requirements.txt
This repository is split into two main components to serve different purposes:
- exps/: Paper Reproducibility. This directory provides a set of scripts to reproduce all experimental results reported in our paper. If your goal is to validate and reproduce our findings, this is the place to start. See exps/README.md for instructions.
- hs_bench/: Evaluation Framework. This directory contains the modular scripts for the evaluation pipeline (configs, embedding, import, indexing, retrieval, and evaluation). This is the code to modify if you want to extend our work, for example by adding new datasets or embedding models. See hs_bench/README.md for the full technical guide.
To Reproduce Paper Results:
- Clone this repository.
- Navigate to exps/ and follow the exps/README.md instructions.
- Download datasets as per exps/datasets/README.md.
- Run the scripts in exps/scripts/ to build indexes, run all experiments, and evaluate the results.
To Extend This Work:
- Clone this repository.
- Navigate to hs_bench/ and read the hs_bench/README.md.
- Follow the guide to add your own configs, models, or datasets.
If you find this work useful for your research, please consider citing our paper:
@article{hybridsearch25-infinity,
title={Balancing the Blend: An Experimental Analysis of Trade-offs in Hybrid Search},
author={Wang, Mengzhao and Tan, Boyu and Gao, Yunjun and Jin, Hai and Zhang, Yingfeng and Ke, Xiangyu and Xu, Xiangliang and Zhu, Yifan},
journal={arXiv preprint arXiv:2508.01405},
year={2025}
}

We welcome contributions from the community to improve and extend this benchmark framework. We encourage researchers and developers to report bugs or propose new features by opening an issue, and to submit code changes via a pull request. To ensure smooth collaboration, please first read the contribution guidelines in CONTRIBUTING.md.
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
The design and implementation of the evaluation framework in this repository were significantly informed by our work on the Infinity open-source database. We are grateful to our colleagues on the Infinity team for their foundational work and insightful discussions. A special thanks to Zhichang Yu, Yushi Shen, Zhiqiang Yang, Ling Qin, and Yi Xiao for their invaluable contributions to the Infinity project.