Balancing the Blend: An Experimental Analysis of Trade-offs in Hybrid Search

Official Artifact for the paper "Balancing the Blend: An Experimental Analysis of Trade-offs in Hybrid Search" submitted to VLDB 2026

Paper PDF | Project Website | License: Apache 2.0

📖 Overview

Hybrid search, which combines lexical and semantic retrieval, is now a foundational technology for modern information retrieval. The architectural design space for these systems is vast, yet a systematic, empirical understanding of the trade-offs among their core components (retrieval paradigms, combination schemes, and re-ranking methods) is still lacking.

This work presents the first systematic benchmark of advanced hybrid search architectures, informed by our experience building the Infinity open-source database. We evaluate four retrieval paradigms, full-text search (FTS), sparse vector search (SVS), dense vector search (DVS), and tensor search (TenS), and their 15 combinations across 11 real-world datasets to provide a data-driven map of the performance landscape. This repository provides the tools needed to reproduce our findings and to extend this research.
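As a quick sanity check on the figure above, the short sketch below enumerates the retrieval configurations. It assumes that "15 combinations" means every non-empty subset of the four paradigms (4 single paradigms, 6 pairs, 4 triples, and 1 four-way blend). That reading is ours and is not stated in this README. The function name `paradigm_combinations` is hypothetical and is not part of `hs_bench/`.

```python
from itertools import combinations

# The four retrieval paradigms named in the overview.
PARADIGMS = ["FTS", "SVS", "DVS", "TenS"]


def paradigm_combinations(paradigms):
    """Return every non-empty subset of `paradigms`, smallest subsets first.

    Assumption: a "combination" is any non-empty subset, so single
    paradigms count as well as multi-way blends.
    """
    return [
        combo
        for size in range(1, len(paradigms) + 1)
        for combo in combinations(paradigms, size)
    ]


if __name__ == "__main__":
    combos = paradigm_combinations(PARADIGMS)
    for combo in combos:
        print(" + ".join(combo))
    print(len(combos), "configurations in total")
```

Under this assumption the count is 2^4 - 1 = 15, which matches the number reported above.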

🚀 Getting Started

Infinity: Please follow the instructions on the Project Website to install the Infinity database.
Python Dependencies:
```
pip install -r requirements.txt
```

🏛️ Repository Structure

This repository is split into two main components to serve different purposes:

exps/: Paper Reproducibility. This directory provides a set of scripts to reproduce all experimental results reported in our paper. If your goal is to validate and reproduce our findings, this is the place to start. See exps/README.md for instructions.
hs_bench/: Evaluation Framework. This directory contains the modular scripts for the evaluation pipeline (configs, embedding, import, indexing, retrieval, and evaluation). This is the code to modify if you want to extend our work, for example by adding new datasets or embedding models. See hs_bench/README.md for the full technical guide.

How to Use

To Reproduce Paper Results:

Clone this repository.
Navigate to exps/ and follow the exps/README.md instructions.
Download datasets as per exps/datasets/README.md.
Run the scripts in exps/scripts/ to build indexes, run all experiments, and evaluate the results.

To Extend This Work:

Clone this repository.
Navigate to hs_bench/ and read the hs_bench/README.md.
Follow the guide to add your own configs, models, or datasets.

📜 Citation

If you find this work useful for your research, please consider citing our paper:

@article{hybridsearch25-infinity,
  title={Balancing the Blend: An Experimental Analysis of Trade-offs in Hybrid Search},
  author={Wang, Mengzhao and Tan, Boyu and Gao, Yunjun and Jin, Hai and Zhang, Yingfeng and Ke, Xiangyu and Xu, Xiangliang and Zhu, Yifan},
  journal={arXiv preprint arXiv:2508.01405},
  year={2025}
}

🤝 Contribution

We welcome contributions from the community to improve and extend this benchmark framework. Researchers and developers can help by reporting bugs, proposing new features in an issue, or submitting code changes via a pull request. To ensure a smooth collaboration, please read the contribution guidelines in CONTRIBUTING.md first.

📄 License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

🙏 Acknowledgements

The design and implementation of the evaluation framework in this repository were significantly informed by our work on the Infinity open-source database. We are grateful to our colleagues on the Infinity team for their foundational work and insightful discussions. A special thanks to Zhichang Yu, Yushi Shen, Zhiqiang Yang, Ling Qin, and Yi Xiao for their invaluable contributions to the Infinity project.
