🌐 Project Page | 📜 arXiv | 📮 Twitter Post
[2025.08.20] 🚀 FusionFactory & FusionBench are here! Most LLM apps still rely on a single model — limiting capability & wasting tokens. FusionBench (14 tasks, 5 domains, 20 LLMs, 103M tokens) + FusionFactory (query-, thought-, and model-level fusion) unlock powerful multi-LLM collaboration. ✅ Results: FusionFactory consistently outperforms the best single LLM across 14 benchmarks. 📄 Paper | 💻 Code | 🐦 Twitter
[2025.06.18] 🔥 Router-R1 has been officially released. It is a reinforcement learning-driven LLM router that lets multiple LLMs collaborate to tackle complex problems efficiently. Explore the project and get started here: Router-R1. Follow us on Twitter for the latest news and developments!
📊 We also benchmark GraphRouter on the router dataset collected in Router-R1, where it performs strongly across multiple QA benchmarks under different LLM settings.
📈 GraphRouter Results on Router Dataset from Router-R1
Base Model | NQ† | TriviaQA | PopQA | HotpotQA† | 2WikiMultiHopQA | Musique | Bamboogle | Avg. |
---|---|---|---|---|---|---|---|---|
Qwen2.5-3B-Instruct | 0.276 | 0.586 | 0.280 | 0.234 | 0.180 | 0.076 | 0.448 | 0.297 |
Llama-3.2-3B-Instruct | 0.316 | 0.602 | 0.290 | 0.222 | 0.170 | 0.084 | 0.416 | 0.300 |
- † indicates in-domain evaluation; all others are out-of-domain.
- Evaluation Metric: Exact Match
- LLM Routing Pool: Qwen2.5-7B-Instruct, LLaMA-3.1-8B-Instruct, LLaMA-3.1-70B-Instruct, Mistral-7B-Instruct, Mixtral-8x22B-Instruct, Gemma-2-27B-Instruct
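The notes above name Exact Match as the metric but do not say how answers are normalized before comparison. The following is a minimal sketch that assumes the common SQuAD-style convention (lowercasing, stripping punctuation and English articles, collapsing whitespace). The function names are hypothetical and this is not necessarily the scoring code behind the table.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the README does not specify one.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list) -> float:
    """Return 1.0 if the prediction equals any gold answer after normalization."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(gold) for gold in gold_answers))


def mean_exact_match(predictions: list, references: list) -> float:
    """Average Exact Match over a dataset; each reference is a list of gold answers."""
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `exact_match("The Eiffel Tower.", ["Eiffel Tower"])` counts as a match under this normalization, while a stricter string comparison would not.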
🎯 The fine-tuned weights for GraphRouter on this dataset are now released at model_path/best_model_qa.pth
[2025.01.22] 🌟 GraphRouter is accepted for ICLR 2025.
# create a new environment
conda create -n graphrouter python=3.10
conda activate graphrouter
# install pytorch. Modify the command to align with your own CUDA version.
pip3 install torch --index-url https://download.pytorch.org/whl/cu118
# install related libraries
pip install -r requirements.txt
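# install pyg. The wheel URL below targets torch 2.1.0 with CUDA 11.8; change it to match your installed torch and CUDA versions.
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu118.html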
First, generate 'data/unified_qa_data.csv'.
python data_processing/multidata_unify.py
Then, set your api_key in configs/config.yaml and generate data/router_data.csv and configs/llm_description_embedding.pkl.
python data_processing/construct_router_data.py
For your convenience, we provide download links for the 'unified_qa_data.csv' and 'router_data.csv' files we generated. Please download them and put them in the data folder.
unified_qa_data.csv router_data.csv
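Before running experiments, it can help to confirm that both files are in place. This is a small sketch using only the standard library; the helper name `missing_data_files` is hypothetical, and the default `data` directory assumes the script is run from the repository root.

```python
from pathlib import Path

# File names taken from the instructions above.
EXPECTED_FILES = ("unified_qa_data.csv", "router_data.csv")


def missing_data_files(data_dir: str = "data") -> list:
    """Return the expected data files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_data_files()` returns an empty list when both files are present, and otherwise lists the names that still need to be generated or downloaded.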
Run experiments and print/save evaluation results on the metrics Performance, Cost, and Reward. You can edit the hyperparameters in configs/config.yaml or use your own config file.
python run_exp.py --config_file [config]
- Embedding Normalization
  - Check whether input embeddings are normalized.
  - On some datasets, skipping normalization leads to suboptimal results.
- Network Initialization
  - Experiment with different initialization methods.
  - Try varying random seeds or using alternative initialization schemes.
- Model Saving Strategy
  - Instead of saving models based on highest accuracy, save checkpoints with the best evaluation set performance.
  - This can yield better results on certain tasks.
- Learning Rate Tuning
  - Adjust learning rate carefully.
  - Slightly increasing it may help avoid local optima and improve stability.
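The embedding normalization tip can be checked with a few lines of code. The sketch below assumes that "normalized" means each embedding has unit L2 norm, which the tips do not state explicitly. The function names are hypothetical and are not part of this repository.

```python
import math


def l2_norm(vec: list) -> float:
    """Euclidean length of one embedding vector."""
    return math.sqrt(sum(x * x for x in vec))


def is_normalized(embeddings: list, tol: float = 1e-6) -> bool:
    """True if every embedding has unit L2 norm within the tolerance."""
    return all(abs(l2_norm(vec) - 1.0) <= tol for vec in embeddings)


def l2_normalize(embeddings: list, eps: float = 1e-12) -> list:
    """Scale each embedding to unit L2 norm; eps guards against zero vectors."""
    normalized = []
    for vec in embeddings:
        norm = max(l2_norm(vec), eps)
        normalized.append([x / norm for x in vec])
    return normalized
```

A typical use is to call `is_normalized` on the input embeddings once, and apply `l2_normalize` only if the check fails, so that already-normalized inputs are left untouched.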
@inproceedings{feng2024graphrouter,
title={Graphrouter: A graph-based router for llm selections},
author={Feng, Tao and Shen, Yanzhen and You, Jiaxuan},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025}
}