🌐 Homepage | 🏆 Leaderboard | 🤗 BMMR | 📖 Paper
This repo contains the evaluation code for the paper "BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset"
- 🔥[2025-07-08]: We have released the evaluation scripts on GitHub, and we warmly welcome everyone to try them out! 👏
- 🔥[2025-07-08]: We have released the dataset on HuggingFace! 🌟
We introduce BMMR, a large-scale bilingual, multimodal, multi-disciplinary reasoning dataset for the community to develop and evaluate large multimodal models (LMMs). BMMR comprises 110k college-level questions covering 300 UNESCO-defined subjects in diverse formats (multiple-choice, fill-in-the-blank, and open-ended QA), sourced from both print and digital media such as books, exams, and quizzes. All data are curated and filtered via a scalable, human-in-the-loop framework, and each instance is paired with a high-quality reasoning path. The dataset is organized into two parts: BMMR-Eval, which comprises 20,458 high-quality instances to comprehensively assess LMMs' knowledge and reasoning across multiple disciplines in both Chinese and English; and BMMR-Train, which contains 88,991 instances to support further research and development, extending the current focus on mathematical reasoning to diverse disciplines and domains. In addition, we propose a process-based multi-discipline verifier (i.e., BMMR-Verifier) for accurate and fine-grained evaluation of reasoning paths.
Extensive experiments and analyses on BMMR-Eval are reported in the paper.
The evaluation set and training set of BMMR are available on HuggingFace via the 🤗 BMMR link above.
We have merged our benchmark into VLMEvalKit. You can use VLMEvalKit for evaluation, or alternatively, use our evaluation script as described below:
Download the test set and put it into ./data/.
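As a reference, one way to fetch the data is via the huggingface_hub CLI. This is only a sketch: the dataset repository ID below is a placeholder, so substitute the actual ID from the 🤗 BMMR link above.

```bash
# Sketch: download the BMMR evaluation data into ./data/ (repo ID is a placeholder).
pip install -U "huggingface_hub[cli]"
huggingface-cli download <BMMR-dataset-repo-id> \
    --repo-type dataset \
    --local-dir ./data
```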
pip install -r requirements.txt
You need to deploy the model using either vLLM or LMDeploy. Then, update src/config.json with parameters such as base_url, api_key, and model according to your deployment (a minimal example is sketched after the run command below). Finally, run the script eval.sh.
bash src/eval.sh
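For reference, below is a minimal sketch assuming a vLLM OpenAI-compatible server; the model name, port, and API key are illustrative placeholders, and the exact structure of src/config.json may contain additional fields, so edit the existing file rather than replacing it wholesale.

```bash
# Sketch: serve a model with vLLM's OpenAI-compatible server (model and port are examples).
vllm serve Qwen/Qwen2.5-VL-7B-Instruct --port 8000

# Sketch: the relevant fields in src/config.json could then look like this.
# Field names follow the README (base_url, api_key, model); values are placeholders.
cat > src/config.json <<'EOF'
{
  "base_url": "http://localhost:8000/v1",
  "api_key": "EMPTY",
  "model": "Qwen/Qwen2.5-VL-7B-Instruct"
}
EOF
```

After the server is up and the config points to it, run `bash src/eval.sh` as shown above.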
- Zhiheng Xi: zhxi22@m.fudan.edu.cn
BibTeX:
@misc{xi2025bmmrlargescalebilingualmultimodal,
title={BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset},
author={Zhiheng Xi and Guanyu Li and Yutao Fan and Honglin Guo and Yufang Liu and Xiaoran Fan and Jiaqi Liu and Jingchao Ding and Wangmeng Zuo and Zhenfei Yin and Lei Bai and Tao Ji and Tao Gui and Qi Zhang and Xuanjing Huang},
year={2025},
eprint={2507.03483},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2507.03483},
}
This repository is built with reference to MMMU.
