- [2025-11]: We have created two fun slides (Doraemon & Pokemon) to explain OpenMMReasoner. Enjoy :) Credit to the amazing NotebookLM and Gemini-3.
- [2025-11]: 🏆 #1 Paper of the Day on Hugging Face Daily Papers (Nov. 24, 2025). Check out our OpenMMReasoner HF Daily Paper!
- [2025-11]: Join our WeChat group by scanning this QR code.
- [2025-11]: We release all of our code, model, data, and pipeline! Check out the OpenMMReasoner collection on Hugging Face.
Recent advancements in large reasoning models have fueled growing interest in extending such capabilities to multimodal domains. However, despite notable progress in visual reasoning, the lack of transparent and reproducible data curation and training strategies remains a major barrier to scalable research.
In this work, we introduce OpenMMReasoner, a fully transparent two-stage recipe for multimodal reasoning spanning supervised fine-tuning (SFT) and reinforcement learning (RL). In the SFT stage, we construct an 874K-sample cold-start dataset with rigorous step-by-step validation, providing a strong foundation for reasoning capabilities. The subsequent RL stage leverages a 74K-sample dataset across diverse domains to further sharpen and stabilize these abilities, resulting in a more robust and efficient learning process. Extensive evaluations demonstrate that our training recipe not only surpasses strong baselines but also highlights the critical role of data quality and training design in shaping multimodal reasoning performance. Notably, our method achieves an 11.6% improvement over the Qwen2.5-VL-7B-Instruct baseline across nine multimodal reasoning benchmarks, establishing a solid empirical foundation for future large-scale multimodal reasoning research.
Please follow the installation instructions in lmms-engine to prepare the environment for supervised fine-tuning.
We provide our source verl code, which is a detached fork from the original verl. You can choose to use either our version (included in this repository) or the original verl for RL training.
The installation steps are similar to the standard verl setup. Please follow the instructions from verl to install all requirements, using an updated version of vLLM. Additionally, you need to install math-verify to use our reward function:
```bash
pip install math-verify
```

For our RL training pipeline, we use the following package versions:
```
transformers==4.57.1
vllm==0.11.0
```
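To pin everything in one step, a minimal sketch (adjust to your CUDA and PyTorch setup):

```bash
# Install the pinned versions used in our RL pipeline plus the reward dependency
pip install "transformers==4.57.1" "vllm==0.11.0" math-verify
```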
Please follow the installation instructions in lmms-eval to set up the evaluation environment.
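A typical from-source setup looks roughly like this (the repository URL and editable install are assumptions; the lmms-eval README is authoritative):

```bash
# Assumed from-source install of the evaluation framework
git clone https://github.com/EvolvingLMMs-Lab/lmms-eval.git
cd lmms-eval
pip install -e .
```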
We have open-sourced our data processing pipeline and code for the community to follow. To install the requirements for the data pipeline:
```bash
cd ./data_pipeline
uv pip install -e .
```

We recommend using a separate environment if you run into dependency conflicts.
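For example, an isolated environment for the data pipeline could look like this (the environment name is hypothetical):

```bash
# Create and activate a dedicated virtual environment for the data pipeline
uv venv .venv-datapipeline
source .venv-datapipeline/bin/activate
uv pip install -e .
```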
We provide a convenient script to download all the required datasets from Hugging Face:
```bash
bash examples/openmmreasoner/download_data.sh [LOCAL_DIR]
```

This script will download both the SFT (874K samples) and RL (74K samples) datasets to your specified directory (defaults to `./data`).
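For example (the custom path below is hypothetical):

```bash
# Download to the default ./data directory
bash examples/openmmreasoner/download_data.sh

# Or download to a custom location
bash examples/openmmreasoner/download_data.sh /mnt/storage/openmmreasoner
```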
After installing lmms-engine, you can launch SFT training using either:
Option 1: Using a configuration YAML file
```bash
# Edit the dataset paths in sft_example_config.yaml
torchrun --nproc_per_node="8" \
    --nnodes="1" \
    --node_rank="0" \
    --master_addr="127.0.0.1" \
    --master_port="8000" \
    -m lmms_engine.launch.cli config_yaml=${CONFIG}
```
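Here, `${CONFIG}` should point at your edited YAML file; for example (the exact path of the example config is an assumption, adjust it to where the file lives in this repository):

```bash
# Hypothetical path to the example SFT config
export CONFIG=examples/openmmreasoner/sft_example_config.yaml
```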
Option 2: Using the launch script

```bash
# Edit the dataset paths and hyperparameters in the script
bash examples/openmmreasoner/sft_example_launch.sh
```

Troubleshooting:
- If you encounter OOM (Out of Memory) errors, reduce the `packing_length` parameter in your configuration.
- If mixing text and image data causes a hang, consider adding a blank dummy image for text-only samples in the m1 dataset.
We provide two example scripts for RL training:
Option 1: Local training
```bash
bash examples/openmmreasoner/gspo_n16.sh
```

Option 2: Training with Ray
To launch training in a multi-node environment, first set up Ray on your head and worker nodes, then submit the job as shown in the bash script.
```bash
bash examples/openmmreasoner/gspo_ray.sh
```

Make sure to update the `DATA_FOLDER` and `PROJECT_FOLDER` paths in the scripts before launching.
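As a rough sketch of the multi-node setup (the head-node address and port below are placeholders; Ray's own documentation is authoritative):

```bash
# On the head node: start Ray and note its address
ray start --head --port=6379

# On each worker node: join the cluster (replace with your head node's IP)
ray start --address="<head-node-ip>:6379"

# Back on the head node: submit the RL job
bash examples/openmmreasoner/gspo_ray.sh
```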
After setting up lmms-eval, use the provided evaluation script:
```bash
bash examples/openmmreasoner/eval.sh <CHECKPOINT_PATH> <TASK_NAME>
```

Image Tasks:
```bash
bash examples/openmmreasoner/eval.sh /path/to/checkpoint "mmmu_reasoning_reward,wemath_testmini_thinking,mmmu_pro_vision_cot_reward,mmmu_pro_standard_cot_reward,mathvista_testmini_cot_reward,mathvision_reason_testmini_reward,mathvision_reason_test_reward,mathverse_testmini_reward,logicvista_thinking,dynamath,charxiv_val_descriptive_cot,charxiv_val_reasoning_cot"
```

Text Tasks:
```bash
bash examples/openmmreasoner/eval.sh /path/to/checkpoint "gpqa_diamond_thinking,aime_agg8"
```

We use an LLM as a judge for both evaluation and RL reward calculation. Our default judge model is Qwen/Qwen3-235B-A22B-Instruct-2507.
Steps:
- Set up a server using vLLM or SGLang:
```bash
# Example with SGLang
python3 -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B-Instruct-2507 \
--tp-size 8 \
--dp-size 1 \
--served-model-name judge \
--port 8000 \
--host 0.0.0.0 --mem-fraction-static 0.75
```
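If you prefer vLLM, a comparable setup is sketched below using vLLM's OpenAI-compatible server (not the repository's provided script; adjust parallelism and memory settings to your hardware):

```bash
# Example with vLLM (sketch; serves the judge model on an OpenAI-compatible endpoint)
vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507 \
    --tensor-parallel-size 8 \
    --served-model-name judge \
    --port 8000 \
    --host 0.0.0.0
```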
- Update the judge service address in your scripts:
  - For RL training: Update `OPENAI_BASE_URL` in `gspo_n16.sh` or `gspo_ray.sh`
  - For evaluation: Update `OPENAI_BASE_URL` in `eval.sh`
```bash
export OPENAI_API_KEY="EMPTY"
export OPENAI_BASE_URL="http://your-judge-server-address:8000/v1"
export OPENAI_MODEL_NAME="judge"
export USE_LLM_JUDGE="True"
```

To follow our data processing pipeline, we provide example scripts in `data_pipeline/examples/`. The pipeline supports two main operations: deduplication and distillation.
To deduplicate RL training data, follow these steps:
- Prepare the RL configuration: Create a YAML config file based on `data_pipeline/examples/example_rl_config.yaml`:
```yaml
datasets:
  - path: /path/to/your/dataset.parquet
    data_folder: "/path/to/images"
    data_type: parquet
```

- Run embedding: Generate embeddings for the dataset:
```bash
cd data_pipeline
bash examples/embed_data.sh /path/to/your_rl_config.yaml cache/embed rl
```

- Run deduplication: Remove duplicates based on embeddings:
```bash
bash examples/deduplicate_data.sh /path/to/your_rl_config.yaml cache/embed rl cache/deduplicate
```

To distill a dataset using a teacher model:
- Prepare the SFT configuration: Create a YAML config file based on `data_pipeline/examples/example_sft_config.yaml`:
```yaml
datasets:
  - path: /path/to/your/dataset.parquet
    data_folder: "/path/to/images"
    data_type: parquet
```

- Run distillation: Edit `data_pipeline/examples/distill_dataset.sh` to set your server addresses, then run:
```bash
cd data_pipeline
bash examples/distill_dataset.sh
```

Make sure to configure the model server and judge server URLs in the script before running.
Our OpenMMReasoner-7B (OMR-7B) model demonstrates strong performance across a comprehensive suite of multimodal reasoning benchmarks. With only 874K SFT samples and 74K RL samples—significantly less data than many competing methods—our model achieves state-of-the-art or highly competitive results on 9 out of 14 benchmark tasks. Notably, OMR-7B achieves 79.5% on MathVista testmini (best among all models), 63.8% on MathVerse testmini (best), and 79.0% on WeMath loose (best), demonstrating the effectiveness of our transparent two-stage training recipe. This performance validates our emphasis on data quality and rigorous training design over simply scaling dataset size.
| Model | SFT Data | RL Data | MathVista testmini | MathVision test | MathVision testmini | MathVerse testmini | DynaMath worst | WeMath loose | LogicVista test | MMMU val | MMMU-Pro standard | MMMU-Pro vision | CharXiv reas. | CharXiv desc. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VLAA-Thinker-Qwen2.5-7B | 126k | 25k | 68.0 | 26.4 | - | 48.2 | 22.4 | - | 48.5 | - | - | - | - | - |
| ThinkLite-7B-VL | - | 11k | 71.6 | 24.6 | - | 42.9 | 16.5 | - | 42.7 | - | - | - | - | - |
| VL-Rethinker-7B | - | 39k | 73.7 | 28.4 | - | 46.4 | 17.8 | - | 42.7 | - | 41.7 | - | - | - |
| M2-Reasoning | 6.2M | 102k | 75.0 | 42.1 | - | 40.4 | - | - | 50.6 | - | - | - | - | - |
| MMR1 | 1.6M | 15k | 72.0 | 31.8 | 29.0† | 55.4 | 27.9† | 68.0† | 48.9 | 52.4† | 41.1† | 37.1† | 43.5† | 71.1† |
| OpenVLThinker-7B | 3.3k | 9.6k | 65.3 | 23.0 | 26.9† | 38.1 | 16.8 | 61.9† | 44.5 | 55.1† | 39.7† | 38.4† | 41.0† | 69.2† |
| MM-Eureka-Qwen-7B | - | 15.6k | 72.6 | 28.1 | 32.1† | 45.4 | 23.0 | 59.8† | 46.3 | 54.4† | 40.1† | 37.1† | 42.4† | 74.1† |
| OVR-7B | 2M | 300k | 72.1 | 51.8 | 38.2† | 54.6 | 33.5 | 64.8 | 54.8 | 51.8† | 50.2 | 29.1† | 44.5 | 73.6 |
| OMR-7B (ours) | 874k | 74k | 79.5 | 43.6 | 38.8 | 63.8 | 34.9 | 79.0 | 50.0 | 57.8 | 44.1 | 40.6 | 46.1 | 73.5 |
Note: Bold numbers indicate the best performance, and † indicates results reproduced using the authors' checkpoints.
If you find OpenMMReasoner useful for your research and applications, please cite using this BibTeX:
```bibtex
@misc{zhang2025openmmreasonerpushingfrontiersmultimodal,
title={OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe},
author={Kaichen Zhang and Keming Wu and Zuhao Yang and Kairui Hu and Bin Wang and Ziwei Liu and Xingxuan Li and Lidong Bing},
year={2025},
eprint={2511.16334},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2511.16334},
}
```

We gratefully acknowledge the following open-source projects that made this work possible:
- lmms-eval for providing the comprehensive evaluation framework for large multimodal models.
- lmms-engine for the SFT training infrastructure and tools.
- verl for the reinforcement learning training framework.
We thank the developers and contributors of these projects for their excellent work and for making their code publicly available.

