Official codebase for the paper:
GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning
arXiv: 2504.01886 · Yanzhou Su, Tianbin Li, Jiyao Liu, Chenglong Ma, Junzhi Ning, Cheng Tang, Sibo Ju, Jin Ye, Pengcheng Chen, Ming Hu, et al.
This repository contains the implementation of GMAI-VL-R1, a framework for multimodal medical reasoning powered by reinforcement learning.
We aim to explore how large vision-language models (VLMs) can improve medical visual understanding and reasoning with rule-based reward design and high-quality medical datasets.
Our method incorporates recent advancements in reinforcement learning for vision-language models and adapts them to the challenging domain of medical imaging.
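As a rough illustration of what rule-based rewards look like in R1-style training, the sketch below combines a format reward (the response must follow a `<think>...</think><answer>...</answer>` template) with an accuracy reward (the extracted choice must match the ground-truth option). The tag names, matching logic, and equal weighting are assumptions for illustration, not necessarily the exact reward functions used in GMAI-VL-R1.

```python
import re

# Illustrative rule-based rewards for multiple-choice medical QA.
# Template tags and scoring are assumptions, not the official implementation.

THINK_ANSWER_PATTERN = re.compile(
    r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL
)
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(response: str) -> float:
    """1.0 if the response follows the <think>/<answer> template, else 0.0."""
    return 1.0 if THINK_ANSWER_PATTERN.search(response) else 0.0


def accuracy_reward(response: str, ground_truth: str) -> float:
    """1.0 if the extracted answer matches the ground-truth choice, else 0.0."""
    match = ANSWER_PATTERN.search(response)
    if match is None:
        return 0.0
    predicted = match.group(1).strip().upper()
    return 1.0 if predicted == ground_truth.strip().upper() else 0.0


def total_reward(response: str, ground_truth: str) -> float:
    """Sum of format and accuracy rewards (equal weighting is an assumption)."""
    return format_reward(response) + accuracy_reward(response, ground_truth)
```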
Our work builds upon and integrates ideas from several open-source projects:
- MM-Eureka-V0 (FanqingM/MM-Eureka-V0)
- open-r1-multimodal (EvolvingLMMs-Lab/open-r1-multimodal)
- MM-EUREKA (ModalMinds/MM-EUREKA)
We thank the authors of these projects for their valuable contributions to the community.
We use a high-quality medical reasoning dataset (a loading sketch follows the summary below):
- Dataset Name: GMAI-Reasoning10K
- Format: JSONL
- Modalities: X-ray, CT, MRI, OCT, Ultrasound, etc.
- Samples: 10,000 high-quality multiple-choice questions designed for reinforcement learning training
- Source: Aggregated from 95 public medical datasets (e.g., Kaggle, GrandChallenge)
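As a quick-start illustration, the snippet below shows one way to load and inspect a JSONL dataset such as GMAI-Reasoning10K. The file name and record fields are assumptions for illustration; please refer to the released data for the actual schema.

```python
import json

# Minimal sketch for inspecting a JSONL dataset (one JSON object per line).
# The path "GMAI-Reasoning10K.jsonl" and the printed fields are assumptions.

def load_jsonl(path: str) -> list[dict]:
    """Read a JSONL file into a list of records."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    samples = load_jsonl("GMAI-Reasoning10K.jsonl")  # assumed filename
    print(f"Loaded {len(samples)} samples")
    # Print whatever keys the first record actually carries.
    print("Fields:", sorted(samples[0].keys()))
```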
If you find our work helpful, please consider citing our paper:
@article{su2025gmai,
  title={GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning},
  author={Su, Yanzhou and Li, Tianbin and Liu, Jiyao and Ma, Chenglong and Ning, Junzhi and Tang, Cheng and Ju, Sibo and Ye, Jin and Chen, Pengcheng and Hu, Ming and others},
  journal={arXiv preprint arXiv:2504.01886},
  year={2025}
}