The motivation #10

Can your model generate CoT just like DeepSeek-R1? If not, I am curious what the motivation is for using RL for training.

Comments
Good question! I'm curious about the CoT SFT or R1 training too! Looking forward to the answer!
The model can generate CoT, and that is the motivation for using GRPO: the training data does not have ground-truth CoT annotations. Please check out the demo here: https://huggingface.co/spaces/omlab/VLM-R1-Referral-Expression
Thanks for your reply. I have tried your demo, and the model really can generate a thought process; it is interesting. Are the SFT results trained on CoT data, or do you just supervise the model with the IoU + format reward and let it learn to generate CoT by itself?
There is no CoT SFT involved. We just run GRPO on Qwen2.5-VL 3B directly, and the reasoning process emerges automatically.
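For readers wondering what "IoU + format reward" means in practice, here is a minimal sketch of such a reward function. It assumes the model is prompted to emit a `<think>...</think><answer>[x1, y1, x2, y2]</answer>` template; the function names, the tag template, and the parsing logic are illustrative assumptions for this sketch, not the repository's actual code:

```python
import re

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def format_reward(completion):
    """1.0 if the completion follows the assumed <think>/<answer> template, else 0.0."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion, gt_box):
    """IoU between the box parsed from <answer>...</answer> and the ground-truth box."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m is None:
        return 0.0
    nums = re.findall(r"-?\d+\.?\d*", m.group(1))
    if len(nums) < 4:
        return 0.0  # no parseable box in the answer
    pred_box = [float(n) for n in nums[:4]]
    return iou(pred_box, gt_box)

def total_reward(completion, gt_box):
    """Combined reward used to score each sampled completion in GRPO."""
    return accuracy_reward(completion, gt_box) + format_reward(completion)
```

The key point is that neither term supervises the content of the `<think>` span: the model is rewarded only for a correct box and a well-formed output, so any chain-of-thought that emerges is a byproduct of the RL objective rather than imitation of annotated reasoning.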
Excellent!
That sounds very interesting! |