
Question about the post-training motivation. #13

Closed
doubleZ0108 opened this issue Feb 20, 2025 · 2 comments

Comments

doubleZ0108 commented Feb 20, 2025

Hi, thanks for your work. The demo on Hugging Face is charming.

I have been using the Qwen-2.5-VL family for object detection recently, and the 3B / 7B / 72B models all demonstrate strong visual grounding capabilities. I ask for the bounding box coordinates of the target object and have the model return them in JSON format. After several rounds of prompt engineering, I can get promising results. This is not a typical VL task, but Qwen does really well on it.
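A minimal sketch of the workflow described above: parsing a JSON-formatted bounding-box reply from the model. The reply format here (a list of objects with `label` and `bbox_2d` keys, possibly wrapped in a markdown fence) is an assumption for illustration, not the model's guaranteed output schema.

```python
import json
import re

def parse_bboxes(model_output: str):
    """Extract a list of {"label": ..., "bbox_2d": [x1, y1, x2, y2]} dicts
    from a model reply that may wrap the JSON in a markdown code fence."""
    # Strip an optional ```json ... ``` fence around the payload.
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", model_output, re.DOTALL)
    payload = match.group(1) if match else model_output
    boxes = json.loads(payload)
    # Keep only well-formed entries: four numeric coordinates per box.
    return [
        b for b in boxes
        if isinstance(b.get("bbox_2d"), list) and len(b["bbox_2d"]) == 4
    ]

# Hypothetical model reply for illustration.
reply = """```json
[{"label": "cat", "bbox_2d": [12, 34, 200, 180]},
 {"label": "dog", "bbox_2d": [220, 40, 400, 210]}]
```"""
boxes = parse_bboxes(reply)
```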

So what I'm curious about is whether the SFT and R1 training strategies you mention essentially amount to "automatically organizing prompts to make the object detection task easier", i.e. automated prompt engineering, with no new knowledge emerging for the bbox-style detection task itself?

Looking forward to hearing your answer.

snakeztc (Contributor) commented:
The current results we show come from training the model for only a few hundred steps on the REC task, so we believe it is more a matter of aligning the model to the task via the GRPO loss (i.e. not much new knowledge is introduced to the model).

That said, the parameters are updated via GRPO training, and when we train the model longer on more tasks, we do believe new knowledge can be injected.
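For context, the "group-relative" part of GRPO can be sketched in a few lines: each sampled completion's reward is normalized against the mean and standard deviation of its own group, so no learned value baseline is needed. This is an illustrative sketch of the advantage computation only, not this repository's implementation; the IoU-style rewards are made-up numbers.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each completion's reward
    by the mean and std of its sampling group for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions for one prompt, rewarded e.g. by IoU with the
# ground-truth box (illustrative values). Above-average completions
# get positive advantages, below-average ones negative.
advs = grpo_advantages([0.9, 0.1, 0.5, 0.5])
```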

doubleZ0108 (Author) commented:
It's an interesting finding. I'll keep an eye on the project.
