Reproducibility questions

Hi @Yangsenqiao 

I have some questions about the code.

1. How many hours does it take to train the models with bash scripts/run_efficient_gpt4o_judge.sh? Did you use 32 A100 GPUs with 80 GB of memory each for this training?

2. The code saves a checkpoint every 5 steps. How should we select the checkpoint for final evaluation to reproduce the results in Table 2 of the paper? Should we choose the checkpoint with the highest validation accuracy reward?

3. Were the VisionThink results presented in the paper obtained with gpt4o-as-judge or qwen-as-judge?

Looking forward to your reply. Thank you!
