Reproducibility questions #8

@LinZichuan

Hi @Yangsenqiao

I have some questions about the code.

  1. How many hours does it take to train the models using bash scripts/run_efficient_gpt4o_judge.sh? Did you use 32 A100 GPUs with 80GB of memory each for this training?

  2. The code saves a checkpoint every 5 steps. Which checkpoint should we select for final evaluation to reproduce the results in Table 2 of the paper? Should we choose the one with the highest validation accuracy reward?

  3. Were the VisionThink results presented in the paper obtained using gpt4o-as-judge or qwen-as-judge?

Looking forward to your reply. Thank you!
