Qualitative results trained on single reward / Head-to-head comparison with multiple rewards #6

@jaylee2000

Thanks for putting out inspiring work!

I was wondering if you are willing to release qualitative results and/or checkpoints of DiffusionNFT trained on a single reward. In my opinion, the most impressive result of DiffusionNFT is its head-to-head comparison with Flow-GRPO (Figure 6), where the models are trained on a single reward. However, all qualitative results (Figures 5, 11-13) and the evaluation on multiple rewards (Table 1) are based on the multi-reward optimized model. I am concerned that the large acceleration in the single-reward setting may have come with more severe reward hacking than in Flow-GRPO.

I am also curious whether you would be willing to add a head-to-head comparison against Flow-GRPO in multi-reward joint training. I would like to know whether the large acceleration and performance gap still hold in that setting.