Thanks for putting out this inspiring work!
I was wondering if you would be willing to release qualitative results and/or checkpoints of DiffusionNFT trained on a single reward. In my opinion, the most impressive result of DiffusionNFT is its head-to-head comparison with Flow-GRPO (Figure 6), which was conducted under single-reward training. However, all qualitative results (Figures 5, 11-13) and the evaluation on multiple rewards (Table 1) are based on the multi-reward optimized model. I am concerned that the large acceleration in the single-reward setting may have been accompanied by more severe reward hacking than in Flow-GRPO.
I am also curious whether you would be willing to add a head-to-head comparison against Flow-GRPO under multi-reward joint training. I would like to know whether the large acceleration and the performance gap persist in that setting.