Thanks for putting out this inspiring work!
I was wondering if you would be willing to release qualitative results and/or checkpoints of DiffusionNFT trained on a single reward. In my opinion, the most impressive result of DiffusionNFT is its head-to-head comparison with Flow-GRPO (Figure 6), which was conducted under single-reward training. However, all qualitative results (Figures 5, 11-13) and the evaluation on multiple rewards (Table 1) are based on the multi-reward optimized model. I am concerned that the large acceleration in the single-reward setting may have been accompanied by more severe reward hacking than in Flow-GRPO.
I am also curious whether you would be willing to add a head-to-head comparison against Flow-GRPO under multi-reward joint training. I would like to know whether the large acceleration and the performance gap persist in that setting.