Description
I copied your open-source code exactly, and the conda environment was configured strictly according to your README. The only modification I made was changing the paths and replacing the base model with my local sd3.5-m model.
Another person and I each reproduced your code independently, without making any additional modifications.
To verify that the reproduction and deployment were successful, we both trained on the GenEval dataset provided in the ZIP file. All training parameters were kept the same as in your original implementation, using the Flow and SD3 configurations. The reward evaluation files were also exactly as specified in your README:
mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco_20220504_001756-743b7d99.pth,
and the CLIP-ViT model pulled directly from Hugging Face.
During both training runs, the advantage mean in the logs consistently stayed greater than zero.
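For context on how I read that metric, the sketch below is the textbook group-normalized advantage used in GRPO-style trainers (an assumption on my part, not necessarily this repo's exact implementation): rewards are normalized within each prompt group, so the mean of the resulting advantages sits near zero by construction.

```python
# Textbook GRPO-style group-normalized advantage (an assumption about the
# trainer, not code from this repo): rewards are centered and scaled within
# each prompt group, so the per-group advantage mean is ~0 by construction.
import torch

def group_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """rewards: (num_prompts, group_size) rewards for the samples of each prompt."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

rewards = torch.tensor([[0.2, 0.8, 0.5, 0.5]])
print(group_advantages(rewards).mean())  # close to zero
```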
However, in the first run, when we evaluated the model on the GenEval benchmark after 500 steps, the images generated for the same prompts were identical to those produced by the original sd3.5-m model, suggesting that no effective training had occurred.
In the second run the results were even worse: after 500 steps, the generated images were noisy and showed only vague outlines.
Evaluating with GenEval, using the prompt "two wine cup":
SD3.5-medium and the first reproduction:
The second reproduction:
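For anyone else debugging this, here is a minimal sketch for checking whether a saved checkpoint actually differs from the base weights and whether fixed-seed samples diverge. The paths and checkpoint layout below are my assumptions, not this repo's code; adjust them to however the trainer saves weights, and if it saves LoRA adapters rather than full transformer weights, load those instead.

```python
# Minimal diagnostic sketch: does the trained transformer differ from the
# base one, and do fixed-seed generations differ? Paths/layout are assumed.
import torch
from diffusers import StableDiffusion3Pipeline, SD3Transformer2DModel

BASE = "path/to/sd3.5-medium"                      # local base model (assumed path)
CKPT = "path/to/run1/checkpoint-500/transformer"   # saved transformer (assumed layout)

base = SD3Transformer2DModel.from_pretrained(BASE, subfolder="transformer",
                                             torch_dtype=torch.float16)
trained = SD3Transformer2DModel.from_pretrained(CKPT, torch_dtype=torch.float16)

# 1) Did the optimizer move the weights at all? A value of ~0 means the
#    checkpoint is numerically identical to the base model.
max_diff = max((p.float() - q.float()).abs().max().item()
               for p, q in zip(base.parameters(), trained.parameters()))
print(f"max |weight delta| = {max_diff:.3e}")

# 2) Compare fixed-seed generations from both transformers.
pipe = StableDiffusion3Pipeline.from_pretrained(
    BASE, torch_dtype=torch.float16).to("cuda")
for name, tf in [("base", base), ("trained", trained)]:
    pipe.transformer = tf.to("cuda")
    image = pipe("two wine cup",
                 generator=torch.Generator("cuda").manual_seed(0),
                 num_inference_steps=28).images[0]
    image.save(f"{name}_seed0.png")
```

If the weight delta is ~0, the checkpoint being evaluated is effectively the base model, which would explain the identical images from the first run.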
