Hi, the paper describes the evaluation metric as follows:
"Following FastComposer [Xiao et al., 2023] and MM-Diff [Wei et al., 2024], we use CLIP image similarity (CLIP-I) to compare the generated images with reference images."
Does this mean that CLIP-I is computed between each image generated by the model and the reference image that was given to the model as input?
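For reference, here is how I currently understand the computation. This is a minimal sketch that assumes the CLIP image embeddings have already been extracted (one row per image) and that, for a single subject, the score is the mean cosine similarity between every generated image and every reference image it was conditioned on. The function name `clip_i` and this averaging scheme are my own assumptions, not taken from the paper.

```python
import numpy as np


def clip_i(gen_embs: np.ndarray, ref_embs: np.ndarray) -> float:
    """Hypothetical CLIP-I: mean cosine similarity between generated and reference embeddings.

    Assumption: gen_embs has shape (num_generated, dim) and ref_embs has shape
    (num_reference, dim), both produced by the same CLIP image encoder.
    """
    # L2-normalize each embedding so the dot product equals cosine similarity
    gen = gen_embs / np.linalg.norm(gen_embs, axis=1, keepdims=True)
    ref = ref_embs / np.linalg.norm(ref_embs, axis=1, keepdims=True)
    # (num_generated, num_reference) matrix of pairwise cosine similarities
    sims = gen @ ref.T
    # Average over all generated/reference pairs (assumed aggregation)
    return float(sims.mean())
```

If the metric instead compares generated images against a separate held-out set of real images of the subject, only `ref_embs` would change; the cosine-similarity step stays the same.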
Looking forward to your reply, thanks.