Hello, I recently read your paper and have a few questions about it:
Dataset construction: I am curious about the 250 MS-COCO examples that were manually replaced. Were all of the segment_descriptions written by hand? And for the layout mask, how is the layout PNG constructed?
Regarding the changes to the self-attention and cross-attention layers during image generation: I have read many recent papers, and the improvement here seems relatively small. Are there other areas that could be improved?
Regarding the first question, we segmented the original captions rather than writing all the object-wise labels from scratch. The same holds for the layout images, since the MS-COCO dataset provides instance-wise layout labels.
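For readers wondering what such a layout image might look like in practice, here is a minimal sketch of merging instance-wise annotations into one label map. It is not the authors' pipeline: the function name `build_layout_map` and the toy annotations are hypothetical, and it rasterizes bounding boxes (`bbox` as `[x, y, width, height]`, the COCO convention) only to stay self-contained. Real COCO instance masks would more likely be decoded from the `segmentation` field, for example with `pycocotools`.

```python
import numpy as np

def build_layout_map(annotations, height, width):
    """Merge COCO-style instance annotations into a single integer label map.

    Each pixel stores the category_id of the instance covering it (0 = background).
    Assumption: boxes stand in for the real instance masks.
    """
    layout = np.zeros((height, width), dtype=np.uint8)
    # Paint larger instances first so smaller ones remain visible on top.
    by_area = sorted(annotations, key=lambda a: a["bbox"][2] * a["bbox"][3], reverse=True)
    for ann in by_area:
        x, y, w, h = ann["bbox"]
        # Clip the box to the image bounds before filling.
        x0, y0 = max(int(round(x)), 0), max(int(round(y)), 0)
        x1, y1 = min(int(round(x + w)), width), min(int(round(y + h)), height)
        layout[y0:y1, x0:x1] = ann["category_id"]
    return layout

# Hypothetical toy annotations: a large instance with a smaller one inside it.
annotations = [
    {"category_id": 1, "bbox": [0, 0, 6, 6]},
    {"category_id": 18, "bbox": [2, 2, 2, 2]},
]
layout = build_layout_map(annotations, height=8, width=8)
```

The resulting array could then be written out as a PNG with an image library such as Pillow (`Image.fromarray(layout).save(...)`). How the category ids map to colors in the repository's layout PNGs is not stated in this thread.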
For the second question, please first consider how difficult it is to evaluate generative models, especially when the target is as specific as ours. As you mentioned, our method shows small improvements on some metrics, but we would like to emphasize that its strongest contribution is improving the fidelity of an existing text-to-image (T2I) model to layout conditions, without requiring any fine-tuning.