Transfusion also appears to be an AR + diffusion multimodal model (https://huggingface.co/papers/2408.11039). Are you using similar techniques? Are there any major differences?
Hi, sorry for the late reply. There are two main differences: i) representations for multimodal understanding: CLIP-ViT and MAGVIT-v2 (ours) vs. VAE (Transfusion); ii) representations for generation: MAGVIT-v2 (ours) vs. VAE (Transfusion). More details can be found in our paper. Btw, you are welcome to star our repository.
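For readers less familiar with the distinction: MAGVIT-v2 is a discrete tokenizer, so an image becomes a sequence of integer token ids, while a VAE as used in Transfusion yields continuous latent vectors. The snippet below is only a rough sketch of that difference. The sizes, the random codebook, and the `quantize` helper are all hypothetical, and the nearest-neighbor lookup is a generic vector-quantization stand-in, not the actual quantizer of either model.

```python
import random


def quantize(latents, codebook):
    """Map each continuous vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))
        for vec in latents
    ]


rng = random.Random(0)

# Hypothetical sizes: 4 image patches, 3-dim latents, 8 codebook entries.
# Continuous representation (VAE-style): one real-valued vector per patch.
continuous_latents = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]

# Discrete representation (tokenizer-style): one integer id per patch,
# which can be handled like text tokens from a fixed vocabulary.
codebook = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(8)]
token_ids = quantize(continuous_latents, codebook)

print(continuous_latents[0])  # a list of floats
print(token_ids)              # a list of ints in [0, 8)
```

Discrete ids fit into the same vocabulary-based prediction used for text tokens, whereas continuous latents have no finite vocabulary to predict over.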