[CVPR 2024] The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"
language computer-vision vision clip image-retrieval fine-grained robustness text-retrieval multimodal compositionality vision-language vision-language-model cvpr2024 compostional
Updated Nov 12, 2024 - Jupyter Notebook