Update README.md

naver-ai · Jul 28, 2022 · 13b9bf3
1 parent 0be8d41
commit 13b9bf3
Showing 1 changed file with 5 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-# Extended COCO Validation (ECCV) Caption dataset
+# Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)
 
 Official Python implementation of ECCV Caption | [Paper](https://arxiv.org/abs/2204.03359)
 
@@ -14,8 +14,7 @@ For more details, please read our paper:
 
 ### Abstract
 
-Image-Test matching (ITM) is a common task for evaluating the quality of Vision and Language (VL) models. However, existing ITM benchmarks have a significant limitation. They have many missing correspondences, originating from the data construction process itself. For example, a caption is only matched with one image although the caption can be matched with other similar images, and vice versa. To correct the massive false negatives, we construct the Extended COCO Validation (ECCV) Caption dataset by supplying the missing associations with machine and human annotators. We employ five state-of-the-art ITM models with diverse properties for our annotation process. Our dataset provides x3.6 positive image-to-caption associations and x8.5 caption-to-image associations compared to the original MS-COCO. We also propose to use an informative ranking-based metric, rather than the popular Recall@K(R@K). We re-evaluate the existing 25 VL models on existing and proposed benchmarks. Our findings are that the existing benchmarks, such as COCO 1K R@K, COCO 5K R@K, CxC R@1 are highly correlated with each other, while the rankings change when we shift to the ECCV mAP. Lastly, we delve into the effect of the bias introduced by the choice of machine annotator. Source code and dataset are available in [https://github.com/naver-ai/eccv-caption](https://github.com/naver-ai/eccv-caption)
-
+Image-Text matching (ITM) is a common task for evaluating the quality of Vision and Language (VL) models. However, existing ITM benchmarks have a significant limitation. They have many missing correspondences, originating from the data construction process itself. For example, a caption is only matched with one image although the caption can be matched with other similar images and vice versa. To correct the massive false negatives, we construct the Extended COCO Validation (ECCV) Caption dataset by supplying the missing associations with machine and human annotators. We employ five state-of-the-art ITM models with diverse properties for our annotation process. Our dataset provides x3.6 positive image-to-caption associations and x8.5 caption-to-image associations compared to the original MS-COCO. We also propose to use an informative ranking-based metric mAP@R, rather than the popular Recall@K (R@K). We re-evaluate the existing 25 VL models on existing and proposed benchmarks. Our findings are that the existing benchmarks, such as COCO 1K R@K, COCO 5K R@K, CxC R@1 are highly correlated with each other, while the rankings change when we shift to the ECCV mAP@R. Lastly, we delve into the effect of the bias introduced by the choice of machine annotator. Source code and dataset are available at https://github.com/naver-ai/eccv-caption
 
 ### Dataset statistics
 
@@ -195,9 +194,10 @@ THE SOFTWARE.
 ## How to cite
 
 ```
-@article{chun2022eccv_caption,
+@inproceedings{chun2022eccv_caption,
     title={ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO}, 
     author={Chun, Sanghyuk and Kim, Wonjae and Park, Song and Chang, Minsuk Chang and Oh, Seong Joon},
-    journal={arXiv preprint arXiv:2204.03359},
+    year={2022},
+    booktitle={European Conference on Computer Vision (ECCV)},
 }
 ```
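
The abstract in this diff contrasts Recall@K with the ranking-based mAP@R metric. The following is a minimal sketch of that contrast, assuming the commonly used definition of mAP@R (for a query with R positives, average the precision at each hit within the top R results, divided by R). The function names are hypothetical and are not taken from the `eccv-caption` package, whose own implementation may differ.

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[int], positives: Set[int], k: int) -> float:
    """Return 1.0 if any positive item appears in the top-k results, else 0.0."""
    return float(any(item in positives for item in ranked_ids[:k]))


def average_precision_at_r(ranked_ids: Sequence[int], positives: Set[int]) -> float:
    """AP@R for one query, where R is the number of positives for that query.

    Assumed definition: sum the precision at every rank (within the top R)
    that holds a positive item, then divide by R.
    """
    r = len(positives)
    if r == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids[:r], start=1):
        if item in positives:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / r


def mean_ap_at_r(rankings: Sequence[Sequence[int]],
                 positive_sets: Sequence[Set[int]]) -> float:
    """mAP@R: the mean of AP@R over all queries."""
    scores = [average_precision_at_r(ranking, pos)
              for ranking, pos in zip(rankings, positive_sets)]
    return sum(scores) / len(scores)


# Toy query with three positive items (IDs 1, 2, 3); the data is made up.
positives = {1, 2, 3}
ranking_a = [1, 9, 2, 3, 8]  # one negative ranked above two positives
ranking_b = [1, 2, 3, 9, 8]  # all positives ranked first

print(recall_at_k(ranking_a, positives, k=1), recall_at_k(ranking_b, positives, k=1))
print(average_precision_at_r(ranking_a, positives),
      average_precision_at_r(ranking_b, positives))
```

In this toy case both rankings score the same Recall@1, while AP@R is 5/9 for the first ranking and 1.0 for the second, which illustrates why a ranking-based metric can separate models that Recall@K treats as equal once a caption has several valid images.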