CountCLIP: [Re] Teaching CLIP to Count to Ten

This repository contains an implementation of the paper Teaching CLIP to Count to Ten by Google Research, published at ICCV 2023. The paper presents a method for fine-tuning vision-language models (VLMs) such as CLIP to improve zero-shot counting accuracy while maintaining zero-shot classification performance. It does so by adding a counting-contrastive loss term to the original loss function, which trains the model to discriminate between the correct caption and an incorrect caption that states a different object count for the same image.
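To make the counting-contrastive term concrete, here is a minimal sketch in plain Python. It assumes the loss compares one image embedding against the embedding of its true caption and of a counterfactual caption with a wrong count, and that the total objective is the original CLIP loss plus a weighted counting term. The function names, the weight `lam`, its default value, and the absence of a temperature are illustrative assumptions, not taken from this repository's code.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def counting_loss(image_emb, caption_emb, counterfactual_emb):
    """Two-way contrastive loss for a single counting example.

    The loss is low when the image is more similar to the caption with the
    correct count than to the counterfactual caption with a wrong count.
    """
    img = normalize(image_emb)
    pos = dot(img, normalize(caption_emb))          # similarity to the correct caption
    neg = dot(img, normalize(counterfactual_emb))   # similarity to the wrong-count caption
    # -log(exp(pos) / (exp(pos) + exp(neg))) rewritten as log(1 + exp(neg - pos))
    return math.log1p(math.exp(neg - pos))


def total_loss(clip_loss, count_loss, lam=1.0):
    """Original contrastive loss plus the weighted counting term (lam is a placeholder)."""
    return clip_loss + lam * count_loss
```

When the two captions are equally similar to the image, the counting term equals log 2. It falls below that value once the correct caption scores higher than the counterfactual one.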

Demo of our model learning to count

Usage

Colab Demo:

To run the Python script (Python 3.10 recommended), run the following commands. The dataset files must be downloaded into the scripts folder before running experiment.py:

 git clone https://github.com/SforAiDl/CountCLIP.git
 cd CountCLIP/scripts  
 conda create -n <env_name> python=3.10  
 conda activate <env_name>  
 pip install -r requirements.txt  
 python3 experiment.py

Repository structure

count_set_gen.ipynb contains the implementation for generating the counting set as described in Section 3.1 of the paper.
model.ipynb contains the implementation for the counting loss function as described in Section 3.2 of the paper.
The folder data_utils contains miscellaneous notebooks for downloading data, merging datasets, etc.
- download.ipynb and cb_download.ipynb were used for downloading the training and validation data respectively.
- create_json.ipynb and merge.ipynb were used to create and merge the JSON files for the data.
- parse_faulty.ipynb was used to compile non-functional images into a single file.
The folder old contains incomplete and outdated code used to make the final implementation.
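As an illustration of the incorrect captions that the counting loss contrasts against, the sketch below swaps the number word in a caption for a different one. It is an assumption about how such counterfactual captions could be built, not a copy of the code in count_set_gen.ipynb or model.ipynb. The names `NUMBER_WORDS` and `make_counterfactual` are hypothetical, and the "two" to "ten" range is inferred from the paper's title.

```python
import random
import re

# Number words considered for counting captions (assumed range: two to ten).
NUMBER_WORDS = ["two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
_NUMBER_RE = re.compile(r"\b(" + "|".join(NUMBER_WORDS) + r")\b", flags=re.IGNORECASE)


def make_counterfactual(caption, rng=random):
    """Return the caption with its first number word replaced by a different one.

    Returns None when the caption contains no number word. The replacement is
    always lowercase, so the original capitalization is not preserved.
    """
    match = _NUMBER_RE.search(caption)
    if match is None:
        return None
    original = match.group(1).lower()
    replacement = rng.choice([w for w in NUMBER_WORDS if w != original])
    return caption[:match.start()] + replacement + caption[match.end():]


if __name__ == "__main__":
    # Prints the caption with "three" swapped for another number word.
    print(make_counterfactual("three dogs on a beach", random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the swap reproducible, which helps when regenerating the same counting set.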

Dataset

We created a small counting set of ~2,000 images by filtering over 2 million of the 400 million images in the original dataset. It is merged with ~13,000 non-counting images from the same dataset. The merged dataset, along with the relevant JSON/CSV files, can be found here.

data.zip - merged counting and non-counting data, along with the validation data (the CountBench dataset).
merged.json - JSON for the merged (counting + non-counting) data.
val.json - JSON for the CountBench data.
faulty.csv - CSV listing faulty non-counting images to remove.
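The files above could be combined roughly as follows to drop faulty images before training. This is only a sketch: the schemas of merged.json and faulty.csv are not documented here, so the record layout (a list of objects with an `image` field) and the one-filename-per-row CSV layout are assumptions, and the helper names are hypothetical.

```python
import csv
import io
import json


def read_faulty_csv(text):
    """Read the first column of each non-empty CSV row (assumed: one filename per row)."""
    return [row[0] for row in csv.reader(io.StringIO(text)) if row]


def drop_faulty(records, faulty_names, key="image"):
    """Keep only the records whose image name is not listed as faulty."""
    faulty = set(faulty_names)
    return [r for r in records if r[key] not in faulty]


# In-memory stand-ins for merged.json and faulty.csv (made-up example data).
records = json.loads(
    '[{"image": "a.jpg", "caption": "two cats"},'
    ' {"image": "b.jpg", "caption": "a tree"}]'
)
faulty = read_faulty_csv("b.jpg\n")
kept = drop_faulty(records, faulty)
```

With real files, the two strings would be replaced by the contents of merged.json and faulty.csv, and `key` adjusted to whatever field actually holds the image name.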
