This repo contains the PyTorch implementation of our CVPR 2023 paper:
CrOC: Cross-View Online Clustering for Dense Visual Representation Learning
Thomas Stegmüller*, Tim Lebailly*, Behzad Bozorgtabar, Tinne Tuytelaars, and Jean-Philippe Thiran.
Our code has only a few dependencies. First, install PyTorch for your machine by following https://pytorch.org/get-started/locally/. Then install the remaining dependencies:
pip install einops
Run main_croc.py. Its command-line arguments are defined in parser.py.
python main_croc.py --args1 val1
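Make sure to use the arguments listed in the table below for the model you want to reproduce. For example, to launch with 8 processes per node (typically one per GPU):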
python -m torch.distributed.launch --nproc_per_node=8 main_croc.py --args1 val1
If you find our work useful, please consider citing:
@inproceedings{stegmuller2023croc,
title={CrOC: Cross-view online clustering for dense visual representation learning},
author={Stegm{\"u}ller, Thomas and Lebailly, Tim and Bozorgtabar, Behzad and Tuytelaars, Tinne and Thiran, Jean-Philippe},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7000--7009},
year={2023}
}
You can download the full checkpoint, which contains the backbone and projection head weights for both the student and teacher networks. We also provide the detailed arguments needed to reproduce our results. Note that the COCO and COCO+ results listed here are slightly higher than those reported in the paper, because we realized that the runs reported in the paper had not completed the full 300 training epochs.
pretraining dataset | arch | params | batch size | LC PVOC12 | LC COCO things | LC COCO stuff | checkpoint | args |
---|---|---|---|---|---|---|---|---|
COCO | ViT-S/16 | 21M | 256 | 54.9% | 55.7% | 49.9% | full ckpt | args |
COCO+ | ViT-S/16 | 21M | 256 | 61.6% | 64.4% | 52.2% | full ckpt | args |
ImageNet-1k | ViT-S/16 | 21M | 1024 | 70.6% | 66.1% | 52.6% | full ckpt | args |
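Since the full checkpoints bundle backbone and projection head weights for both networks, downstream use typically needs only one backbone. The sketch below shows one way to pull it out. It is not part of this repo: `extract_backbone` and `load_teacher_backbone` are hypothetical helpers, and the key layout (a top-level `"teacher"` / `"student"` entry whose parameters are named `module.backbone.*` and `module.head.*`) is an assumption borrowed from DINO-style checkpoints. Inspect the keys of the downloaded file before relying on it.

```python
from collections import OrderedDict


def extract_backbone(state_dict):
    """Keep backbone weights only and strip wrapper prefixes.

    Assumes (not verified against this repo) DINO-style names such as
    "module.backbone.<param>" and "module.head.<param>".
    """
    backbone = OrderedDict()
    for key, value in state_dict.items():
        # Drop the DistributedDataParallel wrapper prefix, if present.
        name = key[len("module."):] if key.startswith("module.") else key
        # Skip projection-head parameters.
        if name.startswith("head."):
            continue
        # Drop the multi-crop wrapper prefix, if present.
        if name.startswith("backbone."):
            name = name[len("backbone."):]
        backbone[name] = value
    return backbone


def load_teacher_backbone(path, network_key="teacher"):
    """Load a full checkpoint and return the backbone weights of one network.

    `network_key` is an assumed top-level key ("teacher" or "student").
    """
    import torch  # imported lazily so extract_backbone works without PyTorch

    # Recent PyTorch versions may require weights_only=False here if the
    # file also stores non-tensor objects such as parsed arguments.
    checkpoint = torch.load(path, map_location="cpu")
    return extract_backbone(checkpoint[network_key])
```

The returned dictionary can then be passed to `load_state_dict` of a matching ViT-S/16 model; passing `strict=False` and checking the reported missing and unexpected keys is a quick way to confirm that the assumed naming matches.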
This code is adapted from DINO.