Conv2Former

The official implementation of the paper "Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition". Our code is based on timm and ConvNeXt.

Our paper has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).

Usage

Requirements

  • torch>=1.12.1
  • torchvision>=0.13.1
  • timm>=0.9.12

Examples

from models.conv2former import conv2former_n

# create a Conv2Former model with 1000 classes
model = conv2former_n(num_classes=1000) 

Input images should be normalized with the standard ImageNet mean and std:

from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                  std=[0.229, 0.224, 0.225])
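
For a quick end-to-end check, a single image can be run through the model roughly as follows. This is a minimal sketch, not part of the released code: it assumes the standard 224x224 ImageNet evaluation transforms, reuses the model and normalize defined above, and uses a placeholder image path.

import torch
from PIL import Image
from torchvision import transforms

# Standard ImageNet-style evaluation preprocessing (assumed here, not taken from the repo).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,  # the Normalize transform defined above
])

model.eval()  # the model created in the example above
img = Image.open("example.jpg").convert("RGB")  # "example.jpg" is a placeholder path
x = preprocess(img).unsqueeze(0)                # shape: (1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)                           # shape: (1, 1000)
print(logits.argmax(dim=1))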

We also provide the Jittor version of the model:

import jittor as jt
from jittor_models.conv2former import conv2former_n
   
# create a Conv2Former model with 1000 classes
model = conv2former_n(num_classes=1000) 
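
A quick shape check for the Jittor model can be done with a dummy input (again a minimal sketch; set jt.flags.use_cuda = 1 only if a GPU is available):

x = jt.randn(1, 3, 224, 224)  # dummy batch: one 224x224 RGB image
model.eval()
logits = model(x)
print(logits.shape)           # expected: [1, 1000]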

Training

bash distributed_train.sh 8 $DATA_DIR --model $MODEL -b $BS --lr $LR --drop-path $DROP_PATH

# DATA_DIR: path to the dataset
# MODEL: name of the model
# BS: batch size
# LR: learning rate
# DROP_PATH: drop path rate
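# 8: number of GPUs (distributed processes) to launch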

Validation

python validate.py $DATA_DIR --model $MODEL --checkpoint $CHECKPOINT

# DATA_DIR: path to the dataset
# MODEL: name of the model
# CHECKPOINT: path to the saved checkpoint
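
If you prefer to load a released checkpoint in your own script rather than through validate.py, something like the following should work. This is a sketch under the assumption that the checkpoint is a timm-style file whose weights sit either at the top level or under a 'state_dict'/'model' key; the filename is a placeholder.

import torch
from models.conv2former import conv2former_n

model = conv2former_n(num_classes=1000)
ckpt = torch.load("conv2former_n.pth", map_location="cpu")  # placeholder filename

# timm-style checkpoints may nest the weights under 'state_dict' or 'model'.
if isinstance(ckpt, dict):
    ckpt = ckpt.get("state_dict", ckpt.get("model", ckpt))
model.load_state_dict(ckpt)
model.eval()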

Results

Training on ImageNet-1k

Model         | Parameters | FLOPs | Image resolution | Top-1 Acc. | Model file
Conv2Former-N | 15M        | 2.2G  | 224              | 81.5%      | baiduyun
Swin-T        | 28M        | 4.5G  | 224              | 81.5%      | -
ConvNeXt-T    | 29M        | 4.5G  | 224              | 82.1%      | -
Conv2Former-T | 27M        | 4.4G  | 224              | 83.2%      | baiduyun
Swin-S        | 50M        | 8.7G  | 224              | 83.0%      | -
ConvNeXt-S    | 50M        | 8.7G  | 224              | 83.1%      | -
Conv2Former-S | 50M        | 8.7G  | 224              | 84.1%      | baiduyun
RepLKNet-31B  | 79M        | 15.3G | 224              | 83.5%      | -
Swin-B        | 88M        | 15.4G | 224              | 83.5%      | -
ConvNeXt-B    | 89M        | 15.4G | 224              | 83.8%      | -
FocalNet-B    | 89M        | 15.4G | 224              | 83.9%      | -
Conv2Former-B | 90M        | 15.9G | 224              | 84.4%      | baiduyun

Pre-training on ImageNet-22k and Fine-tuning on ImageNet-1k

Model         | Parameters | FLOPs | Image resolution | Top-1 Acc. | Model file
ConvNeXt-S    | 50M        | 8.7G  | 224              | 84.6%      | -
Conv2Former-S | 50M        | 8.7G  | 224              | 84.9%      | baiduyun
Swin-B        | 88M        | 15.4G | 224              | 85.2%      | -
ConvNeXt-B    | 89M        | 15.4G | 224              | 85.8%      | -
Conv2Former-B | 90M        | 15.9G | 224              | 86.2%      | baiduyun
Swin-B        | 88M        | 47.0G | 384              | 86.4%      | -
ConvNeXt-B    | 89M        | 45.1G | 384              | 86.8%      | -
Conv2Former-B | 90M        | 46.7G | 384              | 87.0%      | -
Swin-L        | 197M       | 34.5G | 224              | 86.3%      | -
ConvNeXt-L    | 198M       | 34.4G | 224              | 86.6%      | -
Conv2Former-L | 199M       | 36.0G | 224              | 87.0%      | baiduyun
EffNet-V2-XL  | 208M       | 94G   | 480              | 87.3%      | -
Swin-L        | 197M       | 104G  | 384              | 87.3%      | -
ConvNeXt-L    | 198M       | 101G  | 384              | 87.5%      | -
CoAtNet-3     | 168M       | 107G  | 384              | 87.6%      | -
Conv2Former-L | 199M       | 106G  | 384              | 87.7%      | baiduyun

Citation

If you find this work or the code helpful in your research, please cite:

@article{hou2024conv2former,
  title={Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition},
  author={Hou, Qibin and Lu, Cheng-Ze and Cheng, Ming-Ming and Feng, Jiashi},
  journal={IEEE TPAMI},
  year={2024},
  doi={10.1109/TPAMI.2024.3401450}, 
}

Reference

If you compare against the baselines listed above, you may also want to cite:

@inproceedings{liu2022convnet,
  title={A ConvNet for the 2020s},
  author={Liu, Zhuang and Mao, Hanzi and Wu, Chao-Yuan and Feichtenhofer, Christoph and Darrell, Trevor and Xie, Saining},
  booktitle={CVPR},
  year={2022}
}

@inproceedings{liu2021swin,
  title={Swin transformer: Hierarchical vision transformer using shifted windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  booktitle={ICCV},
  year={2021}
}

@inproceedings{tan2021efficientnetv2,
  title={Efficientnetv2: Smaller models and faster training},
  author={Tan, Mingxing and Le, Quoc},
  booktitle={ICML},
  pages={10096--10106},
  year={2021},
  organization={PMLR}
}

@misc{focalnet,
  author = {Yang, Jianwei and Li, Chunyuan and Gao, Jianfeng},
  title = {Focal Modulation Networks},
  publisher = {arXiv},
  year = {2022},
}

@article{dai2021coatnet,
  title={Coatnet: Marrying convolution and attention for all data sizes},
  author={Dai, Zihang and Liu, Hanxiao and Le, Quoc and Tan, Mingxing},
  journal={NeurIPS},
  volume={34},
  year={2021}
}

@inproceedings{replknet,
  author = {Ding, Xiaohan and Zhang, Xiangyu and Zhou, Yizhuang and Han, Jungong and Ding, Guiguang and Sun, Jian},
  title = {Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs},
  booktitle={CVPR},
  year = {2022},
}
