TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition

This is an official PyTorch implementation of "TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition".

Introduction

TransXNet is a CNN-Transformer hybrid vision backbone that can model both global and local dynamics with a Dual Dynamic Token Mixer (D-Mixer), achieving superior performance over both CNN and Transformer-based models.

Image Classification

1. Requirements

We strongly recommend using the following dependency versions to ensure reproducibility:

# Environments:
cuda==11.6
python==3.8.15
# Packages:
mmcv==1.7.1
timm==0.6.12
torch==1.13.1
torchvision==0.14.1
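
To confirm that an environment matches the pinned packages above, a small check along the following lines may help. This is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes each package is installed under the distribution name listed above (for example, some setups install mmcv as `mmcv-full`, which this sketch would report as missing).

```python
from importlib import metadata

# Pinned package versions copied from the requirements list above.
EXPECTED = {
    "mmcv": "1.7.1",
    "timm": "0.6.12",
    "torch": "1.13.1",
    "torchvision": "0.14.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each mismatched package name to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu116" when comparing versions.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Running the script prints one line per package that is missing or has a different version, and prints nothing when everything matches. It does not check the CUDA or Python versions.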

2. Data Preparation

Prepare ImageNet with the following folder structure. You can extract ImageNet using this script.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
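
Before starting a long run, it can be worth checking that the extracted dataset matches the layout above. The following is a minimal sketch under the assumption that images use a `.JPEG` (or `.jpg`) extension, as in the tree shown. The function name `summarize_split` is hypothetical and not part of this repository.

```python
from pathlib import Path

# File extensions treated as images (compared case-insensitively).
IMAGE_SUFFIXES = {".jpeg", ".jpg"}


def summarize_split(root, split):
    """Return (number of class folders, number of images) under root/split."""
    split_dir = Path(root) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing directory: {split_dir}")
    # Each class is expected to be a sub-directory such as n01440764.
    class_dirs = [p for p in split_dir.iterdir() if p.is_dir()]
    num_images = sum(
        1
        for class_dir in class_dirs
        for f in class_dir.iterdir()
        if f.suffix.lower() in IMAGE_SUFFIXES
    )
    return len(class_dirs), num_images
```

For example, `summarize_split("/path/to/imagenet", "val")` on a complete ImageNet-1K copy should report 1000 class folders and 50,000 images, and the `train` split should report 1000 class folders and 1,281,167 images.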

3. Main Results on ImageNet with Pretrained Models

| Models | Input Size | FLOPs (G) | Params (M) | Top-1 Acc. (%) | Download |
|---|---|---|---|---|---|
| TransXNet-T | 224x224 | 1.8 | 12.8 | 81.6 | model |
| TransXNet-S | 224x224 | 4.5 | 26.9 | 83.8 | model |
| TransXNet-B | 224x224 | 8.3 | 48.0 | 84.6 | model |

4. Train

To train TransXNet models on ImageNet-1K with 8 GPUs (single node), run:

bash scripts/train_tiny.sh # train TransXNet-T
bash scripts/train_small.sh # train TransXNet-S
bash scripts/train_base.sh # train TransXNet-B

5. Validation

To evaluate TransXNet on ImageNet-1K, run:

MODEL=transxnet_t # transxnet_{t, s, b}
python3 validate.py \
/path/to/imagenet \
--model $MODEL -b 128 \
--pretrained # or --checkpoint /path/to/checkpoint 
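
The Top-1 accuracy in the results table is the percentage of validation images whose highest-scoring class matches the ground-truth label. The sketch below illustrates that metric in plain Python. It is not the code used by `validate.py`, and the function name `top1_accuracy` is hypothetical.

```python
def top1_accuracy(logits, labels):
    """Return top-1 accuracy in percent.

    logits: one list of per-class scores per image.
    labels: the ground-truth class index for each image.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("empty batch")
    correct = 0
    for scores, label in zip(logits, labels):
        # The predicted class is the index of the largest score.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)
```

With four images of which three are classified correctly, the function returns 75.0.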

Object Detection and Semantic Segmentation

Object Detection
Semantic Segmentation

Citation

If you find this project useful for your research, please consider citing:

@article{lou2023transxnet,
  title={TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition},
  author={Lou, Meng and Zhou, Hong-Yu and Yang, Sibei and Yu, Yizhou},
  journal={arXiv preprint arXiv:2310.19380},
  year={2023}
}

Acknowledgment

Our implementation is mainly based on the following codebases. We thank the authors for their excellent work.

poolformer
pytorch-image-models
mmdetection
mmsegmentation

Contact

If you have any questions, please feel free to open an issue or contact me at lmzmm.0921@gmail.com.