Vladimir Korviakov, Denis Koposov
- CUDA 12.1
- PyTorch 2.3.1
- torchvision 0.18.1
Example training commands:

```shell
pip install -r requirements.txt
bash scripts/run_neonext_imagenet_local.sh
```

Example inference command:

```shell
bash scripts/validate.sh
```

Models are defined in ptvision/models/neonext/neonxet.py. Several model variants are currently considered the strongest (though not final):
- NeoNeXt-T
- NeoNeXt-S
- NeoNeXt-B
- NeoNeXt-L
(The suffix in NeoNeXt-X denotes only the model variant, not a ranking.)
Each model variant has a different number of blocks per stage and a different number of channels.
The NeoNeXt block is similar to the ConvNeXt block, but a NeoCell is used in place of the depthwise convolution.
NeoCells are also used for down-sampling in the stem, between stages, and after the final feature map.
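As a rough sketch of this block structure, assuming the standard ConvNeXt layout (norm, pointwise expansion, GELU, pointwise projection, residual connection) with the spatial mixing swapped out: the `NeoCellSquare` and `NeoNeXtBlock` names below are illustrative stand-ins, not the repo's API, and the tiled square-kernel NeoCell is a simplification of the real operator.

```python
import torch
import torch.nn as nn


class NeoCellSquare(nn.Module):
    """Toy square-kernel NeoCell: per-channel k x k matrices A and B,
    tiled block-diagonally over H and W (assumes H, W divisible by k).
    Illustrative only; the repo's implementation differs in details."""

    def __init__(self, channels, kernel):
        super().__init__()
        self.k = kernel
        # Identity initialization so the cell starts as a no-op.
        self.A = nn.Parameter(torch.eye(kernel).repeat(channels, 1, 1))
        self.B = nn.Parameter(torch.eye(kernel).repeat(channels, 1, 1))

    def forward(self, x):
        n, c, h, w = x.shape
        k = self.k
        # View the spatial dims as (h//k, k) x (w//k, k) tiles and apply
        # the per-channel k x k matrices to each tile: Y = A @ X @ B.
        xt = x.reshape(n, c, h // k, k, w // k, k)
        y = torch.einsum("cuh,ncahbw,cwv->ncaubv", self.A, xt, self.B)
        return y.reshape(n, c, h, w)


class NeoNeXtBlock(nn.Module):
    """ConvNeXt-style block with a NeoCell replacing the depthwise conv."""

    def __init__(self, dim, kernel):
        super().__init__()
        self.cell = NeoCellSquare(dim, kernel)
        self.norm = nn.LayerNorm(dim)
        self.pw1 = nn.Linear(dim, 4 * dim)  # pointwise expansion
        self.act = nn.GELU()
        self.pw2 = nn.Linear(4 * dim, dim)  # pointwise projection

    def forward(self, x):
        residual = x
        x = self.cell(x)
        x = x.permute(0, 2, 3, 1)  # N,C,H,W -> N,H,W,C for LayerNorm/Linear
        x = self.pw2(self.act(self.pw1(self.norm(x))))
        return residual + x.permute(0, 3, 1, 2)


block = NeoNeXtBlock(dim=8, kernel=4)
out = block(torch.randn(2, 8, 8, 8))
print(out.shape)  # torch.Size([2, 8, 8, 8])
```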
The NeoCell implementation can be found in ptvision/models/neonext/neonxet_utils.py. An optimized implementation of the NeoCell functions using the PyTorch C++ API can be found in ptvision/models/neonext/csrc/neocell.cpp.
Given an input of shape NxCxHxW, NeoCell performs channel-wise matrix multiplications using two trainable matrices A and B (one pair per channel): Y = A*X*B.
All input channels are split into several groups of "channel" channels each (the group sizes may differ).
Each group is processed by matrices of the same size.
If the "kernel" parameter is set, A and B are both square matrices of size kernel, and the spatial size of the data is unchanged.
If the "h_in", "h_out", "w_in", "w_out" parameters are set, A has size h_out*h_in and B has size w_in*w_out, so the spatial size of the data can change (increase or decrease).
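A minimal per-channel reference for Y = A*X*B, covering the rectangular case where the spatial size changes, can be written with `torch.einsum` (the `neocell_ref` name is a hypothetical helper for illustration; the repo's optimized version lives in ptvision/models/neonext/csrc/neocell.cpp):

```python
import torch


def neocell_ref(x, A, B):
    """Unoptimized reference of the channel-wise NeoCell op.

    x: (N, C, H_in, W_in); A: (C, H_out, H_in); B: (C, W_in, W_out).
    Computes Y[n, c] = A[c] @ X[n, c] @ B[c] for every sample and channel.
    """
    # einsum contracts H_in and W_in per channel, broadcasting over batch.
    return torch.einsum("cuh,nchw,cwv->ncuv", A, x, B)


x = torch.randn(2, 3, 8, 8)
A = torch.randn(3, 4, 8)   # shrinks height 8 -> 4
B = torch.randn(3, 8, 16)  # grows width 8 -> 16
y = neocell_ref(x, A, B)
print(y.shape)  # torch.Size([2, 3, 4, 16])
```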
If "shift" is set (non-zero), all channels for this kernel are split into "kernel" sub-groups, and the blocks of the block-diagonal matrix in each successive sub-group are shifted by 1 in the horizontal and vertical directions. The blocks wrap around cyclically, so parts of kernels can appear in the lower-right and upper-left corners of the block-diagonal matrix. "shift" is supported only for square matrices.
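The shifted block-diagonal construction can be sketched by assembling a block-diagonal matrix and cyclically rolling it along both axes, which reproduces the wrap-around into the corners (`shifted_block_diag` is a hypothetical helper, not the repo's code):

```python
import torch


def shifted_block_diag(blocks, shift):
    """Build a block-diagonal matrix from k x k blocks, then cyclically
    shift it by `shift` along both axes so that block parts wrap around
    into the lower-right / upper-left corners."""
    m = torch.block_diag(*blocks)  # (n*k, n*k) block-diagonal matrix
    return torch.roll(m, shifts=(shift, shift), dims=(0, 1))


blocks = torch.arange(1, 9, dtype=torch.float32).reshape(2, 2, 2)
print(shifted_block_diag(blocks, 0))  # plain block-diagonal
print(shifted_block_diag(blocks, 1))  # blocks moved one step, wrapping at the corners
```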
| Model | Resolution | #Params | GFLOPs | Acc@1 |
|---|---|---|---|---|
| NeoNeXt-T | 224 | 27.5M | 4.4 | 81.44 |
| NeoNeXt-S | 224 | 49.3M | 8.6 | 82.58 |
| NeoNeXt-B | 224 | 86.8M | 15.2 | 83.09 |
| NeoNeXt-T | 384 | 27.5M | 13.3 | 82.00 |
| NeoNeXt-S | 384 | 49.3M | 25.7 | 82.94 |
| NeoNeXt-B | 384 | 86.8M | 45.2 | 83.26 |
| NeoNeXt-L | 384 | 193.5M | TBD | 83.68 |
- Inference code
- Training code
- Checkpoints of pretrained models
- Latest tricks
- Update paper
@misc{korviakov2024neonext,
title={NeoNeXt: Novel neural network operator and architecture based on the patch-wise matrix multiplications},
author={Vladimir Korviakov and Denis Koposov},
year={2024},
eprint={2403.11251},
archivePrefix={arXiv},
primaryClass={cs.CV}
}