This repository contains the code for using Switchable Normalization (SN) in semantic image segmentation, as proposed in the paper "Differentiable Learning-to-Normalize via Switchable Normalization".
It implements the experiments presented in the above paper on top of the open-source semantic segmentation framework Scene Parsing on MIT ADE20K.
- 2018/9/26: The code and trained models of semantic segmentation on ADE20K using SN are released!
- More results and models will be released soon.
You are encouraged to cite the following paper if you use SN in your research or wish to refer to the baseline results.
@article{SwitchableNorm,
title={Differentiable Learning-to-Normalize via Switchable Normalization},
author={Ping Luo and Jiamin Ren and Zhanglin Peng},
journal={arXiv:1806.10779},
year={2018}
}
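As a quick reminder of what SN does, below is a minimal PyTorch sketch of a 2D switchable-normalization layer: it computes instance-, layer-, and batch-wise statistics and blends them with softmax-normalized learned importance weights. This is an illustrative reimplementation of the idea, not this repo's exact module (for instance, it omits running-average buffers, and tuple `dim` arguments require a newer PyTorch than the 0.4.0 this repo targets); see switchablenorms/Switchable-Normalization for the official implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchNorm2d(nn.Module):
    """Illustrative Switchable Normalization for NCHW tensors."""
    def __init__(self, num_channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        # Importance weights over {IN, LN, BN} statistics, for mean and variance.
        self.mean_w = nn.Parameter(torch.ones(3))
        self.var_w = nn.Parameter(torch.ones(3))

    def forward(self, x):
        # Instance Norm statistics: per sample, per channel.
        mu_in = x.mean(dim=(2, 3), keepdim=True)
        var_in = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        # Layer Norm statistics: per sample, across channels and space.
        mu_ln = x.mean(dim=(1, 2, 3), keepdim=True)
        var_ln = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        # Batch Norm statistics: per channel, across the batch and space.
        mu_bn = x.mean(dim=(0, 2, 3), keepdim=True)
        var_bn = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)

        # Softmax turns the learned importance weights into a convex combination.
        mw = F.softmax(self.mean_w, dim=0)
        vw = F.softmax(self.var_w, dim=0)
        mu = mw[0] * mu_in + mw[1] * mu_ln + mw[2] * mu_bn
        var = vw[0] * var_in + vw[1] * var_ln + vw[2] * var_bn
        return self.weight * (x - mu) / torch.sqrt(var + self.eps) + self.bias
```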
Use git to clone this repository:
git clone https://github.com/switchablenorms/SwitchNorm_Segmentation.git
The code is tested under the following configurations.
- Hardware: 1-8 GPUs (with at least 12GB of GPU memory each)
- Software: CUDA 9.0, Python 3.6, PyTorch 0.4.0, tensorboardX
Please check the Environment, Training, and Evaluation subsections in the repo Scene Parsing on MIT ADE20K for a quick start.
Download the SN-based ImageNet-pretrained models and put them into {repo_root}/pretrained_sn.
The backbone models with SN pretrained on ImageNet are available in the format used by the above segmentation framework and this repo.
- ResNet50v1+SN(8,2) [pretrained_SN(8,2)]
For more pretrained models with SN, please refer to the repo of switchablenorms/Switchable-Normalization.
The following script converts a model trained with Switchable-Normalization into the format used by this semantic segmentation codebase: ./pretrained_sn/convert_sn.py
usage: python -u convert_sn.py
NOTE: The parameter keys in the pretrained checkpoint must match the keys in the backbone model EXACTLY. Make sure you load the pretrained model that corresponds to your segmentation architecture.
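To give a rough idea of what such a conversion involves, here is a hedged sketch; the `module.` prefix, the file names, and the commented-out sanity check are assumptions for illustration, and ./pretrained_sn/convert_sn.py remains the authoritative implementation:

```python
# Hypothetical sketch of a checkpoint key conversion (see ./pretrained_sn/convert_sn.py
# for the actual mapping used by this repo).
import torch

ckpt = torch.load('resnet50v1_sn.pth', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)  # unwrap if saved inside a dict (assumed)

converted = {}
for key, value in state_dict.items():
    # Strip the 'module.' prefix left over from nn.DataParallel training (assumed).
    new_key = key[len('module.'):] if key.startswith('module.') else key
    converted[new_key] = value

torch.save(converted, 'resnet50v1_sn_converted.pth')

# Sanity check against the NOTE above: converted keys must match the backbone EXACTLY.
# backbone = build_encoder(...)                       # hypothetical constructor
# assert set(converted) == set(backbone.state_dict())
```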
- The training strategies of the baseline models and the SN-based models on ADE20K are the same as in Scene Parsing on MIT ADE20K.
- The training script with ResNet-50-sn backbone can be found here:
./scripts/train.sh
NOTE: The default architecture of this repo is Encoder: resnet50_dilated8 (resnetXX_dilatedYY: customized resnetXX with dilated convolutions; the output feature map is 1/YY of the input size, see DeepLab for more details) and Decoder: c1_bilinear_deepsup (1 conv + bilinear upsample + deep supervision, see PSPNet for more details).
Optional arguments (see full input arguments via ./train.py):
--arch_encoder architecture of the encoder network
--arch_decoder architecture of the decoder network
--weights_encoder weights to finetune the encoder network
--weights_decoder weights to finetune the decoder network
--list_train the list to load the training data
--root_dataset the path of the dataset
--batch_size_per_gpu input batch size
--start_epoch epoch to start training. (continue from a checkpoint loaded via weights_encoder & weights_decoder)
NOTE: In this repo, --start_epoch allows training to resume from the checkpoints loaded via --weights_encoder and --weights_decoder, which are generated automatically during training. If you want to train from scratch, set --start_epoch to 1 and leave --weights_encoder and --weights_decoder blank. See the example invocations below.
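For instance, two hypothetical invocations (the checkpoint paths and epoch numbers are placeholders; the actual file names are whatever your training run writes out):

```bash
# Train from scratch: epoch 1, no segmentation weights to resume from.
python train.py \
    --arch_encoder resnet50_dilated8 \
    --arch_decoder c1_bilinear_deepsup \
    --weights_encoder '' \
    --weights_decoder '' \
    --start_epoch 1

# Resume from the checkpoints a previous run produced (names are illustrative).
python train.py \
    --arch_encoder resnet50_dilated8 \
    --arch_decoder c1_bilinear_deepsup \
    --weights_encoder ckpt/encoder_epoch_10.pth \
    --weights_decoder ckpt/decoder_epoch_10.pth \
    --start_epoch 11
```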
- The evaluation script with ResNet-50-sn backbone can be found here:
./scripts/evaluate.sh
Optional arguments (see full input arguments via ./eval.py):
--arch_encoder architecture of the encoder network
--arch_decoder architecture of the decoder network
--suffix which snapshot to load
--list_val the list to load the validation data
--root_dataset the path of the dataset
--imgSize list of input image sizes
--imgSize enables single-scale or multi-scale inference. When --imgSize is a single int, single-scale inference is used; when it is a list of ints, the multi-scale test is applied.
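For instance (the sizes below are illustrative; check ./scripts/evaluate.sh for the settings actually used):

```bash
# Single-scale inference: one int.
python eval.py --arch_encoder resnet50_dilated8 --arch_decoder c1_bilinear_deepsup \
    --imgSize 450

# Multi-scale inference: a list of ints.
python eval.py --arch_encoder resnet50_dilated8 --arch_decoder c1_bilinear_deepsup \
    --imgSize 300 400 500 600
```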
The experimental results below are on the ADE20K validation set. MS test is short for multi-scale test. sync BN indicates multi-GPU synchronized batch normalization. More results and models will be released soon.
Architecture | Norm | MS test | Mean IoU | Pixel Acc. | Overall Score | Download |
---|---|---|---|---|---|---|
ResNet50_dilated8 + c1_bilinear_deepsup | sync BN | no | 36.43 | 77.30 | 56.87 | encoder decoder |
ResNet50_dilated8 + c1_bilinear_deepsup | GN | no | 35.66 | 77.24 | 56.45 | encoder decoder |
ResNet50_dilated8 + c1_bilinear_deepsup | SN-(8,2) | no | 38.72 | 78.90 | 58.82 | encoder decoder |
ResNet50_dilated8 + c1_bilinear_deepsup | sync BN | yes | 37.69 | 78.29 | 57.99 | -- |
ResNet50_dilated8 + c1_bilinear_deepsup | GN | yes | 36.32 | 77.77 | 57.05 | -- |
ResNet50_dilated8 + c1_bilinear_deepsup | SN-(8,2) | yes | 39.21 | 79.20 | 59.21 | -- |
NOTE: For all settings in this repo, we employ ResNet as the backbone network with the original 7×7 kernel in the first convolution layer. This differs from the MIT framework, which adopts three 3×3 convolution layers at the bottom of the network. See ./models/resnet_v1_sn.py for the details.
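For reference, a minimal sketch of the two stem variants in plain PyTorch (channel widths and the use of BatchNorm2d are assumptions for illustration; this repo's SN backbones replace the norm layers with SN, and ./models/resnet_v1_sn.py is the actual definition):

```python
import torch.nn as nn

# Stem used in this repo: the original ResNet single 7x7 convolution.
stem_7x7 = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),  # replaced by SN in the SN backbones
    nn.ReLU(inplace=True),
)

# MIT-framework-style stem: three stacked 3x3 convolutions reaching the same
# output resolution (an illustrative reconstruction, not this repo's code).
stem_3x3 = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
)
```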