This summary covers basic segmentation, human segmentation, and human or portrait matting for both images and videos. It may be a little chaotic, so I called it Segmentation-Series-Chaos. If you want a clearer understanding, feel free to fork and modify.
- [done] matting in detail
- [done] Focus on Deeplab-research
- [doing] experiments
Summary of the 2019 paper "Survey on semantic segmentation using deep learning techniques" (Neurocomputing) and other useful sources
model/year | params | inference time (ms) | FLOPs | accuracy (VOC2012/COCO/Cityscapes, %) | paper | code | more |
---|---|---|---|---|---|---|---|
FCN-8s/2015 | ~134M | 175 | - | 67.20/-/65.30 | Fully Convolutional Networks for Semantic Segmentation | https://github.com/shelhamer/fcn.berkeleyvision.org | Start of FCNs for segmentation, arbitrary input size |
PSPNet/2017 | 65.7M | - | - | 85.40/-/80.20 | Pyramid Scene Parsing Network | https://github.com/hszhao/PSPNet | multi-scale feature ensembling, pyramid pooling module |
DeepLab V3-JFT (additionally pre-trained on JFT-300M)/2017 | - | - | - | 86.9/-/- | Rethinking Atrous Convolution for Semantic Image Segmentation | https://github.com/rishizek/tensorflow-deeplab-v3 | ~ |
DeepLab V3/2017 | - | - | - | 85.7/-/81.3 | Rethinking Atrous Convolution for Semantic Image Segmentation | https://github.com/rishizek/tensorflow-deeplab-v3 | Fully connected conditional random fields (CRF) |
DeepLab V3+Xception/2018 | - | - | - | 87.8/-/82.1 | Encoder-decoder with atrous separable convolution for semantic image segmentation | https://github.com/fyu/dilation | Xception backbone, encoder-decoder based on V3, applies depth-wise conv |
DeepLab V3+Xception-JFT/2018 | - | - | - | 89.0/-/- | Encoder-decoder with atrous separable convolution for semantic image segmentation | https://github.com/fyu/dilation | ~ |
ESPNet/2018 | 0.364M | - | - | 63.01/-/60.2 | ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation | https://github.com/sacmehta/ESPNet | point-wise conv (reduces the complexity), spatial pyramid of dilated conv (provides a large receptive field), hierarchical feature fusion (HFF) |
FC-DRN-P-D + ST/2018 | 3.9M | - | - | CamVid: 69.4 | On the iterative refinement of densely connected representation levels for semantic segmentation | https://github.com/ArantxaCasanova/fc-drn | Combines FC-ResNet and FC-DenseNet |
ERFNet/2018 | ~2.1M | 24 | - | -/-/69.7 | ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation | https://github.com/Eromera/erfnet | non-bottleneck-1D (non-bt-1D) layer, combined with bottleneck designs in a way that best leverages their learning performance and efficiency |
RefineNet/2017 | - | - | - | 83.40/-/73.60 | RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation | https://github.com/guosheng/refinenet | Residual conv unit (RCU), multi-resolution fusion and chained residual pooling; the multi-path net refines low-resolution features with concentrated low-level features in a recursive manner |
FastFCN/2019 | - | - | - | Pascal Context: 53.1, ADE20K: 44.34 | FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation | https://github.com/wuhuikai/FastFCN | Joint Pyramid Upsampling (JPU) |
Fast-SCNN/2019 | 1.11M | - | - | -/-/68.0 | Fast-SCNN: Fast Semantic Segmentation Network | https://github.com/kshitizrimal/Fast-SCNN | MobileNetV2, learning-to-downsample module, depth-wise conv |
efficient | - | - | 3.05G (input 640 × 360 × 3) | -/-/70.33 | An efficient solution for semantic segmentation: ShuffleNet V2 with atrous separable convolutions | https://github.com/sercant/mobile-segmentation | TF-Lite applied, ShuffleNetV2 as feature extractor, DeepLabV3 as encoder, (MobileNetV2) DPC |
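The accuracy columns above are most likely mean intersection-over-union (mIoU), the standard metric on VOC2012 and Cityscapes; that reading is an assumption, since the table only says "accuracy". A minimal pure-Python sketch of the metric on a single pair of flattened label maps follows. The function name `mean_iou` and the toy data are hypothetical, and real benchmarks accumulate a confusion matrix over the whole dataset rather than averaging per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Flattened 2x3 label maps (hypothetical toy data, 3 classes).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

Per-class IoUs here are 1/3, 2/3 and 1/2, so the mean is 0.5 (up to floating-point rounding).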
- An efficient solution for semantic segmentation: ShuffleNet V2 with atrous separable convolutions
- In terms of VOC and Cityscapes, DeepLab V3/V3+ performs best according to the related leaderboards: VOC2012, Cityscapes, and https://paperswithcode.com/task/semantic-segmentation
- Good advice for mobile devices, from the AI in RTC challenge group: keep the model under 2 GFLOPs.
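To get a feel for a budget like the 2 GFLOPs mentioned above, here is a rough sketch that counts multiply-accumulates (MACs) for a standard convolution and for a depth-wise separable one. The helper names and the example layer are hypothetical. Many papers report MACs as "FLOPs", so treat the comparison with the budget as approximate.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k, groups=1):
    """Multiply-accumulate count of one k x k convolution (bias ignored)."""
    return h_out * w_out * c_out * (c_in // groups) * k * k


def separable_macs(h_out, w_out, c_in, c_out, k):
    """Depth-wise k x k conv followed by a 1 x 1 point-wise conv."""
    depthwise = conv2d_macs(h_out, w_out, c_in, c_in, k, groups=c_in)
    pointwise = conv2d_macs(h_out, w_out, c_in, c_out, 1)
    return depthwise + pointwise


BUDGET = 2_000_000_000  # the "less than 2 GFLOPs" advice, read as MACs here

# Hypothetical layer: 3x3 conv, 64 -> 128 channels, 128x128 output map.
standard = conv2d_macs(128, 128, 64, 128, 3)
separable = separable_macs(128, 128, 64, 128, 3)
print(standard, standard / BUDGET)    # 1207959552 MACs, about 60% of the budget
print(separable, separable / BUDGET)  # 143654912 MACs, about 7% of the budget
```

A single standard 3x3 layer at this resolution already uses more than half of the budget, which is why the lightweight models in the tables lean on depth-wise and point-wise convolutions.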
- Google's solution: Mobile Real-time Video Segmentation
from 视频分割在移动端的算法进展综述 (A Survey of Progress in Video Segmentation Algorithms on Mobile Devices), which also includes some other methods
- A great tool for implementing segmentation models easily: Semantic Segmentation Suite in TensorFlow. Implement, train, and test new Semantic Segmentation models easily!
- Multi-human parsing from the National University of Singapore, winner of the Best Student Paper Award at ACM MM 2018: Official Repository for Multi-Human-Parsing (MHP)
- A similar GitHub project on human segmentation: Human-Segmentation-PyTorch
- A recent project and paper from Alimama: Semantic_Human_Matting (SHM), published at ACM MM. SHM is the first algorithm that learns to jointly fit both semantic information and high-quality details with deep networks (alpha matte).
One of the human matting datasets: Human Matting datasets
Another useful repo for mobile devices, using the NCNN toolkit: mobile_phone_human_matting (including datasets)
Other recent or state-of-the-art papers on matting:
- A Late Fusion CNN for Digital Matting, CVPR2019.
- Inductive Guided Filter: Real-time Deep Image Matting with Weakly Annotated Masks on Mobile Device, arXiv 2019.
- 2016_Automatic Portrait Segmentation for Image Stylization_CGF
- 2017_Deep Image Matting_CVPR
- 2017_Fast Deep Matting for Portrait Animation on Mobile Phone_ACMMM, github-pytorch
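The alpha matte that the matting methods above predict is usually defined through the compositing equation I = αF + (1 − α)B, where F is the foreground, B the background, and α the per-pixel opacity. A minimal sketch on single-channel toy images follows; the function name `composite` and the values are hypothetical and not taken from any of the listed repos.

```python
def composite(alpha, fg, bg):
    """Blend per pixel: I = alpha * F + (1 - alpha) * B."""
    return [
        [a * f + (1.0 - a) * b for a, f, b in zip(row_a, row_f, row_b)]
        for row_a, row_f, row_b in zip(alpha, fg, bg)
    ]


# 2x2 single-channel toy example.
alpha = [[1.0, 0.5], [0.25, 0.0]]
fg = [[200.0, 200.0], [200.0, 200.0]]
bg = [[0.0, 100.0], [40.0, 80.0]]
print(composite(alpha, fg, bg))  # [[200.0, 150.0], [80.0, 80.0]]
```

Segmentation corresponds to the special case where α is restricted to 0 or 1; matting keeps the fractional values needed for hair and soft edges.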
- The largest and most popular collection of semantic segmentation resources: awesome-semantic-segmentation, which includes many useful resources, e.g. architectures, benchmarks, datasets, results of related challenges, projects, etc.
- A blog summary of image semantic segmentation: Review of Deep Learning Algorithms for Image Semantic Segmentation
- Recent lightweight models that may be useful: MobileNetV3 (first submitted on 6 May 2019) and EfficientNet (first submitted on 28 May 2019), both using NAS (Neural Architecture Search) techniques.
- A useful CVPR 2019 algorithm on using knowledge distillation to improve the accuracy of lightweight semantic segmentation models without increasing the parameter count or GFLOPs: Structured Knowledge Distillation for Semantic Segmentation, proposed by Microsoft Research Asia.
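As a rough illustration of the distillation idea above, the sketch below computes a pixel-wise loss: the KL divergence between the teacher's and the student's per-pixel class distributions, averaged over pixels. This covers only the simplest term; the paper also adds structured (pair-wise and holistic) terms that are not shown here. The function names and toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixelwise_kd_loss(student_logits, teacher_logits):
    """Mean over pixels of KL(teacher || student) on class distributions."""
    total = 0.0
    for s_log, t_log in zip(student_logits, teacher_logits):
        p_s, p_t = softmax(s_log), softmax(t_log)
        total += sum(t * math.log(t / s) for t, s in zip(p_t, p_s) if t > 0)
    return total / len(student_logits)


# Two pixels, three classes each (hypothetical logits).
teacher = [[4.0, 1.0, 0.0], [0.5, 3.0, 0.2]]
student = [[2.0, 1.5, 0.0], [0.5, 1.0, 0.9]]
print(pixelwise_kd_loss(student, teacher))  # positive: the student differs
print(pixelwise_kd_loss(teacher, teacher))  # zero: identical distributions
```

Because the loss only changes the training objective, the student's parameter count and inference cost stay the same.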
- A new upsampling method called DUpsampling: the upsampling weight W is learned, and a special feature fusion technique (inverted fusion) greatly decreases the computation. It outperforms DeepLabv3+ while using only 30% of the computation. Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation, CVPR 2019
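The core of the upsampling step described above can be sketched as follows: each low-resolution feature vector is multiplied by a matrix W and the result is rearranged into a ratio × ratio block of per-class scores. This is a simplified illustration under stated assumptions, not the authors' implementation: in the paper W is learned, whereas here it is a fixed toy matrix, and the function name `dupsample` and the block layout are hypothetical.

```python
def dupsample(features, W, ratio, num_classes):
    """Upsample a grid of feature vectors with one linear projection.

    features: H x W grid (nested lists) of length-C vectors.
    W: (ratio * ratio * num_classes) x C matrix, learned in practice.
    Returns an (H * ratio) x (W * ratio) grid of length-num_classes vectors.
    """
    height, width = len(features), len(features[0])
    out = [[None] * (width * ratio) for _ in range(height * ratio)]
    for i in range(height):
        for j in range(width):
            x = features[i][j]
            # Linear projection v = W x for this low-resolution pixel.
            v = [sum(wk * xk for wk, xk in zip(row, x)) for row in W]
            # Rearrange v into a ratio x ratio block of class-score vectors.
            for di in range(ratio):
                for dj in range(ratio):
                    start = (di * ratio + dj) * num_classes
                    out[i * ratio + di][j * ratio + dj] = v[start:start + num_classes]
    return out


# One low-res pixel with C = 2 features, upsampled 2x with a single class.
features = [[[1.0, 2.0]]]
W = [[1, 0], [0, 1], [1, 1], [1, -1]]  # toy stand-in for the learned matrix
print(dupsample(features, W, ratio=2, num_classes=1))
# [[[1.0], [2.0]], [[3.0], [-1.0]]]
```

Unlike bilinear upsampling, the projection here depends on learned weights, so the decoder can recover detail from a coarser (and cheaper) feature map.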
- Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
- ESPNetv2: A light-weight, power efficient, and general purpose convolutional neural network
model | params | inference time (ms) | GFLOPs | accuracy (VOC2012/COCO/Cityscapes, %) | paper | code | more |
---|---|---|---|---|---|---|---|
DFANet | 7.8M | 10 | 3.4G (input 1024 × 1024) | -/-/71.3, CamVid: 64.7 | DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation | https://github.com/Tramac/awesome-semantic-segmentation-pytorch | Proposed by Beijing Megvii Co., Ltd, deep feature aggregation |
Auto-DeepLab | 44.42M | - | - | 85.6/-/82.1 | Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation | https://github.com/tensorflow/models/tree/master/research/deeplab | NAS, less computation than DeepLab, Fei-Fei Li, TensorFlow applied, oral |
ESPNetv2 | ~6M | - | - | 68.0/-/66.2 | ESPNetv2: A light-weight, power efficient, and general purpose convolutional neural network | https://github.com/sacmehta/ESPNetv2 | ESPNet (ECCV 2018), group conv to reduce dimension, depth-wise separable atrous conv |
Improving | - | - | - | -/-/83.5, CamVid: 81.7 | Improving Semantic Segmentation via Video Propagation and Label Relaxation | https://nv-adlr.github.io/publication/2018-Segmentation | video, oral, a video prediction method to enhance segmentation |