Computer Vision

Computer vision, machine vision and image processing are to process image in computer for different visual tasks from diverse perspectives. Computer vision focus on the visual representation includes images, graphics, animation and videos rather than the sounds, speech or text.

Visual recognition tasks, such as image classification, localization, and detection, are the core building blocks of many of these applications, and recent developments in Convolutional Neural Networks (CNNs) have led to outstanding performance in these state-of-the-art visual recognition tasks and systems. As a result, CNNs now form the crux of deep learning algorithms in computer vision. The Ancient Secret of Computer Vision covers standard techniques in image processing like filtering, edge detection, stereo, flow, etc. (old-school vision), as well as newer, machine-learning based computer vision, which is more comprehensive.

In this section, we will focus on more technological details of CNN architecures, training and the motivation.

See CV, NLP for the state-of-the-art methods.

https://www.openml.org/
https://www.jiqizhixin.com/sota
https://www.pyimagesearch.com/start-here/
https://deepai.org/
https://www.mlhub123.com/

https://sites.google.com/visipedia.org/index
https://github.com/Ewenwan/MVision/tree/master/CNN
Computational Vision at Caltech
SE(3) COMPUTER VISION GROUP AT CORNELL TECH
https://vcla.stat.ucla.edu/index.html
香港中文大学多媒体实验室
计算机视觉与遥感实验室 Computer Vision & Remote Sensing (CVRS) Lab@WHU
Center for Research in Computer Vision
Vision and Content Engineering Lab@ETZH
Computer Vision and Geometry Group
https://github.com/jbhuang0604/awesome-computer-vision
CS231n: Convolutional Neural Networks for Visual Recognition
Spring 2019 CS 543/ECE 549: Computer Vision
Elements of Geometric Computer Vision. by Andrea Fusiello
CVonline: The Evolving, Distributed, Non-Proprietary, On-Line Compendium of Computer Vision
An Introduction to Projective Geometry (for computer vision) by Stan Birchfield
Photogrammetric Computer Vision -- Statistics, Geometry, and Reconstruction
http://slazebni.cs.illinois.edu/spring17/lec01_cnn_architectures.pdf
https://blog.algorithmia.com/introduction-to-computer-vision/
https://hayo.io/computer-vision/
http://www.robots.ox.ac.uk/~vgg/research/text/
https://ieeexplore.ieee.org/document/8295029
http://www.vlfeat.org/matconvnet/
http://cvcl.mit.edu/aude.htm

Image acquisition	Image processing	Image analysis
Webcams & embedded cameras	Edge detection	3D scene mapping
Digital compact cameras & DSLR	Segmentation	Object recognition
Consumer 3D cameras	Classification	Object tracking
Laser range finders	Feature detection and matching	---

9 Applications of Deep Learning for Computer Vision
Deep Learning for Computer Vision Image Classification, Object Detection, and Face Recognition in Python
http://modelnet.cs.princeton.edu/

Image Classification / Recognition

Like other classification tasks, the feature engineering is the core and kernel of preprocessing. The predicted labels is always supposed to be attached to some specific features. As the fingerprint can be an biological identifier of its owner, the desired features are supposed to be sufficient to identify the differences between the ones with the same label.

https://cv-tricks.com/cnn/understand-resnet-alexnet-vgg-inception/
https://www.jeremyjordan.me/convnet-architectures/
http://www.tamaraberg.com/teaching/Spring_16/

LeNet

LeNet learns the parameters using error back-propagation. In another word, the optimization procedure of CNNs are based on gradient methods. This CNN model was successfully applied to recognize handwritten digits.

As shown in the above figure, it consists of convolution, subsampling, full connection and Gaussian connections. It is a typical historical architecture.

AlexNet

AlexNet was the winning entry in ILSVRC 2012. It solves the problem of image classification where the input is an image of one of 1000 different classes (e.g. cats, dogs etc.) and the output is a vector of 1000 numbers.

AlexNet consists of 5 Convolutional Layers and 3 Fully Connected Layers. Overlapping Max Pool layers are similar to the Max Pool layers, except the adjacent windows over which the max is computed overlap each other. An important feature of the AlexNet is the use of ReLU(Rectified Linear Unit) Nonlinearity.

https://engmrk.com/alexnet-implementation-using-keras/
http://gyxie.github.io/2016/09/21/%E6%B7%B1%E5%85%A5AlexNet/
Understanding AlexNet
ImageNet Classification with Deep Convolutional Neural Networks

VGG

The very deep ConvNets were the basis of our ImageNet ILSVRC-2014 submission, where our team (VGG) secured the first and the second places in the localisation and classification tasks respectively.

http://www.robots.ox.ac.uk/~vgg/practicals/cnn/index.html
http://www.robots.ox.ac.uk/~vgg/research/very_deep/
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION

Inception

Deep Learning in the Trenches: Understanding Inception Network from Scratch
http://www.csc.kth.se/~roelof/deepdream/bvlc_googlenet.html
https://zh.d2l.ai/chapter_convolutional-neural-networks/googlenet.html

ResNet

ResNet is to solve the degeneration of deep neural network when the depth of layers of deep neural network increase. It is guessed that the nonlinearity of activation function makes it tough to learn (linear) identity transformation (Id(x)=x).

https://neurohive.io/en/popular-networks/resnet/
http://teleported.in/posts/decoding-resnet-architecture/
https://zh.d2l.ai/chapter_convolutional-neural-networks/resnet.html
Layer-Parallel Training of Deep Residual Neural Networks
Fast deep parallel residual network for accurate super resolution image processing
What Can ResNet Learn Efficiently, Going Beyond Kernels?
https://github.com/tomgoldstein/loss-landscape
CNN meets PDEs
Deep Neural Network motivated by PDEs
https://www.semanticscholar.org/author/Lars-Ruthotto/2557699
https://www.robots.ox.ac.uk/~vedaldi//research/visualization/visualization.html
https://www.graphcore.ai/posts/what-does-machine-learning-look-like
https://srdas.github.io/DLBook/ConvNets.html#visualizing-convnets

graphcore.ai

DenseNet

Densely Connected Convolutional Networks
Dive to Deep Learning

HRNet

The high-resolution network (HRNet) maintains high-resolution representations by connecting high-to-low resolution convolutions in parallel and strengthens high-resolution representations by repeatedly performing multi-scale fusions across parallel convolutions. We demonstrate the effectives on pixel-level classification, region-level classification, and image-level classification.

The HRNet turns out to be a strong repalcement of classification networks (e.g., ResNets, VGGNets) for visual recognition. We believe that the HRNet will become the new standard backbone.

Deep High-Resolution Representation Learning (HRNet)

Semantic Segmentation

http://cvlab.postech.ac.kr/research/deconvnet/
https://deeplearninganalytics.org/semantic-segmentation/
http://www.cs.toronto.edu/~tingwuwang/semantic_segmentation.pdf
http://wp.doc.ic.ac.uk/bglocker/project/semantic-imaging/
A 2017 Guide to Semantic Segmentation with Deep Learning
Semantic Image Segmentation Live Demo
Awesome Semantic Segmentation
https://arxiv.org/pdf/1711.05847.pdf

Object Detection

RCNN

RCNN, Fast RCNN, Faster RCNN 总结
https://www.cnblogs.com/skyfsm/p/6806246.html

YOLO

only look once (YOLO) is a state-of-the-art, real-time object detection system.

YOLO 1 到 YOLO 3
https://github.com/jwchoi384/Gaussian_YOLOv3
https://pjreddie.com/darknet/yolo/
https://pjreddie.com/darknet/
https://github.com/qqwweee/keras-yolo3
https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/

https://www.jeremyjordan.me/object-detection-one-stage/
https://github.com/amusi/awesome-object-detection
Object Detection 2015
https://blog.csdn.net/v_JULY_v/article/details/80170182

Object Tracking

https://cv-tricks.com/object-tracking/quick-guide-mdnet-goturn-rolo/

Pose Estimation

https://www.fritz.ai/features/pose-estimation.html
https://github.com/xinghaochen/awesome-hand-pose-estimation
https://github.com/cbsudux/awesome-human-pose-estimation
https://cv-tricks.com/pose-estimation/using-deep-learning-in-opencv/

Image Caption

https://cs.stanford.edu/people/karpathy/sfmltalk.pdf
https://www.ripublication.com/ijaer18/ijaerv13n9_102.pdf
Deep Visual-Semantic Alignments for Generating Image Descriptions
Automatic Image Captioning using Deep Learning (CNN and LSTM) in PyTorch
How to Develop a Deep Learning Photo Caption Generator from Scratch
Deep Learning Image Caption Generator
Conceptual Captions: A New Dataset and Challenge for Image Captioning Wednesday, September 5, 2018

Scene Understanding

Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framewor
6.870 Object Recognition and Scene Understanding Fall 2008
http://vladlen.info/projects/scene-understanding/
Holistic Scene Understanding
http://www.kalisteo.eu/en/thematic_vs.htm
https://ps.is.tue.mpg.de/research_fields/semantic-scene-understanding
ScanNet Indoor Scene Understanding Challenge CVPR 2019 Workshop, Long Beach, CA
Human-Centric Scene Understanding from Single View 360 Video
HoloVis - 3D Holistic Scene Understanding
Scene Recognition and Understanding
BlitzNet: A Real-Time Deep Network for Scene Understanding
L3ViSU: Lifelong Learning of Visual Scene Understanding

Optical Character Recognition

https://indic-ocr.github.io/
https://github.com/kba/awesome-ocr
https://github.com/pannous/tensorflow-ocr
https://zhuanlan.zhihu.com/p/65707543
https://github.com/tesseract-ocr/tesseract
https://github.com/Swift-AI/Swift-AI
https://github.com/wanghaisheng/awesome-ocr
https://github.com/chineseocr
http://tesseract-ocr.github.io/4.0.0/index.html
https://anyline.com/
https://github.com/vinayakkailas/Deeplearning-OCR
https://github.com/hs105/Deep-Learning-for-OCR
http://cs231n.stanford.edu/reports/2017/pdfs/810.pdf
https://handong1587.github.io/deep_learning/2015/10/09/ocr.html
STN-OCR: A single Neural Network for Text Detection and Text Recognition

Image Search

Modeling visual knowledge from large-scale data
Deep Convolutional Matching

Style Transfer

https://github.com/titu1994/Neural-Style-Transfer
https://pjreddie.com/darknet/nightmare/
https://deepdreamgenerator.com/
https://www.pyimagesearch.com/2018/08/27/neural-style-transfer-with-opencv/
https://reiinakano.github.io/2019/01/27/world-painters.html
https://www.andrewszot.com/blog/machine_learning/deep_learning/style_transfer
Convolutional Neural Network – Exploring the Effect of Hyperparameters and Structural Settings for Neural Style Transfe

Visualization /Interpretation of CNN

It has shown that

ImageNet trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies.

Deep Visualization
Interpretable Representation Learning for Visual Intelligence
IMAGENET-TRAINED CNNS ARE BIASED TOWARDS TEXTURE; INCREASING SHAPE BIAS IMPROVES ACCURACY AND ROBUSTNESS
2017 Workshop on Visualization for Deep Learning
Understanding Neural Networks Through Deep Visualization
vadl2017: Visual Analysis of Deep Learning
Multifaceted feature visualization: Uncovering the different types of features learned by each neuron in deep neural networks
Interactive Visualizations for Deep Learning
https://www.zybuluo.com/lutingting/note/459569
https://cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf
卷积网络的可视化与可解释性（资料整理） - 陈博的文章 - 知乎
https://zhuanlan.zhihu.com/p/24833574
https://zhuanlan.zhihu.com/p/30403766
https://zhuanlan.zhihu.com/p/28054589
https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
http://people.csail.mit.edu/bzhou/ppt/presentation_ICML_workshop.pdf
https://www.robots.ox.ac.uk/~vedaldi//research/visualization/visualization.html
https://www.graphcore.ai/posts/what-does-machine-learning-look-like
https://srdas.github.io/DLBook/ConvNets.html#visualizing-convnets
CNN meets PDEs
Deep Neural Network motivated by PDEs
https://www.semanticscholar.org/author/Lars-Ruthotto/2557699

graphcore.ai

Computer Graphics

https://github.com/alecjacobson/computer-graphics-csc418
http://graphics.cs.cmu.edu/courses/15-463/
https://github.com/ivansafrin/CS6533s
https://github.com/ericjang/awesome-graphics
CreativeAI: Deep Learning for Computer Graphics
https://github.com/smartgeometry-ucl/dl4g
https://graphics.stanford.edu/
https://github.com/arrayfire/arrayfire-python
https://ge.in.tum.de/research/
An extensible framework for fluid simulation
https://people.csail.mit.edu/tzumao/

http://kvfrans.com/coloring-and-shading-line-art-automatically-through-conditional-gans/

Deep Dream

Deep Learning for Computer Vision, Speech, and Language
Deep Learning Vision for Non-Vision Tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Computer vision.md

Computer vision.md

Computer Vision

Image Classification / Recognition

LeNet

AlexNet

VGG

Inception

ResNet

Semantic Segmentation

Object Detection

RCNN

YOLO

Object Tracking

Pose Estimation

Image Caption

Scene Understanding

Optical Character Recognition

Image Search

Style Transfer

Visualization /Interpretation of CNN

Computer Graphics

Files

Computer vision.md

Latest commit

History

Computer vision.md

File metadata and controls

Computer Vision

Image Classification / Recognition

LeNet

AlexNet

VGG

Inception

ResNet

Semantic Segmentation

Object Detection

RCNN

YOLO

Object Tracking

Pose Estimation

Image Caption

Scene Understanding

Optical Character Recognition

Image Search

Style Transfer

Visualization /Interpretation of CNN

Computer Graphics