Let us start with a simple definition: predicting the location of an object along with its class is called object detection. Instead of predicting only the class of an object in an image, we now have to predict the class as well as a rectangle (called a bounding box) containing that object. It takes 4 variables to uniquely identify a rectangle, for example the coordinates of its top-left corner plus its width and height. Object detection can be modeled as a classification problem: we take windows of fixed size from the input image at all possible locations and feed these patches to an image classifier, which predicts the class of the object in each window (or background if none is present). There are various methods for object detection, such as RCNN, Faster-RCNN, SSD and YOLO.
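The sliding-window idea can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the image is a nested list of grayscale values, a box is stored as (x, y, w, h), and `toy_classifier` is a made-up stand-in for a real image classifier.

```python
def sliding_windows(image_width, image_height, window_size, stride):
    """Yield (x, y, w, h) boxes for fixed-size windows over the image."""
    for y in range(0, image_height - window_size + 1, stride):
        for x in range(0, image_width - window_size + 1, stride):
            yield (x, y, window_size, window_size)


def crop(image, box):
    """Cut the patch described by box out of a nested-list image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def toy_classifier(patch):
    """Hypothetical classifier: call bright patches 'object'."""
    pixels = [p for row in patch for p in row]
    return "object" if sum(pixels) / len(pixels) > 0.5 else "background"


def detect(image, classify, window_size, stride):
    """Classify every window and keep the ones that are not background."""
    height, width = len(image), len(image[0])
    detections = []
    for box in sliding_windows(width, height, window_size, stride):
        label = classify(crop(image, box))
        if label != "background":
            detections.append((label, box))
    return detections


# A 4x4 image with a bright 2x2 square in the top-left corner.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(detect(image, toy_classifier, window_size=2, stride=2))
```

A real detector would also vary the window size and merge overlapping hits, which is what makes the naive approach expensive.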
The Viola-Jones algorithm, developed by P. Viola and M. Jones, is one of the most powerful computer vision algorithms to date. It is the basis of the Haar cascade classifiers in the OpenCV library. Check the installed version:
pkg-config --modversion opencv
3.2.0
If it is not found, try sudo apt-get install libopencv-dev. For this virtual environment I use Python 3.6. The versions of all libraries and dependencies can be found in environment.yml. The main libraries were installed with:
$ pip install torchvision==0.1.6
$ pip3 install torch==0.3.1
$ conda install -c menpo opencv3
Many companies today use CV in their core business to detect emotions. For example, Apple bought Emotient, a startup that builds CV tools to recognize people's feelings. Building an AI that recognizes human emotions can be highly valuable in some markets, such as recommender systems or self-driving cars. Here is an example that detects one emotion: happiness :)
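A minimal sketch of happiness detection with OpenCV's Haar cascades could look like the following. It assumes that a smile found inside a detected face means "happy". The function names are made up for illustration, the `scaleFactor` and `minNeighbors` values are untuned guesses, and `cv2.data.haarcascades` is assumed to be available (it ships with the pip `opencv-python` package and may be missing from other builds).

```python
def smiling_faces(gray, face_cascade, smile_cascade):
    """Return (x, y, w, h) boxes of faces in which a smile was detected.

    gray is a grayscale image; both cascades only need a
    detectMultiScale method, as cv2.CascadeClassifier provides.
    """
    happy = []
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        # Look for a smile only inside the face region.
        face_roi = gray[y:y + h, x:x + w]
        smiles = smile_cascade.detectMultiScale(
            face_roi, scaleFactor=1.7, minNeighbors=22
        )
        if len(smiles) > 0:
            happy.append((int(x), int(y), int(w), int(h)))
    return happy


def load_default_cascades():
    """Load the face and smile cascades bundled with OpenCV (assumed path)."""
    import cv2  # imported here so smiling_faces can be used without OpenCV

    face = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    smile = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_smile.xml"
    )
    return face, smile


# Usage sketch ("photo.jpg" is a placeholder file name):
#   import cv2
#   face, smile = load_default_cascades()
#   gray = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2GRAY)
#   print(smiling_faces(gray, face, smile))
```

Restricting the smile search to the face region keeps false positives down, since the smile cascade on its own fires on many mouth-like patterns.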
Additional reading:
- Paul Viola & Michael Jones, 2001 Rapid Object Detection using a Boosted Cascade of Simple Features
- Kinh Tieu & Paul Viola, 2000 Boosting Image Retrieval
Single Shot Detector (SSD) achieves a good balance between speed and accuracy. SSD runs a convolutional network on the input image only once and computes a feature map. A small 3×3 convolutional kernel is then run over this feature map to predict the bounding boxes and class probabilities. Like Faster-RCNN, SSD uses anchor boxes at various aspect ratios and learns offsets relative to these anchors rather than the boxes themselves. To handle scale, SSD predicts bounding boxes after multiple convolutional layers. Since each convolutional layer operates at a different scale, the network is able to detect objects of various sizes.
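To make the anchor-and-offset idea concrete, here is a small sketch in plain Python. It is a simplified reading of the SSD box encoding, not the reference implementation: boxes are (cx, cy, w, h) in normalized [0, 1] coordinates, the variance scaling used in practice is left out, and the scale and aspect-ratio values are picked only for illustration.

```python
import math


def default_boxes(feature_map_size, scale, aspect_ratios):
    """Return one anchor per aspect ratio for every feature-map cell."""
    boxes = []
    for row in range(feature_map_size):
        for col in range(feature_map_size):
            # Anchor centers sit in the middle of each cell.
            cx = (col + 0.5) / feature_map_size
            cy = (row + 0.5) / feature_map_size
            for ratio in aspect_ratios:
                w = scale * math.sqrt(ratio)
                h = scale / math.sqrt(ratio)
                boxes.append((cx, cy, w, h))
    return boxes


def decode(anchor, offsets):
    """Turn predicted offsets (t_x, t_y, t_w, t_h) into an actual box."""
    a_cx, a_cy, a_w, a_h = anchor
    t_x, t_y, t_w, t_h = offsets
    return (
        a_cx + t_x * a_w,     # shift the center by a fraction of the width
        a_cy + t_y * a_h,     # shift the center by a fraction of the height
        a_w * math.exp(t_w),  # rescale the width
        a_h * math.exp(t_h),  # rescale the height
    )


# Two feature maps at different resolutions stand in for the multiple
# convolutional layers: coarse maps get larger anchors.
anchors = []
for size, scale in [(8, 0.2), (4, 0.5)]:
    anchors += default_boxes(size, scale, aspect_ratios=[1.0, 2.0, 0.5])
print(len(anchors))
```

Zero offsets reproduce the anchor itself, so the network only has to learn small corrections around boxes that already have a sensible position and shape.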
That’s a lot of algorithms. Which one should you use? Currently, Faster-RCNN is the choice if you are fanatical about accuracy numbers. However, if you are strapped for computation (probably running it on Nvidia Jetsons), SSD is a better recommendation. Finally, if accuracy is not too much of a concern but you want to go super fast, YOLO will be the way to go. First of all, a visual understanding of the speed vs accuracy trade-off:
Install the imageio library:
pip install imageio
pip install imageio-ffmpeg
2.9.0
The data set used for pre-training is the VOC dataset (PASCAL Visual Object Classes).
Reference: Wei Liu et al., 2015 SSD: Single Shot MultiBox Detector
A Generative Adversarial Network (GAN) can generate images from a learned latent space. A GAN is one of the simplest neural-network-based models that implements adversarial learning, and was initially conceived in a bar in Montreal by Ian Goodfellow and collaborators (Goodfellow, I., et al. (2014)). It is based on a min-max optimization problem. Here is an example of a deep convolutional GAN: https://towardsdatascience.com/understanding-generative-adversarial-networks-gans-cd6e4651a29
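The min-max problem from Goodfellow et al. (2014) is min_G max_D V(D, G), with V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]. The discriminator D tries to make V large, the generator G tries to make it small. The sketch below estimates V from a handful of discriminator outputs. The function name and the sample numbers are invented for illustration.

```python
import math


def value_function(d_real, d_fake):
    """Estimate V(D, G) from discriminator outputs in the open range (0, 1).

    d_real: D(x) for real samples; d_fake: D(G(z)) for generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
confused = value_function([0.5, 0.5], [0.5, 0.5])

# A confident discriminator scores real samples high and fakes low.
confident = value_function([0.9, 0.8], [0.1, 0.2])

print(confused, confident)
```

When the discriminator outputs 0.5 for everything, V equals -log 4, which the paper identifies as the value at the optimum where generated samples are indistinguishable from real ones.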
GANs can be used for:
- generating images
- image modification
- super resolution
- assisting artists
- speech generation
- face ageing
Additional reading:
- Chanchana Sornsoontorn, 2017 How do GANs intuitively work?
- Ian Goodfellow et al., 2014 Generative Adversarial Nets
- Matthew D. Zeiler et al., 2011 Adaptive Deconvolutional Networks for Mid and High Level Feature Learning