Monocular Depth Estimation via a Fully Convolutional Deep Neural Network with Atrous Convolutions, and 3D Point Cloud Visualisation
Generating a depth map from a single monocular RGB image, commonly referred to as monocular depth estimation, has long been known to be an ill-posed problem. Traditional techniques infer depth from stereo RGB pairs via depth cues, or rely on laser-based LiDAR sensors, which produce sparse or dense point clouds depending on the size and cost of the sensor. Most modern smartphones contain more than one image sensor; however, using these sensors for depth estimation is often infeasible, as smartphone vendors typically restrict access to one image sensor at a time. In other cases, the sensors are of varying quality and focal lengths, making them unsuitable for depth inference. Producing depth maps from monocular RGB images is therefore a crucial task, given their use in Depth-of-Field (DoF) image processing, Augmented Reality (AR), and Simultaneous Localisation and Mapping (SLAM).
To tackle this problem, we propose a fully convolutional deep neural network that learns to generate depth maps from single RGB images. The network follows an encoder-decoder architecture and employs atrous convolution layers together with Atrous Spatial Pyramid Pooling (ASPP), originally proposed for semantic segmentation, for multi-scale feature pooling and extraction. We also apply bicubic upsampling in the decoder convolutions to further boost depth estimation accuracy, while simplifying previously proposed architectures to improve performance, taking into account the computational and accuracy constraints that hamper prior efforts.
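The exact trained architecture is not reproduced here; the following is a minimal sketch, assuming TensorFlow/Keras, of how atrous (dilated) convolutions, an ASPP block, and bicubic upsampling can be combined in an encoder-decoder depth network. The layer widths, dilation rates, and input resolution are illustrative assumptions, not this project's configuration.

```python
# Minimal encoder-decoder depth sketch: strided-conv encoder, ASPP over the
# bottleneck, bicubic upsampling in the decoder, single-channel depth output.
import tensorflow as tf
from tensorflow.keras import layers, Model


def aspp_block(x, filters=128, rates=(6, 12, 18)):
    """Atrous Spatial Pyramid Pooling: parallel dilated convs, concatenated."""
    branches = [layers.Conv2D(filters, 1, padding="same", activation="relu")(x)]
    for rate in rates:
        branches.append(
            layers.Conv2D(filters, 3, padding="same", dilation_rate=rate,
                          activation="relu")(x))
    x = layers.Concatenate()(branches)
    # 1x1 projection to fuse the multi-scale branches.
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(x)


def build_depth_model(input_shape=(480, 640, 3)):
    inputs = layers.Input(shape=input_shape)

    # Simple strided-convolution encoder (1/8 resolution at the bottleneck).
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)

    # Multi-scale context via atrous convolutions.
    x = aspp_block(x)

    # Decoder: convolutions interleaved with bicubic upsampling.
    # Note: "bicubic" interpolation in UpSampling2D requires a recent TF release.
    for filters in (128, 64, 32):
        x = layers.UpSampling2D(size=2, interpolation="bicubic")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

    # Single-channel depth map (metres or normalised depth).
    depth = layers.Conv2D(1, 3, padding="same", activation="linear")(x)
    return Model(inputs, depth)


model = build_depth_model()
model.compile(optimizer="adam", loss="mae")
model.summary()
```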
We showcase the results of our model in a 3D point cloud view, and train it on a subset of the NYU Depth V2 (NYUv2) dataset, which contains RGB images paired with depth maps captured by Microsoft Kinect sensors and therefore requires no manual annotation.
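For the point cloud view, a predicted depth map can be back-projected with the camera intrinsics and coloured with the RGB frame using Open3D. The sketch below assumes a prediction already saved to disk; the file names and the approximate NYUv2 Kinect intrinsics are illustrative assumptions.

```python
# Back-project a predicted depth map into a coloured 3D point cloud with Open3D.
import numpy as np
import open3d as o3d
from PIL import Image

# Load an RGB frame and the depth map predicted by the network (in metres).
rgb = np.asarray(Image.open("frame.png").convert("RGB"))    # H x W x 3, uint8
depth = np.load("predicted_depth.npy").astype(np.float32)   # H x W, metres

height, width = depth.shape
# Approximate Kinect intrinsics commonly quoted for NYUv2: fx, fy, cx, cy.
intrinsics = o3d.camera.PinholeCameraIntrinsic(width, height,
                                               582.6, 582.7, 313.0, 238.4)

rgb_o3d = o3d.geometry.Image(np.ascontiguousarray(rgb))
depth_o3d = o3d.geometry.Image(depth)

# depth_scale=1.0 because depth is already in metres; truncate far points.
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
    rgb_o3d, depth_o3d, depth_scale=1.0, depth_trunc=10.0,
    convert_rgb_to_intensity=False)

pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsics)
# Flip the cloud so it appears upright in the Open3D viewer's convention.
pcd.transform([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]])

o3d.visualization.draw_geometries([pcd])
```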
TensorFlow dataflow and differentiable programming library
Keras neural-network library
NumPy multi-dimensional arrays and matrices
PIL Python Imaging Library
Pillow a fork of PIL
Scikit-learn various classification, regression and clustering algorithms
Scikit-image segmentation, transformations, color manipulation, filtering, morphology, feature detection
Open3D 3D rendering
OpenCV (cv2) real-time computer vision library
Flask web server
OpenGL 3.5.5 or newer
IP camera and a network connection (how these components could fit together is sketched after this list)
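The sketch below illustrates one way the OpenCV, Flask, and IP camera requirements could fit together: frames are read from an IP camera stream, passed through the depth model, and the latest colour-mapped depth map is served over HTTP. The stream URL, model path, input resolution, and route name are hypothetical assumptions, not this project's actual configuration.

```python
# Read IP camera frames with OpenCV, predict depth, and stream the result via Flask.
import cv2
import numpy as np
import tensorflow as tf
from flask import Flask, Response

app = Flask(__name__)
model = tf.keras.models.load_model("depth_model.h5", compile=False)  # hypothetical path
capture = cv2.VideoCapture("http://192.168.1.10:8080/video")         # hypothetical camera URL


def depth_frames():
    """Yield JPEG-encoded depth maps as a multipart MJPEG stream."""
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        # Resize and convert BGR (OpenCV) to RGB for the network.
        rgb = cv2.cvtColor(cv2.resize(frame, (640, 480)), cv2.COLOR_BGR2RGB)
        # Assumes the model outputs an (H, W, 1) depth map for a batch of one.
        depth = model.predict(rgb[np.newaxis] / 255.0, verbose=0)[0, ..., 0]
        # Normalise to 0-255 and colour-map for display.
        depth_vis = cv2.applyColorMap(
            cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8),
            cv2.COLORMAP_INFERNO)
        ok, jpeg = cv2.imencode(".jpg", depth_vis)
        if not ok:
            continue
        yield (b"--frame\r\nContent-Type: image/jpeg\r\n\r\n"
               + jpeg.tobytes() + b"\r\n")


@app.route("/depth")
def depth_stream():
    return Response(depth_frames(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```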
Training code and model withheld due to academic constraints.