SLAM

Simple Monocular SLAM using OpenCV and OpenCV Contrib.

Description

Simultaneous Localisation and Mapping (SLAM) is a complex task in Robotics & Computer Vision. A typical example is a robot vacuum cleaner building a map of an unknown environment: its software must be able to reconstruct the 3D scene of the environment from 2D pictures (various input artifacts are possible: depth maps, heatmaps, sensor data, etc.).

The most classic SLAM variant works with a single camera. The task is to build a 3D scene (say, a point cloud) from pictures taken with a single camera that moves inside some environment.

Contribution

The dataset and the problem formulation were prepared by the YSDA faculty, with the support of Timur Ibadov and Ivan Malin (from GeoCV), and Alexander Velizhnev (a member of the IBM Research Center in Zurich, velizhev@gmail.com).

The data comes from the TU Munich RGB-D SLAM Dataset.

Task

In order to understand the underlying concepts of SLAM, this study-oriented project builds a simple version of SLAM using pictures from a single camera. About half of the provided images come with camera coordinates and rotation matrices (i.e., we know where the camera was when a specific shot was taken). The task is to determine the coordinates and rotation matrices for the rest of the images. The camera intrinsics are given. The support image data is provided with some noise, to make the problem closer to real-world situations.

Implementation

The images with known poses are called support images; the images whose poses must be determined are called unknown images. Denote the support images by $S = \{s_i\}_{i=1}^{n}$, the unknown images by $U = \{u_i\}_{i=1}^{m}$, and all images by $X = S \cup U = \{x_i\}_{i=1}^{n+m}$. The solution is as follows.

  1. For each image $x_i \in X$, find a set of keypoints $K(x_i)$. Then, using the ORB algorithm (a fast alternative to SIFT), compute descriptors $D_i = D(K(x_i))$.
  2. For each pair of support images $\{s_i, s_j\}$, $i < j$ (see the matching sketch after this list):
    1. Match the descriptors of the two images in descriptor space.
    2. Use RANSAC to filter out wrong matches, keeping only inliers (an inlier is a match of the same physical point across images).
    3. Save all inliers for the given pair of support images, $I(\{D_i, D_j\}) = I_{ij}$.
  3. Build tracks between images. A track is a sequence of inliers corresponding to the same point across pairs of images.
  4. Triangulate a 3D point for each track, using the known poses of the support images. After this step, we have a cloud of 3D points (see the triangulation sketch below).
  5. Filter noisy 3D points. Each 3D point in the cloud is reprojected back to the images where it was found. The reprojection error is computed for each image, and if the maximum error exceeds a threshold, the point (and its whole track) is discarded from the cloud.
  6. For each unknown image $u_i \in U$:
    1. Match the descriptors of $u_i$ with those of $s_j \in S$ for all $j = 1, \ldots, n$: the same procedure as in step $2$, but against the inliers from the support images. The resulting 3D-2D point correspondences are sufficient to solve the PnP (Perspective-n-Point) system and recover the pose of $u_i$ (see the PnP sketch below).
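
To make steps 1-2 concrete, here is a minimal sketch using OpenCV's Python bindings. The function names, the brute-force Hamming matcher, and RANSAC over an essential matrix are my illustrative choices, not necessarily what this repository's code does:

```python
import cv2
import numpy as np

def detect_and_describe(img, n_features=500):
    """Step 1: detect ORB keypoints and compute binary descriptors."""
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(img, None)
    return keypoints, descriptors

def match_support_pair(kp1, des1, kp2, des2, K):
    """Step 2: match descriptors, then keep only RANSAC inliers."""
    # Hamming distance is the natural metric for ORB's binary descriptors;
    # crossCheck keeps only mutual nearest-neighbour matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # RANSAC over the epipolar constraint: matches that disagree with a
    # single essential matrix are rejected as outliers.
    _, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    return [m for m, keep in zip(matches, mask.ravel()) if keep]
```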
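
Steps 4-5 might look roughly as follows; the pose convention ($R$, $t$ mapping world to camera coordinates) and the 2-pixel error threshold are assumptions for the sketch:

```python
import cv2
import numpy as np

def triangulate_and_filter(K, pose1, pose2, pts1, pts2, max_err_px=2.0):
    """Steps 4-5: triangulate 3D points from a two-view track, then drop
    points whose maximum reprojection error exceeds the threshold."""
    (R1, t1), (R2, t2) = pose1, pose2
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])  # 3x4 projection matrices
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])

    # OpenCV expects 2xN point arrays and returns homogeneous 4xN output.
    X_h = cv2.triangulatePoints(P1, P2, np.float64(pts1).T, np.float64(pts2).T)
    X = (X_h[:3] / X_h[3]).T  # Nx3 scene points

    kept = []
    for x3d, obs1, obs2 in zip(X, pts1, pts2):
        errs = []
        for (R, t), obs in ((pose1, obs1), (pose2, obs2)):
            rvec, _ = cv2.Rodrigues(R)  # rotation matrix -> rotation vector
            proj, _ = cv2.projectPoints(x3d.reshape(1, 3), rvec,
                                        t.reshape(3, 1), K, None)
            errs.append(np.linalg.norm(proj.ravel() - obs))
        if max(errs) <= max_err_px:  # otherwise discard the whole track
            kept.append(x3d)
    return np.array(kept)
```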
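
Finally, step 6 reduces to the PnP problem, which OpenCV solves robustly; again a sketch, with an arbitrary reprojection threshold:

```python
import cv2
import numpy as np

def estimate_pose(pts_3d, pts_2d, K):
    """Step 6: recover an unknown image's pose from 3D-2D correspondences
    (pts_3d: Nx3 scene points, pts_2d: Nx2 pixel coordinates)."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float64(pts_3d), np.float64(pts_2d), K, None,
        reprojectionError=2.0)
    if not ok:
        raise RuntimeError("PnP failed: too few good correspondences")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec
```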

Examples

Examples of images:

Shot 1

Shot 2

Results

Some statistics of the process:

# of images               : 100
# of support images       : 50
Keypoints per image       : 500.00
# of support pairs        : 1225
# of pairs with inliers   : 535
Inliers per support pair  : 47.64
Tracks found              : 9289
Tracks per inliers pair   : 17.36
Scene points found        : 9219
Scene points per track    : 0.99
# of pairs with inliers2  : 774
Inliers2 per support pair : 44.64

The number of keypoints per image is a hyperparameter of the feature detector. Overall, the algorithm found $9,289$ tracks. This asymptotic behaviour is expected, since my implementation iterates over a subset of all tracks of length $3$. Specifically, the subset is defined by all pairs $\{i, j\}$, where $i = 1, \ldots, n$ and $j = 1, \ldots, n - 2$; the third image in a track has index $j + 2$ (see the sketch below). The choice of indexation is arbitrary. This method does not build the longest possible tracks, but it allows simple control over the asymptotics of the track iteration, so the algorithm can be tuned for a desired time complexity.
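
A minimal sketch of this indexing scheme, as I read the description above ($n$ is the number of support images, with 0-based indices):

```python
def track_triples(n):
    """Enumerate the O(n^2) image-index triples (i, j, j + 2) over which
    length-3 tracks are searched, instead of all possible tracks."""
    for i in range(n):
        for j in range(n - 2):
            yield (i, j, j + 2)
```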

Obtained metrics are as follows:

tr 0.03 rot 1.38

These are the maximum translation error (tr) and the maximum rotation error (rot). Each error is computed as $\|A - A'\|$, where $A$ is the ground-truth matrix (of translation or rotation) and $A'$ is the corresponding matrix predicted by the algorithm.
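
For concreteness, such an error could be computed as in the sketch below; the use of the Frobenius/Euclidean norm for $\|\cdot\|$ is my assumption:

```python
import numpy as np

def pose_error(A_gt, A_pred):
    """||A - A'|| between a ground-truth matrix (translation or rotation)
    and its prediction: Frobenius norm for matrices, Euclidean for vectors."""
    return np.linalg.norm(A_gt - A_pred)
```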

The metrics show a very small difference between the ground-truth and predicted matrices, confirming that the algorithm is capable of solving the task.
