This repository serves as a record of my academic journey in ENPM673 during the Spring of 2023. It includes my solutions and code submissions for all projects. Each project has its dedicated folder with accompanying documentation and resources.
The Perception course covers classical computer vision principles and fundamental deep learning techniques, with an emphasis on perception for autonomous systems such as robots, self-driving cars, and smart cameras. Hands-on projects target practical applications such as lane detection and constructing 3D models from 2D images, blending theoretical knowledge with practical skills.
The course covered the following key areas:
- Curve Fitting and Trend Analysis: Identifying the optimal trend line for a set of data points through curve fitting.
- Image Feature Recognition: Recognizing key features in images, including corners, edges, and straight lines.
- 3D Object Estimation: Estimating 3D information about objects from their 2D images.
- Object Motion Metrics: Calculating motion metrics for objects, such as speed and direction, from camera feeds.
- Camera Pose Estimation: Estimating the camera pose for spatial understanding.
- Basic Image-based Machine Learning: Applying fundamental machine learning techniques to image-related tasks.
The course structure includes four distinct projects, each outlined below.
- Click here to access the ENPM-673 Final Project.
Project 1: Object Tracking and Covariance Matrix, LS, TLS, and RANSAC Implementations for a 3D Point Cloud
- Object Tracking: Implemented ball tracking to follow the trajectory of a red ball thrown against a wall.
- Video captured using `cv2.VideoCapture`, with frames processed in a loop.
- Color channels converted from BGR to HSV using `cv2.cvtColor`.
- Red color channel isolated using `cv2.inRange` with specified upper and lower thresholds.
- Pixel coordinates of the ball's center calculated as the mean of the masked x and y coordinates.
- Best-fit curve determined for the pixel coordinates using the least squares method.
- Least Squares Method: Utilized the least squares method to find the best-fit curve (a parabola) by minimizing the mean square error, as sketched below.
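A minimal sketch of that least squares fit, assuming the ball-center pixel coordinates have already been collected into arrays `x` and `y` (names are illustrative, not the original code):

```python
import numpy as np

def fit_parabola(x, y):
    """Least squares fit of y = a*x^2 + b*x + c to the ball-center coordinates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([x**2, x, np.ones_like(x)])  # design matrix
    # lstsq minimizes ||A @ p - y||^2, i.e., the mean square error up to scale
    p, *_ = np.linalg.lstsq(A, y, rcond=None)
    return p  # coefficients [a, b, c]
```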
- Covariance Matrix, LS, TLS, and RANSAC for 3D Point Cloud: Explored methods for fitting surfaces to 3D point cloud data; sketches of each method follow this list.
- Covariance Matrix and Surface Normal: Calculated the covariance matrix and determined the surface normal's direction and magnitude using eigenvalues and eigenvectors.
- Standard Least Squares Method for 3D Point Cloud: Applied the standard least squares method to find the best-fit surface plane.
- Total Least Squares Method for 3D Point Cloud: Used the total least squares method to find the best-fit plane by minimizing the error orthogonal to the plane.
- RANSAC Method: Implemented RANSAC for robust surface fitting, handling outliers in the data.
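A minimal sketch of the covariance-based normal estimate, assuming the point cloud is an (N, 3) NumPy array; the function name is illustrative:

```python
import numpy as np

def surface_normal(points):
    """Direction and magnitude of the surface normal from the covariance matrix."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)            # remove the centroid
    cov = centered.T @ centered / len(pts)       # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    normal = eigvecs[:, 0]                       # smallest-eigenvalue eigenvector
    magnitude = eigvals[0]                       # variance along the normal
    return normal, magnitude
```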
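The two plane fits differ in which error they minimize: standard least squares penalizes only the vertical (z) residual, while total least squares penalizes the orthogonal distance. A hedged sketch of both:

```python
import numpy as np

def fit_plane_ls(points):
    """Standard least squares: fit z = a*x + b*y + c, minimizing error in z only."""
    pts = np.asarray(points, dtype=float)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    (a, b, c), *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    return a, b, c

def fit_plane_tls(points):
    """Total least squares: minimize orthogonal distance to the plane via SVD."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]               # right singular vector of the smallest singular value
    d = -normal @ centroid        # plane: normal . p + d = 0
    return normal, d
```

The TLS normal is the same smallest-eigenvalue direction found in the covariance sketch above, since the SVD of the centered points diagonalizes their covariance.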
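A sketch of RANSAC plane fitting with placeholder threshold and iteration values; a production version would derive the iteration count from the desired inlier probability:

```python
import numpy as np

def ransac_plane(points, threshold=0.1, iterations=1000, seed=None):
    """Fit a plane robustly by sampling 3-point candidates and counting inliers."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    best_inliers = np.array([], dtype=int)
    for _ in range(iterations):
        sample = pts[rng.choice(len(pts), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:                          # skip degenerate (collinear) samples
            continue
        normal /= norm
        d = -normal @ sample[0]
        dist = np.abs(pts @ normal + d)           # orthogonal point-to-plane distance
        inliers = np.flatnonzero(dist < threshold)
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the inliers (e.g., with fit_plane_tls above) for the final model
    return best_inliers
```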
- Observations and Interpretation of Results:
- The total least squares method outperformed standard least squares, especially on noisy data.
- RANSAC produced the most accurate models, owing largely to its outlier rejection.
- Problems Encountered:
- Challenges in determining the threshold limits for ball tracking.
- Issues with eigenvector assignment in the total least squares method.
- A division-by-zero error in RANSAC when the probability values drove the denominator of the iteration-count formula to zero.
- The complexity of the RANSAC algorithm required referencing multiple examples and increasing the iteration count to reduce fluctuations in the results.
Project 2: Camera Pose Estimation and Image Stitching
- Camera Pose Estimation using Homography
- In this task, camera pose estimation was performed on a video using homography, involving the following steps (a decomposition sketch follows the list):
- Image Processing Pipeline:
- Read the video frame by frame.
- Convert each frame to grayscale.
- Blur the image to suppress noise.
- Apply thresholding to isolate the white regions.
- Perform Canny edge detection.
- Apply the Hough transform to the frame.
- Find peaks in the Hough space.
- Draw the lines corresponding to the Hough peaks.
- Find the intersections between the detected lines.
- Compute the homography matrix between the camera and the ground plane.
- Decompose the homography matrix to obtain the rotation and translation.
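A minimal sketch of the last two steps. The corner correspondences, world rectangle, and intrinsic matrix `K` below are placeholders, not values from the project; the classic decomposition scales the columns of `K^-1 @ H` to recover two rotation columns and the translation:

```python
import cv2
import numpy as np

# Placeholder values: the four line-intersection points found in the frame (pixels)
# and the known ground-plane rectangle they correspond to (world units).
img_pts = np.array([[405, 210], [860, 235], [890, 610], [380, 590]], dtype=np.float64)
world_pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float64)

H, _ = cv2.findHomography(world_pts, img_pts)

# K is the camera intrinsic matrix (placeholder values).
K = np.array([[1380.0, 0, 946.0], [0, 1380.0, 527.0], [0, 0, 1.0]])

# Decompose H = K [r1 r2 t] (up to scale) into rotation and translation.
B = np.linalg.inv(K) @ H
lam = 1.0 / np.linalg.norm(B[:, 0])   # scale from the first rotation column
r1 = lam * B[:, 0]
r2 = lam * B[:, 1]
r3 = np.cross(r1, r2)                 # planar homography only constrains two columns
R = np.column_stack([r1, r2, r3])
t = lam * B[:, 2]
```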
- Explanation and Results:
- Image Stitching for Panoramic View
- This task focused on stitching four images together to create a panoramic view (a pairwise-stitch sketch follows the pipeline):
- Pipeline:
- Load the four input images.
- Convert images to grayscale.
- Extract features using ORB or SIFT.
- Match features using a brute-force matcher (`cv2.BFMatcher`).
- Visualize matched features.
- Compute homographies between pairs of images.
- Combine images using computed homographies.
- Warp the second image onto the first using OpenCV.
- Repeat for the next pair until all four images are stitched.
- Save the final panoramic image.
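A sketch of one pairwise stitch following these steps, using ORB and a brute-force Hamming matcher; the function name and oversized canvas are illustrative rather than the submitted implementation:

```python
import cv2
import numpy as np

def stitch_pair(img1, img2):
    """Warp img2 onto img1's image plane and paste img1 on top."""
    orb = cv2.ORB_create(4000)
    kp1, des1 = orb.detectAndCompute(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = orb.detectAndCompute(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY), None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)
    src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    # Oversized canvas: wide enough for both images side by side.
    canvas = cv2.warpPerspective(img2, H, (img1.shape[1] + img2.shape[1], img1.shape[0]))
    canvas[: img1.shape[0], : img1.shape[1]] = img1
    return canvas
```

Chaining `stitch_pair` across consecutive image pairs yields the final panorama.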
- Explanation and Results:
Problems Encountered:
- Determining suitable Canny edge-detection threshold values.
- Difficulty detecting edges without using built-in functions.
- Tricky steps in recovering the camera rotation and translation from the homography.
- Challenges in stitching due to dimension mismatches when applying the homographies.
Project 3: Camera Calibration
- Camera Calibration: Mathematical Approach
- Pipeline (a DLT sketch follows the steps):
- Capture checkerboard images for calibration.
- Determine the world coordinates of the checkerboard corners and find the corresponding image coordinates.
- Calculate the camera parameters via the projection matrix P.
- Extract the rotation matrix and translation vector from the P matrix.
- Compute the reprojection error for each point.
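A hedged sketch of the mathematical calibration via the Direct Linear Transform (DLT). The helper name is illustrative, and `scipy.linalg.rq` is used here for the RQ step that splits P into K and R; the actual submission may have implemented this differently:

```python
import numpy as np
from scipy.linalg import rq

def calibrate_dlt(world_pts, img_pts):
    """Estimate P from >= 6 correspondences, then decompose it into K, R, T."""
    A = []
    for (X, Y, Z), (u, v) in zip(world_pts, img_pts):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    P = vt[-1].reshape(3, 4)          # null-space solution, defined up to scale
    K, R = rq(P[:, :3])               # RQ decomposition of the left 3x3 block
    S = np.diag(np.sign(np.diag(K)))  # fix signs so K has a positive diagonal
    K, R = K @ S, S @ R
    T = np.linalg.inv(K) @ P[:, 3]    # translation (in the camera frame, up to scale)
    return P, K / K[2, 2], R, T
```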
- Results:
- A minimum of 6 matching points is needed for the mathematical calibration.
- The mathematical formulation decomposes the P matrix into the intrinsic matrix K, rotation matrix R, and translation vector T.
- Intrinsic Matrix K:
  [-6.7912331e+01, -7.9392768e-02, 3.3562042e+01; 0, 6.7619034e+01, 2.5845427e+01; 0, 0, 4.1946620e-02]
- Projection matrix P:
  [28.7364445, -1.75735415, -70.0687538, 756.890519; -20.1369011, 65.889012, -22.2140404, 213.263797; -0.0277042391, -0.00259559759, -0.0313888009, 1.00000000]
- Rotation matrix R:
  [-0.74948643, 0.11452983, -0.65203758; 0.0453559, 0.99149078, 0.12202001; 0.66046418, 0.06187859, -0.74830349]
- Translation vector T:
  [0.64862355; 0.30183152; 0.69751919; 0.04064735]
- Reprojection errors:
  [0.2856, 0.9726, 1.0361, 0.4541, 0.1909, 0.3190, 0.1959, 0.3083]
- Camera Calibration: Practical Approach
- The objective is to calibrate the camera using real-world images.
- Pipeline (an OpenCV-based sketch follows the steps):
- Read the calibration images.
- Grayscale and resize the images.
- Find corners using `cv2.findChessboardCorners()`.
- Draw the detected corners on the images.
- Calibrate using `cv2.calibrateCamera()` to obtain the intrinsic parameters.
- Compute the reprojection error for each image.
- Extract the camera matrix.
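A sketch of this pipeline using the standard OpenCV calls; the checkerboard size and image folder are placeholder assumptions:

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)  # inner corners per row/column (placeholder)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calibration_images/*.jpg"):  # placeholder folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# Per-image reprojection error
for i in range(len(obj_points)):
    proj, _ = cv2.projectPoints(obj_points[i], rvecs[i], tvecs[i], K, dist)
    err = cv2.norm(img_points[i], proj, cv2.NORM_L2) / len(proj)
    print(f"image {i}: reprojection error = {err:.4f}")
```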
- Results:
- Corners were detected in each image; the per-image reprojection errors are:
  [0.1198, 0.2610, 0.4094, 0.5418, 0.2219, 0.3537, 0.0520, 0.2247, 0.4810, 0.4042, 0.4810, 0.5137, 0.4297]
- Intrinsic Matrix K:
  [2.2317e+03, 0, 7.7812e+02; 0, 2.4542e+03, 1.3235e+03; 0, 0, 1.0000]
- Problems Encountered:
- Determining the correct K matrix in the mathematical approach.
- Handling the very small values in the K matrix.
Project 4: Pipeline for Stereo Vision and Depth Perception
- The fourth project in my perception course involved addressing four sub-tasks, each contributing to the overall goal of stereo vision:
- Calibration Pipeline (a sketch follows the steps):
- Utilized ORB feature extraction to find matching features in stereo images.
- Estimated the Fundamental matrix and Essential matrix, considering camera intrinsics.
- Decomposed Essential matrix into translation and rotation.
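A sketch of these calibration steps, assuming the matched keypoints are already available as Nx2 arrays `pts1`/`pts2` and that `K1`/`K2` are the dataset intrinsics (all names illustrative):

```python
import cv2
import numpy as np

def stereo_extrinsics(pts1, pts2, K1, K2):
    """Fundamental matrix via RANSAC, essential matrix via intrinsics, then R, t."""
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    E = K2.T @ F @ K1                  # essential matrix from the intrinsics
    R1, R2, t = cv2.decomposeEssentialMat(E)
    # decomposeEssentialMat yields four candidate (R, t) poses; the valid one
    # places triangulated points in front of both cameras (cheirality check).
    return F, E, R1, R2, t
```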
- Rectification Pipeline (a sketch follows the steps):
- Applied perspective transformation to rectify stereo images for easier comparison.
- Computed homography matrices to map original to rectified image coordinates.
- Visualized the rectification by overlaying epipolar lines and feature points.
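A sketch of the rectification step using OpenCV's uncalibrated method, reusing `pts1`, `pts2`, and `F` from the previous sketch and assuming `w`, `h`, `img_left`, and `img_right` are defined:

```python
import cv2

# H1, H2 map original image coordinates to the rectified frames.
ok, H1, H2 = cv2.stereoRectifyUncalibrated(
    pts1.reshape(-1, 2), pts2.reshape(-1, 2), F, (w, h))
if ok:
    rect_left = cv2.warpPerspective(img_left, H1, (w, h))
    rect_right = cv2.warpPerspective(img_right, H2, (w, h))
```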
- Correspondence Pipeline (a sketch follows the steps):
- Implemented a correspondence pipeline involving matching windows and disparity calculation.
- Generated grayscale and color heat maps for visualizing disparity.
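A deliberately brute-force sketch of the matching-window disparity search using sum-of-squared-differences; the window size and search range are placeholders:

```python
import numpy as np

def disparity_map(left, right, window=11, max_disp=64):
    """SSD block matching on rectified grayscale images (left camera reference)."""
    h, w = left.shape
    half = window // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.float32)
            best_ssd, best_d = np.inf, 0
            for d in range(min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1].astype(np.float32)
                ssd = float(np.sum((patch - cand) ** 2))
                if ssd < best_ssd:
                    best_ssd, best_d = ssd, d
            disp[y, x] = best_d
    return disp
```

OpenCV's `cv2.StereoBM_create` offers a far faster block matcher; the explicit loops here simply mirror the matching-window description.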
- Image Depth Computation Pipeline (a sketch follows the steps):
- Calculated depth values from the disparity map using the camera calibration parameters.
- Produced grayscale and color heat maps for depth visualization.
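A sketch of the depth conversion, where `baseline` and `focal_length` would come from each dataset's calibration file:

```python
import numpy as np

def depth_from_disparity(disp, baseline, focal_length):
    """depth = baseline * f / disparity, guarding against division by zero."""
    depth = np.zeros_like(disp, dtype=np.float32)
    valid = disp > 0
    depth[valid] = (baseline * focal_length) / disp[valid]
    return depth
```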
The pipelines were applied to three datasets, yielding specific outcomes for each room:
- Chess Room:
- Ladder Room:
- Art Room:
Problems Encountered:
- Calibration Outliers:
- Difficulty in removing outliers during camera calibration.
- Tricky estimation of the Fundamental matrix.
- Rectification Issues:
- Inability to achieve horizontal epipolar lines during rectification.
- Warping difficulties.
- Correspondence Challenges:
- Issues cascading from problems in the earlier pipeline stages.
- Difficulty implementing the correspondence formulas.