Depth estimation is a fundamental task in computer vision, essential for 3D scene understanding in applications such as autonomous driving, robotics, augmented reality (AR), and medical imaging. Traditional solutions such as LiDAR rely on costly hardware, and classical stereo matching is computationally expensive; deep learning approaches, while promising, still face challenges in accuracy, generalization, and efficiency.
This project aims to improve monocular depth estimation on the KITTI dataset using advanced deep learning techniques. We develop and compare three approaches:
- Pix2Pix GANs for direct depth map generation.
- CNN-based U-Net with Pix2Pix GAN refinement for enhanced depth accuracy.
- Pix2Pix GAN for stereo pair generation to improve depth perception.
**Approach 1: Pix2Pix GAN for Direct Depth Map Generation**
- Input: Monocular RGB image.
- Architecture: Conditional GAN (generator + discriminator).
- Loss Function: L1 loss + adversarial loss (a loss sketch follows this list).
- Output: Predicted depth map.
- Training Details:
  - Optimizer: Adam (LR = 0.0002).
  - Batch Size: 16.
  - Training Epochs: 100.
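For concreteness, here is a minimal PyTorch sketch of this combined objective. The module names, the BCE-with-logits adversarial formulation, and the weight `LAMBDA_L1 = 100` (taken from the original Pix2Pix paper) are illustrative assumptions, not code from this repository.

```python
import torch
import torch.nn as nn

# Illustrative Pix2Pix generator objective: adversarial loss + weighted L1.
adv_criterion = nn.BCEWithLogitsLoss()
l1_criterion = nn.L1Loss()
LAMBDA_L1 = 100.0  # assumption: weighting from the original Pix2Pix paper

def generator_loss(generator, discriminator, rgb, depth_gt):
    depth_pred = generator(rgb)              # predicted depth map
    logits = discriminator(rgb, depth_pred)  # conditional (PatchGAN-style) output
    # Adversarial term: reward predictions the discriminator rates as real.
    adv = adv_criterion(logits, torch.ones_like(logits))
    # L1 term: pixel-wise fidelity to the ground-truth depth.
    l1 = l1_criterion(depth_pred, depth_gt)
    return adv + LAMBDA_L1 * l1

# Optimizer as listed above; betas=(0.5, 0.999) is common GAN practice.
# opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
```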
**Approach 2: CNN-Based U-Net with Pix2Pix GAN Refinement**
- Input: Monocular RGB image.
- Architecture: U-Net for initial depth estimation, followed by a Pix2Pix GAN for refinement.
- Loss Function: RMSE + SSIM + adversarial loss (a loss sketch follows this list).
- Training Details:
  - The U-Net is trained first; the GAN is applied as a refinement post-processing step.
  - Optimizer: Adam (LR = 0.0001).
  - Batch Size: 32.
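A minimal sketch of this composite refinement loss, assuming PyTorch, depth maps normalized to [0, 1], and the third-party `pytorch-msssim` package for SSIM; the weights `w_rmse`, `w_ssim`, and `w_adv` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # pip install pytorch-msssim

def refinement_loss(depth_refined, depth_gt, disc_logits,
                    w_rmse=1.0, w_ssim=1.0, w_adv=0.01):
    # RMSE term: square root of the mean squared error.
    rmse = torch.sqrt(F.mse_loss(depth_refined, depth_gt))
    # SSIM is a similarity in [0, 1]; use (1 - SSIM) so lower is better.
    ssim_term = 1.0 - ssim(depth_refined, depth_gt, data_range=1.0)
    # Adversarial term: push refined maps toward "real" for the discriminator.
    adv = F.binary_cross_entropy_with_logits(disc_logits,
                                             torch.ones_like(disc_logits))
    return w_rmse * rmse + w_ssim * ssim_term + w_adv * adv
```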
**Approach 3: Pix2Pix GAN for Stereo Pair Generation**
- Input: Single monocular image, treated as the left view.
- Architecture: Pix2Pix GAN that generates the right-eye view from the monocular input.
- Loss Function: L1 loss + adversarial loss.
- Training Details:
  - Optimizer: Adam (LR = 0.0002).
  - Batch Size: 16.
- Post-processing: Classical stereo depth estimation on the (real left, generated right) pair (sketched below).
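To illustrate the post-processing step, here is a minimal OpenCV sketch that recovers depth from the (real left, generated right) pair. The SGBM parameters and the calibration defaults (KITTI's focal length of roughly 721.5 px and baseline of roughly 0.54 m) are illustrative assumptions.

```python
import cv2
import numpy as np

def depth_from_generated_stereo(left_gray, right_gray,
                                focal_px=721.5, baseline_m=0.54):
    """Depth map from a (real left, GAN-generated right) grayscale pair."""
    # Semi-global block matching; numDisparities must be divisible by 16.
    matcher = cv2.StereoSGBM_create(minDisparity=0,
                                    numDisparities=128,
                                    blockSize=5)
    # compute() returns fixed-point disparities scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan  # mask invalid / occluded pixels
    # Pinhole stereo relation: depth = focal_length * baseline / disparity.
    return focal_px * baseline_m / disparity
```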
**Dataset**
We use the KITTI Optical Flow & Disparity Dataset, which includes:
- RGB Stereo Images: Left & right camera views for depth estimation.
- Disparity Maps: Ground-truth disparity, from which metric depth is recovered via the camera calibration.
- Optical Flow Maps: Motion estimation between frames.
- Object Maps: Semantic segmentation for scene understanding.
**Preprocessing**
- Image resizing and normalization (see the sketch after this list).
- Data augmentation (if needed).
- Splitting into training and validation sets.
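A minimal preprocessing sketch, assuming PIL, NumPy, and scikit-learn; the directory pattern `data/kitti/image_2/*.png`, the 256×256 target size, and the [-1, 1] scaling (typical for tanh-output Pix2Pix generators) are illustrative assumptions.

```python
import glob

import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split

def preprocess(path, size=(256, 256)):
    # Resize to a fixed resolution and scale pixels to [-1, 1],
    # the range typically paired with a tanh generator output.
    img = Image.open(path).convert("RGB").resize(size, Image.BILINEAR)
    return np.asarray(img, dtype=np.float32) / 127.5 - 1.0

# Hypothetical dataset layout; adjust the glob pattern to the actual location.
all_paths = sorted(glob.glob("data/kitti/image_2/*.png"))
# 80/20 train/validation split with a fixed seed for reproducibility.
train_paths, val_paths = train_test_split(all_paths, test_size=0.2,
                                          random_state=42)
```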
**Evaluation Metrics** (a computation sketch follows this list)
- RMSE (Root Mean Square Error): the square root of the mean squared error; penalizes large deviations heavily.
- MAE (Mean Absolute Error): the average absolute error; less sensitive to outliers than RMSE.
- SSIM (Structural Similarity Index): perceptual/structural similarity between predicted and ground-truth depth maps.
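A minimal sketch of how these three metrics can be computed with NumPy and scikit-image; `evaluate_depth` is a hypothetical helper, not a function from this repository.

```python
import numpy as np
from skimage.metrics import structural_similarity

def evaluate_depth(pred, gt):
    """RMSE, MAE, and SSIM between two float depth maps of equal shape."""
    err = pred - gt
    rmse = float(np.sqrt(np.mean(err ** 2)))  # Root Mean Square Error
    mae = float(np.mean(np.abs(err)))         # Mean Absolute Error
    # SSIM computed over the ground-truth depth range.
    ssim_val = structural_similarity(gt, pred,
                                     data_range=float(gt.max() - gt.min()))
    return {"RMSE": rmse, "MAE": mae, "SSIM": ssim_val}
```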
**Applications**
- Autonomous Driving: Enhancing vehicle perception for safer navigation.
- Robotics: Improving robotic vision and spatial understanding.
- AR/VR: Enabling realistic depth-based interactions.
- Medical Imaging: Assisting in diagnostic depth analysis.
**Resources**
- KITTI Dataset: KITTI 2012/2015 Stereo Images
- Pix2Pix Depth Images: Source-Depth Image Pairs
**Getting Started**
- Clone the repository: `git clone https://github.com/your-repo/Deep-Vision-GAN.git`
- Install dependencies: `pip install -r requirements.txt`
- Prepare the dataset and place it in the required directory.
- Train the model: `python train.py --model pix2pix --dataset kitti`
- Evaluate the model: `python evaluate.py --model pix2pix`
- View results: `python visualize.py --model pix2pix`