ViT for Monocular Depth Estimation

Relative and absolute depth estimation with Vision Transformers

Usage

  1. Images can be copied manually into the input folder or downloaded through the DuckDuckGo API with the following script:
python fetch_sample_images.py -i <search image> -u <no. of urls>
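After the image URLs have been collected, saving them into the input folder needs only the standard library. The DuckDuckGo search step is not shown here. This is a hypothetical sketch, not the contents of `fetch_sample_images.py`, and `filename_for` and `download_images` are illustrative names.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def filename_for(url, index):
    """Build a local filename from the URL path's extension, defaulting to .jpg."""
    suffix = Path(urlparse(url).path).suffix.lower()
    if suffix not in IMAGE_SUFFIXES:
        suffix = ".jpg"
    return f"image_{index:03d}{suffix}"


def download_images(urls, input_dir="../input"):
    """Download each URL into input_dir and return the paths that were saved."""
    out = Path(input_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        target = out / filename_for(url, i)
        try:
            urllib.request.urlretrieve(url, target)
        except (OSError, ValueError) as exc:  # URLError/HTTPError subclass OSError
            print(f"skipping {url}: {exc}")
            continue
        saved.append(target)
    return saved
```

Failed downloads are skipped rather than aborting the run, because search results often include dead or blocked links.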
  2. Select one of the four models:
    • DPT_Large: Largest model
    • DPT_Hybrid
    • MiDaS
    • MiDaS_small
  3. Inference:
python inference.py -i ../input -o ../output -t DPT_Large
python inference.py -i ../input -o ../output -t DPT_Hybrid
python inference.py -i ../input -o ../output -t MiDaS
python inference.py -i ../input -o ../output -t MiDaS_small
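The four model names match the entry points published on PyTorch Hub under `intel-isl/MiDaS`, so you can also run a single image directly through `torch.hub`. The sketch below mirrors the public PyTorch Hub example, where the DPT models use `dpt_transform` and the other models use `small_transform`. It assumes `torch`, `timm` and `opencv-python` are installed and that the first call has network access to download weights. It is not necessarily how `inference.py` is written, and `transform_attr` and `predict_relative_depth` are illustrative names.

```python
MODEL_TYPES = ("DPT_Large", "DPT_Hybrid", "MiDaS", "MiDaS_small")


def transform_attr(model_type):
    """Name of the MiDaS hub transform used for a model type (as in the hub example)."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type}")
    return "dpt_transform" if model_type.startswith("DPT_") else "small_transform"


def predict_relative_depth(image_path, model_type="DPT_Large"):
    """Return an HxW NumPy array of relative inverse depth for one image."""
    # Heavy dependencies are imported lazily so transform_attr works without them.
    import cv2
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    midas = torch.hub.load("intel-isl/MiDaS", model_type).to(device).eval()
    hub_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
    transform = getattr(hub_transforms, transform_attr(model_type))

    bgr = cv2.imread(image_path)
    if bgr is None:
        raise FileNotFoundError(image_path)
    img = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)

    with torch.no_grad():
        prediction = midas(transform(img).to(device))
        # Resize the network output back to the input image resolution.
        prediction = torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=img.shape[:2],
            mode="bicubic",
            align_corners=False,
        ).squeeze()
    return prediction.cpu().numpy()
```

For example, `predict_relative_depth("../input/example.jpg", "MiDaS_small")` (the file name is hypothetical) returns relative inverse depth, so larger values are closer to the camera.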
  4. Absolute Depth Estimation

The models predict relative depth only. To approximate absolute depth, the method described in Section 5 of the paper has been implemented using depth alignment. See also the following issues: #36, #37, #42, #63, #148, #171.
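One way to picture depth alignment is as a closed-form least-squares fit. MiDaS-style models output relative inverse depth (disparity) that is defined only up to an unknown scale s and shift t. Given a few pixels with known metric depth z_i, you can solve for the s and t that minimize sum((s * d_i + t - 1/z_i)^2) and then invert the aligned disparity. This uses the same 2x2 normal-equation solve as the scale-and-shift-invariant loss used with MiDaS. This is a minimal sketch that assumes such sparse reference depths are available. The `sparse_depth` array, the `mask`, and both function names are hypothetical, and the sketch is not necessarily how `inference.py` implements the `-a true` option.

```python
import numpy as np


def compute_scale_and_shift(prediction, target, mask):
    """Least-squares s, t minimising sum over mask of (s * prediction + t - target)**2."""
    p = prediction[mask].astype(np.float64)
    y = target[mask].astype(np.float64)
    # Normal equations: [[a00, a01], [a01, a11]] @ [s, t] = [b0, b1]
    a00 = np.dot(p, p)
    a01 = p.sum()
    a11 = float(p.size)
    b0 = np.dot(p, y)
    b1 = y.sum()
    det = a00 * a11 - a01 * a01
    if det <= 1e-12:
        raise ValueError("need at least two distinct prediction values under the mask")
    s = (a11 * b0 - a01 * b1) / det
    t = (a00 * b1 - a01 * b0) / det
    return s, t


def align_to_metric_depth(relative_disparity, sparse_depth, mask, eps=1e-6):
    """Fit s, t in disparity space on masked pixels and return dense metric depth."""
    mask = mask & (sparse_depth > 0)  # only use valid reference depths
    target = np.zeros_like(relative_disparity, dtype=np.float64)
    target[mask] = 1.0 / sparse_depth[mask]  # metric depth -> metric disparity
    s, t = compute_scale_and_shift(relative_disparity, target, mask)
    aligned = s * relative_disparity + t
    return 1.0 / np.clip(aligned, eps, None)  # clip guards against non-positive disparity
```

The fit is done in disparity space because the model output is affine in true disparity there. Depth is 1 / (s * d + t), which is not linear in s and t, so fitting directly in depth space would lose the closed-form solution.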

To perform absolute depth estimation, run:

python inference.py -i ../input -o ../output -t <model_name> -a true
  5. Output
  • Results are saved as PNG files in the output folder. The output for a randomly chosen image can be visualized with:
python plot.py
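To reproduce the output step outside the provided scripts, you can min-max normalize a depth map to 8 bits before saving it as a PNG, and then display a random PNG from the output folder. This is a hypothetical sketch, not the contents of `plot.py`. `depth_to_uint8`, `save_depth_png` and `show_random_output` are illustrative names, and Pillow and Matplotlib are assumed to be installed for the last two helpers.

```python
import random
from pathlib import Path

import numpy as np


def depth_to_uint8(depth):
    """Min-max normalise a depth (or disparity) map to the 0-255 range."""
    d = np.asarray(depth, dtype=np.float64)
    lo, hi = d.min(), d.max()
    if hi - lo < 1e-12:  # constant map: avoid division by zero
        return np.zeros(d.shape, dtype=np.uint8)
    return np.round((d - lo) / (hi - lo) * 255.0).astype(np.uint8)


def save_depth_png(depth, path):
    """Save a depth map as an 8-bit grayscale PNG."""
    from PIL import Image  # imported lazily so depth_to_uint8 needs only NumPy

    Image.fromarray(depth_to_uint8(depth)).save(path)


def show_random_output(output_dir="../output"):
    """Display one randomly chosen PNG from the output folder."""
    import matplotlib.pyplot as plt

    pngs = sorted(Path(output_dir).glob("*.png"))
    if not pngs:
        raise FileNotFoundError(f"no PNG files in {output_dir}")
    path = random.choice(pngs)
    # The colormap only affects single-channel images; RGB PNGs are shown as-is.
    plt.imshow(plt.imread(path), cmap="inferno")
    plt.title(path.name)
    plt.axis("off")
    plt.show()
```

Per-image min-max normalization discards the absolute scale, so 8-bit PNGs suit visualization only. If you need the absolute-depth values, store the float array separately, for example with `np.save`.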

NOTE:

The original authors do not provide a training script; see issue #43. To train on multiple datasets with different objectives, the authors use the strategies proposed in the paper "Multi-Task Learning as Multi-Objective Optimization". The authors have shared the PyTorch code for the loss function here.