Vision Transformers: relative and absolute depth estimation
- Images can be copied manually into the input folder or downloaded via the DuckDuckGo API using the script:
python fetch_sample_images.py -i <search image> -u <no. of urls>
- Select one of the four models:
DPT_Large: Largest model
DPT_Hybrid
MiDaS
MiDaS_small
- Inference:
python inference.py -i ../input -o ../output -t DPT_Large
python inference.py -i ../input -o ../output -t DPT_Hybrid
python inference.py -i ../input -o ../output -t MiDaS
python inference.py -i ../input -o ../output -t MiDaS_small
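For calling a model directly from Python, the sketch below shows one way the four names above could be loaded. It assumes the names are the torch.hub entry points of the intel-isl/MiDaS repository and mirrors the transform selection in that repository's published usage example; inference.py may load models differently. The helper names pick_transform_name and load_model are hypothetical.

```python
# Sketch only: assumes the four model names are torch.hub entry points of
# intel-isl/MiDaS. This is not taken from inference.py.
MODEL_TYPES = ("DPT_Large", "DPT_Hybrid", "MiDaS", "MiDaS_small")


def pick_transform_name(model_type):
    """Return the name of the preprocessing transform for a model type.

    Mirrors the MiDaS torch.hub usage example: DPT models use
    dpt_transform, the other models use small_transform.
    """
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type!r}")
    if model_type in ("DPT_Large", "DPT_Hybrid"):
        return "dpt_transform"
    return "small_transform"


def load_model(model_type):
    """Load a model and its input transform (downloads weights on first use)."""
    import torch  # imported lazily so the helper above works without PyTorch

    model = torch.hub.load("intel-isl/MiDaS", model_type)
    transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
    transform = getattr(transforms, pick_transform_name(model_type))
    return model.eval(), transform
```

Calling load_model("MiDaS_small") needs network access on first use and is the lightest of the four options to try.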
- Absolute Depth Estimation
The models perform relative depth estimation. To approximate absolute depth, the method described in Section 5 of the paper has been implemented using depth alignment. See also the following issues: #36, #37, #42, #63, #148, #171.
To perform absolute depth estimation, run:
python inference.py -i ../input -o ../output -t <model_name> -a true
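To illustrate the idea behind depth alignment, here is a minimal sketch. It assumes the alignment is a least-squares fit of a scale and a shift that maps the relative prediction (treated as inverse depth) onto reference inverse depths at a few pixels where the true depth is known. The function names and sample values are hypothetical, and the -a option in inference.py may work differently.

```python
# Sketch only: closed-form least-squares scale/shift alignment.
# Assumption: the model output behaves like inverse depth up to an
# unknown scale and shift. Not taken from inference.py.

def align_scale_shift(pred, target):
    """Fit scale, shift so that scale * pred[i] + shift ~= target[i]."""
    n = len(pred)
    if n < 2 or n != len(target):
        raise ValueError("need at least two matching prediction/target pairs")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant; scale is undetermined")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift


def to_absolute_depth(pred, scale, shift, eps=1e-6):
    """Convert aligned inverse depth to depth, clamping to avoid division by zero."""
    return [1.0 / max(scale * p + shift, eps) for p in pred]


# Hypothetical example: relative predictions at three pixels whose true
# depths (in metres) are known from some reference measurement.
relative_pred = [475.0, 225.0, 100.0]
known_depth_m = [2.0, 4.0, 8.0]
known_inverse_depth = [1.0 / z for z in known_depth_m]

scale, shift = align_scale_shift(relative_pred, known_inverse_depth)
absolute_depth_m = to_absolute_depth(relative_pred, scale, shift)
```

In practice the fit would run over all pixels with a valid reference depth, and the resulting scale and shift would then be applied to the whole predicted map.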
- Output
- Results are saved in the output folder in PNG format. The output for a random image can be visualized using the script:
python plot.py
The original authors do not provide a training script; refer to issue #43. To train on different datasets with different objectives, the authors use the strategies proposed in the paper "Multi-Task Learning as Multi-Objective Optimization". The authors have shared the loss function as PyTorch code here