Vision Transformers for Dense Prediction

This repository contains the code and models for the Master's Thesis "Estimación de Profundidad Online con Transformers Eficientes" (Online Depth Estimation with Efficient Transformers). It modifies the code released with the paper by Ranftl et al. to speed up inference of Dense Prediction Transformers (DPT).

Abstract

Monocular depth estimation is the automatic recovery of an approximation of the dimension that is lost when a three-dimensional scene is projected onto a two-dimensional image. The problem has infinitely many geometric solutions, which makes it practically impossible to solve with traditional computer vision techniques. Deep Learning techniques, however, can extract features from the images that make an approximate solution possible. This work studies the problem and its existing solutions, especially those based on Transformers and supervised learning. A series of modifications is applied to one of these solutions to reduce the size of the original model and multiply its inference speed by nearly five. An exhaustive quantitative and qualitative study of the influence of each modification is also included, with the models evaluated on the KITTI dataset, which is oriented to autonomous driving.
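The ambiguity described in the abstract can be illustrated with a minimal sketch. It assumes an ideal pinhole camera, which is an assumption made here for illustration and not something taken from the thesis code; the function name `project` and the default `focal_length` are hypothetical. Two points on the same viewing ray, at different depths, land on the same image coordinates, so a single image cannot tell them apart.

```python
from typing import Tuple


def project(point: Tuple[float, float, float], focal_length: float = 1.0) -> Tuple[float, float]:
    """Project a 3D point in camera coordinates onto the image plane.

    Assumes an ideal pinhole camera looking down the +z axis
    (an illustrative model, not the repository's camera handling).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    # Dividing by depth is where the third dimension is lost.
    return (focal_length * x / z, focal_length * y / z)


near = (1.0, 2.0, 4.0)
far = tuple(3.0 * c for c in near)  # same viewing ray, three times farther away

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Any positive scaling of `near` gives the same result, which is why depth has to be inferred from learned image cues and not from geometry alone.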

Documentation

Documentation for this project can be found in Appendix B of the Master's Thesis manuscript (in Spanish).

Acknowledgements

This work would not have been possible without the incredibly valuable contribution of the Vision Transformers for Dense Prediction paper and Phil Wang's implementations of efficient attention mechanisms. Likewise, a huge thank you to the PyTorch community and to Ross Wightman for his incredible work on timm.

License

MIT License


guillesanbri/DPT
