This repository contains code to compute depth from a single image. It accompanies our paper:
Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun
and our preprint:
Vision Transformers for Dense Prediction
René Ranftl, Alexey Bochkovskiy, Vladlen Koltun
MiDaS was trained on 10 datasets (ReDWeb, DIML, Movies, MegaDepth, WSVD, TartanAir, HRWSI, ApolloScape, BlendedMVS, IRS) with multi-objective optimization. The original model that was trained on 5 datasets (`MIX 5` in the paper) can be found here.
- [Sep 2021] Integrated into Hugging Face Spaces with Gradio. See the Gradio Web Demo.
- [Apr 2021] Released MiDaS v3.0:
  - New models based on Dense Prediction Transformers are on average 21% more accurate than MiDaS v2.1
  - Additional models can be found here
- [Nov 2020] Released MiDaS v2.1:
  - New model that was trained on 10 datasets and is on average about 10% more accurate than MiDaS v2.0
  - New lightweight model that achieves real-time performance on mobile platforms
  - Sample applications for iOS and Android
  - ROS package for easy deployment on robots
- [Jul 2020] Added TensorFlow and ONNX code. Added online demo.
- [Dec 2019] Released new version of MiDaS - the new model is significantly more accurate and robust
- [Jul 2019] Initial release of MiDaS (Link)
- Pick one or more models and download the corresponding weights to the `weights` folder:
  - For highest quality: `dpt_large`
  - For moderately less quality, but better speed on CPU and slower GPUs: `dpt_hybrid`
  - For real-time applications on resource-constrained devices: `midas_v21_small`
  - Legacy convolutional model: `midas_v21`
- Set up dependencies:

  ```shell
  conda install pytorch torchvision opencv
  pip install timm
  ```

  The code was tested with Python 3.7, PyTorch 1.8.0, OpenCV 4.5.1, and timm 0.4.5.
- Place one or more input images in the folder `input`.
- Run the model:

  ```shell
  python run.py --model_type dpt_large
  python run.py --model_type dpt_hybrid
  python run.py --model_type midas_v21_small
  python run.py --model_type midas_v21
  ```
- The resulting inverse depth maps are written to the `output` folder.
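Keep in mind that MiDaS predicts relative inverse depth (larger values are closer), not metric depth. If you want to inspect an output map programmatically, a minimal sketch along the following lines can be used; the file name `output/example.png` is only an assumed placeholder, since the actual output names follow the input image names:

```python
import cv2
import numpy as np

# Load an inverse depth map written to the `output` folder (placeholder file name).
inv_depth = cv2.imread("output/example.png", cv2.IMREAD_UNCHANGED).astype(np.float32)

# Normalize to [0, 1] for quick visualization; larger values mean closer surfaces.
vis = (inv_depth - inv_depth.min()) / (inv_depth.max() - inv_depth.min() + 1e-8)
cv2.imwrite("output/example_vis.png", (vis * 255).astype(np.uint8))
```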
- Make sure you have installed Docker and the NVIDIA Docker runtime.
- Build the Docker image:

  ```shell
  docker build -t midas .
  ```
- Run inference:

  ```shell
  docker run --rm --gpus all -v $PWD/input:/opt/MiDaS/input -v $PWD/output:/opt/MiDaS/output midas
  ```

  This command passes through all of your NVIDIA GPUs to the container, mounts the `input` and `output` directories, and then runs inference.
The pretrained model is also available on PyTorch Hub.
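A minimal usage sketch via PyTorch Hub is shown below. The entry-point and transform names follow the published hub configuration; the input path `input/example.jpg` is only a placeholder:

```python
import cv2
import torch

model_type = "DPT_Large"  # alternatives: "DPT_Hybrid", "MiDaS_small"

# Load the model and the matching input transform from PyTorch Hub.
midas = torch.hub.load("intel-isl/MiDaS", model_type)
midas.eval()

hub_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = hub_transforms.dpt_transform if "DPT" in model_type else hub_transforms.small_transform

# Read an image and convert it to RGB (placeholder path).
img = cv2.cvtColor(cv2.imread("input/example.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))
    # Resize the prediction back to the original image resolution.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

inverse_depth = prediction.cpu().numpy()
```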
See the README in the `tf` subdirectory. Currently only MiDaS v2.1 is supported; DPT-based models will be added.
See the README in the `mobile` subdirectory.
See the README in the `ros` subdirectory. Currently only MiDaS v2.1 is supported; DPT-based models will be added.
Zero-shot error (lower is better) and speed (FPS):
| Model | DIW, WHDR | Eth3d, AbsRel | Sintel, AbsRel | KITTI, δ>1.25 | NyuDepthV2, δ>1.25 | TUM, δ>1.25 | Speed, FPS |
|---|---|---|---|---|---|---|---|
| **Small models:** | | | | | | | iPhone 11 |
| MiDaS v2 small | 0.1248 | 0.1550 | 0.3300 | 21.81 | 15.73 | 17.00 | 0.6 |
| MiDaS v2.1 small URL | 0.1344 | 0.1344 | 0.3370 | 29.27 | 13.43 | 14.53 | 30 |
| **Big models:** | | | | | | | GPU RTX 3090 |
| MiDaS v2 large URL | 0.1246 | 0.1290 | 0.3270 | 23.90 | 9.55 | 14.29 | 51 |
| MiDaS v2.1 large URL | 0.1295 | 0.1155 | 0.3285 | 16.08 | 8.71 | 12.51 | 51 |
| MiDaS v3.0 DPT-Hybrid URL | 0.1106 | 0.0934 | 0.2741 | 11.56 | 8.69 | 10.89 | 46 |
| MiDaS v3.0 DPT-Large URL | 0.1082 | 0.0888 | 0.2697 | 8.46 | 8.32 | 9.97 | 47 |
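The AbsRel and δ>1.25 columns follow the standard depth-evaluation definitions used in the paper. A minimal illustrative sketch is given below (the function names are ours, not part of the repository); note that, as described in the paper, MiDaS predictions must first be aligned to the ground truth with a per-image scale and shift before these metrics are computed:

```python
import numpy as np

def abs_rel(pred, gt, mask):
    """Mean absolute relative error over valid (masked) pixels."""
    p, g = pred[mask], gt[mask]
    return np.mean(np.abs(p - g) / g)

def bad_pixel_ratio(pred, gt, mask, threshold=1.25):
    """Percentage of valid pixels with max(p/g, g/p) > threshold.
    The 'δ>1.25' columns report this quantity, so lower is better."""
    p, g = pred[mask], gt[mask]
    ratio = np.maximum(p / g, g / p)
    return 100.0 * np.mean(ratio > threshold)
```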
Please cite our paper if you use this code or any of the models:
@article{Ranftl2020,
author = {Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
title = {Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
year = {2020},
}
If you use a DPT-based model, please also cite:
@article{Ranftl2021,
author = {Ren\'{e} Ranftl and Alexey Bochkovskiy and Vladlen Koltun},
title = {Vision Transformers for Dense Prediction},
journal = {ArXiv preprint},
year = {2021},
}
Our work builds on and uses code from timm. We'd like to thank the author for making this library available.
MIT License