Welcome to the restructured codebase for ECoDepth, our official implementation for monocular depth estimation (MDE) as presented in our CVPR 2024 paper. This repository has been significantly reorganized to improve usability, readability, and extensibility.
Important: The original code used to generate the results in our paper is tagged as v1.0.0, which you can download from the Releases section. For most practical purposes—such as training on custom datasets or performing inference—we strongly recommend using the new v2.0.0 outlined here.
- [April 2024] Inference scripts released for image-to-depth and video-to-depth.
- [March 2024] Pretrained checkpoints for NYUv2 and KITTI datasets.
- [March 2024] Training and Evaluation code released!
- [Feb 2024] ECoDepth accepted at CVPR 2024.
## Table of Contents

- Overview of v2.0.0 Improvements
- Setup
- Dataset Download (NYU Depth V2)
- DepthDataset API
- EcoDepth Model API
- Training Workflow
- Testing Workflow
- Inference Workflow
- Citation
## Overview of v2.0.0 Improvements

- **Integrated Model Downloading**
  In the previous version (v1.0.0), you had to manually download our checkpoints from Google Drive and place them in the correct directory. Now, the model is automatically downloaded and cached in `EcoDepth/checkpoints` on the first run. Subsequent runs use the cached checkpoints automatically.
- **Generic DepthDataset Module**
  We provide a new, flexible `DepthDataset` module that can load any custom dataset for MDE training. This was a frequent feature request. Detailed usage is given in the DepthDataset API section.
- **PyTorch Lightning Integration**
  The `EcoDepth` model is now a subclass of `LightningModule`, allowing for streamlined training and inference workflows via PyTorch Lightning. This also makes it straightforward to export models to ONNX or TorchScript for production use (a brief export sketch is included in the EcoDepth Model API section).
- **Config-Based Workflows**
  We replaced bash scripts with user-friendly JSON configs, making it clearer to specify training, testing, and inference parameters.
- **Reduced Dependencies & Simplified Setup**
  We removed the requirement to install the entire Stable Diffusion pipeline and numerous large CLIP or ViT models separately. Our checkpoints already contain the necessary weights, so only one model download is required. Dependencies like `mmcv`, which can be cumbersome to install, are no longer necessary. Installation is now simpler and more flexible.
- **Separate Workflows**
  The code is structured into three main directories:
  - `train/` for training
  - `test/` for testing
  - `infer/` for inference

  Each directory contains its own config files, making each workflow highly modular. A sketch of the resulting layout follows this list.
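A sketch of the repository layout implied by the sections below; the file and directory names are taken from this README, their exact placement is an assumption, and the listing is not exhaustive:

```text
EcoDepth/
├── checkpoints/        # model weights, auto-downloaded and cached on first run
├── datasets/           # e.g. datasets/nyu_depth_v2/ after running download_nyu.sh
├── filenames/          # provided file lists, e.g. filenames/nyu_depth_v2/
├── train/              # train.py, train_config.json
├── test/               # test.py, test_config.json
├── infer/              # infer_image.py, infer_video.py, image_config.json, video_config.json
├── download_nyu.sh
└── requirements.txt
```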
## Setup

- **Install PyTorch (with or without GPU support)**
  Refer to the [PyTorch installation guide](https://pytorch.org/get-started/locally/) for commands tailored to your environment. Example (with CUDA 12.4):

  ```bash
  conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.4 -c pytorch -c nvidia
  ```
- **Install python3 Dependencies**
  From the repository's root directory, run:

  ```bash
  pip install -r requirements.txt
  ```

  We have not pinned specific versions to reduce potential conflicts; let the dependency resolver pick suitable versions for your system.
- **(Optional) Download NYU Depth V2 Dataset**
  If you plan to train on the NYU Depth V2 dataset, simply run:

  ```bash
  bash download_nyu.sh
  ```

  This downloads and unzips the Hugging Face dataset `aradhye/nyu_depth_v2` into a directory named `nyu_depth_v2` under `datasets`. The file lists are already provided as text files under `filenames/nyu_depth_v2`.
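Once the environment is set up, an optional one-liner (not part of the repository) can confirm that PyTorch is installed and, if applicable, that CUDA is visible:

```bash
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

If this prints `False` on a GPU machine, revisit the PyTorch installation step.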
## Dataset Download (NYU Depth V2)

If you want to replicate our NYU Depth V2 experiments:

- Run:

  ```bash
  bash download_nyu.sh
  ```

- This script:
  - Downloads NYU Depth V2 from the Hugging Face dataset `aradhye/nyu_depth_v2`.
  - Unzips the dataset into `datasets/nyu_depth_v2/`.
  - Provides file lists in `filenames/nyu_depth_v2/`.

You can then set the corresponding paths in your JSON configs (see the Training Workflow section).
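Each line in these file lists pairs an RGB image path with its depth map path, separated by a space; the `data_path` from the config is prepended when loading (see the DepthDataset API below). The lines below are purely illustrative and do not correspond to actual files in the dataset:

```text
scene_0001/rgb_00000.jpg scene_0001/depth_00000.png
scene_0001/rgb_00001.jpg scene_0001/depth_00001.png
```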
## DepthDataset API

`DepthDataset` is a generic dataset class designed for pairs of RGB images and depth maps. It requires an `args` object (which can be a namespace or a dictionary) with the following attributes:

- `is_train` (bool)
  Indicates whether the dataset is used for training (`True`) or evaluation/testing (`False`). Some augmentations (e.g., random cropping) are only applied in training mode.
- `filenames_path` (str)
  Path to a text file containing pairs of image and depth map paths, separated by a space.
- `data_path` (str)
  A directory path that is prepended to each filename from `filenames_path`. Thus, the actual file loaded is `data_path + path_in_filenames`.
- `depth_factor` (float)
  Divides the raw depth values to convert them into meters. For NYU Depth V2, `depth_factor=1000.0`; for KITTI, `depth_factor=256.0`.
- `do_random_crop` (bool)
  Whether to perform random cropping on the image/depth pairs (only if `is_train=True`). If `do_random_crop` is `True`, you must also set:
  - `crop_h` (int): Crop height
  - `crop_w` (int): Crop width

  If images are smaller than `crop_h` × `crop_w`, zero padding is applied first.
- `use_cut_depth` (bool)
  Whether to use CutDepth to reduce overfitting. We found it helpful for indoor datasets (e.g., NYU Depth V2) but not for outdoor datasets (e.g., KITTI). Only used during training.
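A minimal usage sketch under stated assumptions: the import path (`depth_dataset`) is hypothetical and should be adjusted to wherever `DepthDataset` lives in this repository, and the file list name and crop sizes are placeholder values:

```python
from argparse import Namespace

from torch.utils.data import DataLoader

from depth_dataset import DepthDataset  # hypothetical import path; adjust to the repo layout

# Attributes mirror the list above; values here are placeholders for NYU Depth V2.
args = Namespace(
    is_train=True,
    filenames_path="filenames/nyu_depth_v2/train.txt",  # "image_path depth_path" per line (name is a placeholder)
    data_path="datasets/nyu_depth_v2/",                 # prepended to every path in the file list
    depth_factor=1000.0,                                 # NYU stores depth in millimetres
    do_random_crop=True,                                 # training-time augmentation
    crop_h=416,                                          # placeholder crop size
    crop_w=544,                                          # placeholder crop size
    use_cut_depth=True,                                  # helpful indoors, not outdoors
)

train_dataset = DepthDataset(args)
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True, num_workers=4)
```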
## EcoDepth Model API

`EcoDepth` is implemented as a subclass of PyTorch Lightning's `LightningModule`. The constructor expects an `args` object with these key attributes:

- `train_from_scratch` (bool)
  Currently should always be `False`. To train from scratch, you would need the base pretrained weights. Typically, you will finetune using our published checkpoints.
- `eval_crop` (str)
  Determines the evaluation cropping strategy. Possible values:
  - `"eigen"`: Used for NYU
  - `"garg"`: Used for KITTI
  - `"custom"`: Implement your own function in `utils.py` and set `eval_crop="custom"`.
  - `"none"`: No cropping
- `no_of_classes` (int)
  Number of scene classes for internal embeddings. For NYU (indoor model) use `100`; for VKITTI (outdoor model) use `200`.
- `max_depth` (float)
  Maximum depth value the model will predict. Typically:
  - `10.0` for indoor (NYU)
  - `80.0` for outdoor (KITTI)
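A minimal instantiation sketch under stated assumptions: the import path is hypothetical, the constructor may require additional attributes beyond the ones listed above, and the export lines simply use the generic `LightningModule` helpers mentioned in the overview (a diffusion-based backbone may need extra work to trace cleanly):

```python
from argparse import Namespace

import torch

from ecodepth import EcoDepth  # hypothetical import path; adjust to the repo layout

# Indoor (NYU-style) settings from the attribute list above.
args = Namespace(
    train_from_scratch=False,
    eval_crop="eigen",
    no_of_classes=100,
    max_depth=10.0,
)

model = EcoDepth(args).eval()

# Optional production export via the standard LightningModule API.
dummy = torch.randn(1, 3, 480, 640)  # example input shape; the real one may differ
model.to_onnx("ecodepth.onnx", dummy, export_params=True)
scripted = model.to_torchscript(method="trace", example_inputs=dummy)
```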
## Training Workflow

- **Navigate to `train/`**
- **Edit `train_config.json`**
  - Data arguments:
    - `train_filenames_path`, `train_data_path`, `train_depth_factor`
    - `test_filenames_path`, `test_data_path`, `test_depth_factor`
  - Model arguments:
    - `eval_crop`, `no_of_classes`, `max_depth`
  - Training arguments:
    - `ckpt_path`: Path to a Lightning checkpoint (for finetuning/resuming). If this is an empty string, you must specify `scene="indoor"` or `scene="outdoor"`, triggering automatic model download.
    - `epochs`: Total training epochs
    - `weight_decay`, `lr`: Optimizer hyperparameters
    - `val_check_interval`: Validation frequency (in training steps)

  An example `train_config.json` is sketched at the end of this section.
- **Run Training**

  ```bash
  python3 train.py
  ```

  PyTorch Lightning handles checkpointing automatically (by default, in `train/lightning_logs/`).
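A hedged sketch of what a `train_config.json` might look like for NYU Depth V2. The key names follow the list above; the file list names, hyperparameter values, and any additional keys in the shipped config are assumptions:

```json
{
  "train_filenames_path": "filenames/nyu_depth_v2/train.txt",
  "train_data_path": "datasets/nyu_depth_v2/",
  "train_depth_factor": 1000.0,
  "test_filenames_path": "filenames/nyu_depth_v2/test.txt",
  "test_data_path": "datasets/nyu_depth_v2/",
  "test_depth_factor": 1000.0,
  "eval_crop": "eigen",
  "no_of_classes": 100,
  "max_depth": 10.0,
  "ckpt_path": "",
  "scene": "indoor",
  "epochs": 25,
  "weight_decay": 0.01,
  "lr": 0.0001,
  "val_check_interval": 1000
}
```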
## Testing Workflow

- **Navigate to `test/`**
- **Edit `test_config.json`**
  - Similar to the training config, but without training arguments (a brief example follows at the end of this section).
  - Point `ckpt_path` to the checkpoint you want to evaluate, or leave it empty to use the provided models.
- **Run Testing**

  ```bash
  python3 test.py
  ```

  This script reports evaluation metrics (e.g., RMSE, MAE, δ thresholds).
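A hedged sketch of a possible `test_config.json`, mirroring the training config without the training arguments; the checkpoint path and file list names are placeholders:

```json
{
  "test_filenames_path": "filenames/nyu_depth_v2/test.txt",
  "test_data_path": "datasets/nyu_depth_v2/",
  "test_depth_factor": 1000.0,
  "eval_crop": "eigen",
  "no_of_classes": 100,
  "max_depth": 10.0,
  "ckpt_path": "checkpoints/ecodepth_nyu.ckpt"
}
```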
## Inference Workflow

There are two scripts provided in the `infer/` directory: one for images (`infer_image.py`) and one for videos (`infer_video.py`).

- **Navigate to `infer/`**
- **Edit `image_config.json`**
  - Key arguments:
    - `image_path`: Path to a single image or a directory containing multiple images (processed recursively).
    - `outdir`: Output directory for predicted depth maps.
    - `resolution`: Scale factor for processing images (higher resolution => more GPU memory usage).
    - `flip_test` (bool): Whether to perform a horizontal flip as test-time augmentation.
    - `grayscale` (bool): Output in grayscale (if `false`, uses a colorized depth map).
    - `pred_only` (bool): Whether to output only the depth map.

  An example config is sketched after the image-inference steps below.
- **Run Image Inference**

  ```bash
  python3 infer_image.py
  ```

  Results are written to `outdir`, preserving the subdirectory structure relative to `image_path`.
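A hedged sketch of a possible `image_config.json` with placeholder paths and values; the shipped config may contain additional keys:

```json
{
  "image_path": "examples/",
  "outdir": "outputs/image_depth/",
  "resolution": 1.0,
  "flip_test": true,
  "grayscale": false,
  "pred_only": false
}
```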
- **Edit `video_config.json`**
  - Key arguments:
    - `video_path`: Path to the video file.
    - `outdir`: Output directory for frames or depth predictions.
    - `vmax`: Depth values are clipped to this maximum.

  An example config is sketched at the end of this section.
- **Run Video Inference**

  ```bash
  python3 infer_video.py
  ```
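And a hedged sketch of a possible `video_config.json`, again with placeholder values; the shipped config may contain additional keys:

```json
{
  "video_path": "examples/room_tour.mp4",
  "outdir": "outputs/video_depth/",
  "vmax": 10.0
}
```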
## Citation

If you find ECoDepth helpful in your research or work, please cite our CVPR 2024 paper:

```bibtex
@InProceedings{Patni_2024_CVPR,
    author    = {Patni, Suraj and Agarwal, Aradhye and Arora, Chetan},
    title     = {ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {28285-28295}
}
```
Thank you for using ECoDepth!
For any questions or suggestions, feel free to open an issue. We hope this restructured codebase helps you train on custom datasets and perform fast, efficient inference.