This repository implements PaGeR, a computer vision method for estimating geometry from monocular panoramic ERP images, introduced in the paper Panorama Geometry Estimation using Single-Step Diffusion Models.
[Website](website here) | [Paper](paper here) | [Dataset](dataset link here)
Team: Vukasin Bozic, Isidora Slavkovic, Dominik Narnhofer, Nando Metzger, Denis Rozumny, Konrad Schindler, Nikolai Kalischek
We present PaGeR, a diffusion-based model for panoramic geometry reconstruction that extends monocular depth estimation to full 360° scenes. PaGeR is a one-step diffusion model trained directly in pixel space, capable of predicting high-resolution panoramic depth and surface normals with strong generalization to unseen environments. Leveraging advances in panorama generation and diffusion fine-tuning, PaGeR is trained on PanoInfinigen, a newly introduced synthetic dataset of indoor and outdoor scenes with metric depth and normals, producing coherent, metrically accurate geometry. It outperforms prior approaches across standard, few-shot, and zero-shot scenarios.
05-02-2026: Released the full training, inference, and evaluation code, along with the arXiv paper, an interactive demo, and model checkpoints for depth, metric depth, and normals. Full dataset release coming soon.
There are several ways to interact with PaGeR:

- Run the demo locally (requires a GPU with 24 GB of VRAM) -> see the instructions below.
- Try the interactive examples available on our project page.
- Develop locally with this codebase, following the instructions given below.
The code was tested on:
- Debian GNU/Linux 12, Python 3.10.16, PyTorch 2.2.0, and CUDA 12.1.
Clone the repository (requires git):

```bash
git clone https://github.com/prs-eth/PaGeR.git
cd PaGeR
```

Create the Conda environment and install the dependencies:

```bash
conda env create -f environment.yaml
```

The model checkpoints are hosted on Hugging Face:
- Depth: prs-eth/PaGeR-depth
- Metric Depth: prs-eth/PaGeR-metric-depth
- Normals: prs-eth/PaGeR-normals

You can either download the checkpoints automatically by specifying the Hugging Face repo ID in the arguments, or download them manually and load them from a local path. If you choose the latter, please preserve the original folder structure of the Hugging Face repository.
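For example, assuming the `huggingface_hub` CLI is installed, the two options might look like this (local paths are placeholders):

```bash
# Option A: pass the Hugging Face repo ID and let the script download the checkpoint.
python inference.py --checkpoint_path prs-eth/PaGeR-depth ...

# Option B: download manually (preserving the repo folder structure) and load from a local path.
huggingface-cli download prs-eth/PaGeR-depth --local-dir checkpoints/PaGeR-depth
python inference.py --checkpoint_path checkpoints/PaGeR-depth ...
```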
For training, testing, or evaluation, you will need to choose and download one or more of the following datasets:
For download instructions, terms of use, and dataset descriptions, please refer to the webpages of the respective datasets. We provide dataloaders for all of these datasets; you only need to select the desired dataset in the config file or via a command-line argument.
The easiest way to test PaGeR locally is to run the Gradio demo. Make sure you have installed the dependencies as described above, then run:
```bash
python app.py
```

You can then test the model, explore interactive 3D visualizations of both the provided examples and your own images, and download the results.
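If the demo runs on a remote GPU machine, Gradio's standard environment variables can be used to expose it, assuming `app.py` relies on Gradio's default launch settings (an assumption about this codebase):

```bash
# Bind to all interfaces on a fixed port (only takes effect if app.py does not set these itself).
GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 python app.py
```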
We use OmegaConf and argparse for configuration management in all our scripts and models. Parameters can be set either in the config file or passed directly on the command line; CLI arguments always take precedence. Note that model-loading parameters are always read from a YAML config file stored alongside the model checkpoint and are not overridden by the local config or CLI arguments.
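The precedence rule can be pictured with a minimal OmegaConf-only sketch (the actual scripts combine OmegaConf with argparse, and the file names below are hypothetical):

```python
from omegaconf import OmegaConf

# Hypothetical config paths, for illustration only.
base_cfg = OmegaConf.load("config/inference.yaml")   # values from the local config file
cli_cfg = OmegaConf.from_cli()                        # values passed on the command line
run_cfg = OmegaConf.merge(base_cfg, cli_cfg)          # later sources win: CLI overrides the file

# The checkpoint ships its own YAML config; it drives model loading and is kept separate,
# so neither the local config nor CLI arguments can override it.
ckpt_cfg = OmegaConf.load("path/to/checkpoint/config.yaml")

print(OmegaConf.to_yaml(run_cfg))
```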
If you want to test the models in the regular inference regime, run:
```bash
# Depth
python inference.py \
    --configs "path/to/config" \
    --checkpoint_path "path/to/checkpoint" \
    --enable_xformers \
    --data_path "path/to/dataset" \
    --dataset "dataset-choice" \
    --results_path "path/to/save/results" \
    --pred_only
```

TODO: `generate_point_cloud` explanation.
The behavior of the code can be customized in the following ways:
| Argument | Description |
|---|---|
| `config` | Path to the YAML configuration file. |
| `checkpoint_path` | Model checkpoint to load (local path or Hugging Face repo ID). |
| `results_path` | Output directory where predictions are saved. |
| `dataset` | Dataset to use (list given above). |
| `data_path` | Root directory of the dataset. |
| `scenes` | Scene type to use: indoor, outdoor, or both (if supported). |
| `img_report_frequency` | Save an example output image every N samples. |
| `pred_only` | Save only the prediction image (otherwise an RGB + prediction mosaic is saved). |
| `generate_eval` | Save predictions as `.npz` files for later evaluation. |
| `enable_xformers` | Enable memory-efficient attention (recommended). |
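As a concrete illustration, a depth inference run that saves `.npz` predictions for later evaluation might look like this (paths, config name, and results directory are placeholders; the dataset name follows the choices listed in the evaluation table below):

```bash
# Hypothetical example invocation; adjust paths to your setup.
python inference.py \
    --configs "config/inference.yaml" \
    --checkpoint_path "prs-eth/PaGeR-depth" \
    --dataset "Matterport3D360" \
    --data_path "data/Matterport3D360" \
    --results_path "results/matterport_depth" \
    --generate_eval \
    --enable_xformers
```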
To evaluate the inference results of our model (or another one) with the standard set of depth estimation metrics, run:
```bash
# Depth
python evaluation/depth_evaluation.py \
    --pred_path "path/to/preds/folder" \
    --data_path "path/to/dataset" \
    --dataset "dataset-choice" \
    --alignment_type "alignment-type-to-apply" \
    --save_error_maps
```

Surface normal estimation can be evaluated, similarly to the PanoNormal paper, by running the following command:
```bash
# Normals
python evaluation/normals_estimation.py \
    --pred_path "path/to/preds/folder" \
    --data_path "path/to/dataset" \
    --dataset "dataset-choice" \
    --alignment_type "alignment-type-to-apply" \
    --save_error_maps
```

```bash
# Edges
python script/iid/run.py \
    --checkpoint prs-eth/marigold-iid-appearance-v1-1 \
    --denoise_steps 4 \
    --ensemble_size 1 \
    --input_rgb_dir input/in-the-wild_example \
    --output_dir output/in-the-wild_example
```

The behavior of the code can be customized in the following ways:
| Argument | Type | Choices | Description |
|---|---|---|---|
| `--data_path` | `str` | – | Root directory of the dataset containing ground-truth depth and metadata. |
| `--dataset` | `str` | `PanoInfinigen`, `Matterport3D360`, `Stanford2D3DS`, `Structured3D`, `Structured3D_ScannetPP` | Dataset to evaluate on. Use `PanoInfinigen` for the synthetic dataset. |
| `--pred_path` | `str` | – | Directory containing the predicted depth maps to be evaluated. |
| `--alignment_type` | `str` | `metric`, `scale`, `scale_and_shift` | Alignment strategy applied between prediction and ground truth before evaluation. |
| `--save_error_maps` | flag | – | If set, saves per-sample error maps during evaluation. |
| `--error_maps_saving_frequency` | `int` | – | Frequency (in number of batches) at which error maps are saved. |
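For instance, evaluating previously saved predictions with scale-and-shift alignment could look like this (all paths are placeholders):

```bash
# Hypothetical example: evaluate saved depth predictions with scale-and-shift alignment
# and save an error map every 10 batches.
python evaluation/depth_evaluation.py \
    --pred_path "results/matterport_depth" \
    --data_path "data/Matterport3D360" \
    --dataset "Matterport3D360" \
    --alignment_type "scale_and_shift" \
    --save_error_maps \
    --error_maps_saving_frequency 10
```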
Install additional dependencies:
```bash
pip install -r requirements+.txt -r requirements.txt
```

Set the data directory variable (also needed in the evaluation scripts) and download the evaluation datasets (depth, normals) into the corresponding subfolders:

```bash
export BASE_DATA_DIR=<YOUR_DATA_DIR>  # Set target data directory

# Depth
wget -r -np -nH --cut-dirs=4 -R "index.html*" -P ${BASE_DATA_DIR} https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/

# Normals
wget -r -np -nH --cut-dirs=4 -R "index.html*" -P ${BASE_DATA_DIR} https://share.phys.ethz.ch/~pf/bingkedata/marigold/marigold_normals/evaluation_dataset.zip
unzip ${BASE_DATA_DIR}/evaluation_dataset.zip -d ${BASE_DATA_DIR}/
rm -f ${BASE_DATA_DIR}/evaluation_dataset.zip
```

For download instructions for the intrinsic image decomposition test data, please refer to the iid-appearance instructions and iid-lighting instructions.
Run inference and evaluation scripts, for example:
```bash
# Depth
bash script/depth/eval/11_infer_nyu.sh   # Run inference
bash script/depth/eval/12_eval_nyu.sh    # Evaluate predictions

# Normals
bash script/normals/eval/11_infer_scannet.sh   # Run inference
bash script/normals/eval/12_eval_scannet.sh    # Evaluate predictions

# IID
bash script/iid/eval/11_infer_appearance_interiorverse.sh   # Run inference
bash script/iid/eval/12_eval_appearance_interiorverse.sh    # Evaluate predictions
bash script/iid/eval/21_infer_lighting_hypersim.sh          # Run inference
bash script/iid/eval/22_eval_lighting_hypersim.sh           # Evaluate predictions

# Depth (the original CVPR version)
bash script/depth/eval_old/11_infer_nyu.sh   # Run inference
bash script/depth/eval_old/12_eval_nyu.sh    # Evaluate predictions
```

Note: although the seed has been set, the results might still differ slightly on different hardware.
Based on the previously created environment, install extended requirements:
```bash
pip install -r requirements++.txt -r requirements+.txt -r requirements.txt
```

Set environment variables for the data and checkpoint directories:

```bash
export BASE_DATA_DIR=YOUR_DATA_DIR        # directory of training data
export BASE_CKPT_DIR=YOUR_CHECKPOINT_DIR  # directory of pretrained checkpoint
```

Download the Stable Diffusion v2 checkpoint into ${BASE_CKPT_DIR} (backup link).
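One way to fetch it, assuming the `huggingface_hub` CLI and the public `stabilityai/stable-diffusion-2` repo ID (use the backup link if you need the exact snapshot used by the authors):

```bash
# Assumption: downloading Stable Diffusion v2 from its public Hugging Face repo.
huggingface-cli download stabilityai/stable-diffusion-2 --local-dir ${BASE_CKPT_DIR}/stable-diffusion-2
```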
Depth
Prepare the Hypersim and Virtual KITTI 2 datasets and save them into ${BASE_DATA_DIR}. Please refer to this README for Hypersim preprocessing.
Normals
Prepare the Hypersim, Interiorverse, and Sintel datasets and save them into ${BASE_DATA_DIR}. Please refer to this README for Hypersim preprocessing, this README for Interiorverse, and this README for Sintel.
Intrinsic Image Decomposition
Appearance model: Prepare the Interiorverse dataset and save it into ${BASE_DATA_DIR}. Please refer to this README for Interiorverse preprocessing.
Lighting model: Prepare the Hypersim dataset and save it into ${BASE_DATA_DIR}. Please refer to this README for Hypersim preprocessing.
Run the training scripts:

```bash
# Depth
python script/depth/train.py --config config/train_marigold_depth.yaml

# Normals
python script/normals/train.py --config config/train_marigold_normals.yaml

# IID (appearance model)
python script/iid/train.py --config config/train_marigold_iid_appearance.yaml

# IID (lighting model)
python script/iid/train.py --config config/train_marigold_iid_lighting.yaml
```

Resume from a checkpoint, e.g.:
```bash
# Depth
python script/depth/train.py --resume_run output/marigold_base/checkpoint/latest

# Normals
python script/normals/train.py --resume_run output/train_marigold_normals/checkpoint/latest

# IID (appearance model)
python script/iid/train.py --resume_run output/train_marigold_iid_appearance/checkpoint/latest

# IID (lighting model)
python script/iid/train.py --resume_run output/train_marigold_iid_lighting/checkpoint/latest
```

Only the U-Net and the scheduler config are updated during training; they are saved in the training directory. To use the inference pipeline with your training result (see the sketch below the list):
- replace the `unet` folder in the Marigold checkpoints with the one in the `checkpoint` output folder.
- replace the `scheduler/scheduler_config.json` file in the Marigold checkpoints with the `checkpoint/scheduler_config.json` generated during training.

Then refer to this section for evaluation.
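A minimal bash sketch of these two replacement steps, assuming illustrative local paths for the downloaded checkpoint and the training output (adjust them to your setup):

```bash
# Illustrative paths only: point these at your downloaded checkpoint and training output.
CKPT_DIR="checkpoints/base_checkpoint"           # downloaded checkpoint folder
TRAIN_OUT="output/my_training_run/checkpoint"    # `checkpoint` folder produced by training

# 1) Swap in the newly trained U-Net.
rm -rf "${CKPT_DIR}/unet"
cp -r "${TRAIN_OUT}/unet" "${CKPT_DIR}/unet"

# 2) Swap in the scheduler config generated during training.
cp "${TRAIN_OUT}/scheduler_config.json" "${CKPT_DIR}/scheduler/scheduler_config.json"
```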
Note: Although random seeds have been set, the training result might be slightly different on different hardware. It is recommended to train without interruption.
Please refer to these instructions.
Please cite our paper:
Put citations here

The code of this work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE).
The models are licensed under the RAIL++-M License (as defined in the LICENSE-MODEL).
By downloading and using the code and model you agree to the terms in LICENSE and LICENSE-MODEL respectively.
This project builds upon and is inspired by the following repositories and works:
- Marigold-e2e-ft, based on paper Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think.
- Marigold, based on paper Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation.
We thank the authors and maintainers for making their code publicly available.
