Despite considerable progress in stereo depth estimation, omnidirectional imaging remains underexplored, mainly due to the lack of appropriate data. We introduce Helvipad, a real-world dataset for omnidirectional stereo depth estimation, consisting of 40K frames from video sequences across diverse environments, including crowded indoor and outdoor scenes under varying lighting conditions. Collected using two 360° cameras in a top-bottom setup and a LiDAR sensor, the dataset includes accurate depth and disparity labels obtained by projecting 3D point clouds onto equirectangular images. Additionally, we provide an augmented training set with significantly increased label density via depth completion. We benchmark leading stereo depth estimation models for both standard and omnidirectional images. The results show that while recent stereo methods perform reasonably well, accurately estimating depth in omnidirectional imaging remains a significant challenge. To address this, we introduce the necessary adaptations to stereo models, achieving improved performance.
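As context for how the labels are produced: each LiDAR return is a 3D point that can be mapped to equirectangular pixel coordinates through a spherical projection. The NumPy sketch below illustrates this step only; the axis layout (z up, x forward) and the absence of camera-LiDAR extrinsics are simplifying assumptions for illustration, not the dataset's actual calibration.

```python
import numpy as np

def project_to_equirectangular(points, height, width):
    """Map 3D points (N, 3), already in the camera frame, to pixel
    coordinates on an equirectangular image of size (height, width).

    Returns integer (u, v) pixel coordinates and the range r per point.
    The axis layout (z up, x forward) is an assumption for illustration;
    the dataset's actual calibration is documented with the release.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.maximum(np.linalg.norm(points, axis=1), 1e-9)  # range per point
    theta = np.arccos(np.clip(z / r, -1.0, 1.0))          # polar angle in [0, pi]
    phi = np.arctan2(y, x)                                # azimuth in (-pi, pi]

    v = np.clip((theta / np.pi * height).astype(int), 0, height - 1)  # row
    u = ((phi + np.pi) / (2 * np.pi) * width).astype(int) % width     # column
    return u, v, r
```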
- [16/02/2025] Helvipad has been accepted to CVPR 2025! 🎉🎉
- [CVPR Update – 16/03/2025] A small but important update has been applied to the dataset. If you have already downloaded it, please check the details on the HuggingFace Hub.
The dataset is organized into training, validation, and testing subsets with the following structure; a sketch of the depth-to-disparity conversion is shown after the tree:
```
helvipad/
├── train/
│   ├── depth_maps                # Depth maps generated from LiDAR data
│   ├── depth_maps_augmented      # Augmented depth maps using depth completion
│   ├── disparity_maps            # Disparity maps computed from depth maps
│   ├── disparity_maps_augmented  # Augmented disparity maps using depth completion
│   ├── images_top                # Top-camera RGB images
│   ├── images_bottom             # Bottom-camera RGB images
│   └── LiDAR_pcd                 # Original LiDAR point cloud data
├── val/
│   ├── depth_maps                # Depth maps generated from LiDAR data
│   ├── depth_maps_augmented      # Augmented depth maps using depth completion
│   ├── disparity_maps            # Disparity maps computed from depth maps
│   ├── disparity_maps_augmented  # Augmented disparity maps using depth completion
│   ├── images_top                # Top-camera RGB images
│   ├── images_bottom             # Bottom-camera RGB images
│   └── LiDAR_pcd                 # Original LiDAR point cloud data
└── test/
    ├── depth_maps                # Depth maps generated from LiDAR data
    ├── depth_maps_augmented      # Augmented depth maps using depth completion (only for computing LRCE)
    ├── disparity_maps            # Disparity maps computed from depth maps
    ├── disparity_maps_augmented  # Augmented disparity maps using depth completion (only for computing LRCE)
    ├── images_top                # Top-camera RGB images
    ├── images_bottom             # Bottom-camera RGB images
    └── LiDAR_pcd                 # Original LiDAR point cloud data
```
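The `disparity_maps` folders are derived from the depth maps, as noted in the tree above. For a top-bottom 360° rig with baseline B, disparity is an angular offset along the image's vertical (polar) axis, and the law of sines in the triangle formed by the two cameras and a scene point gives depth = B · sin(θ + d) / sin(d). The sketch below assumes θ is the polar angle of each row in the bottom image and that rows span (0, π); both the row-to-angle mapping and the reference camera are assumptions, so check the paper for the exact convention Helvipad uses.

```python
import numpy as np

def disparity_to_depth(disparity_deg, baseline):
    """Convert an angular disparity map (degrees) to depth (meters)
    for a top-bottom rig, via depth = B * sin(theta + d) / sin(d).

    Assumes: theta is the polar angle of each image row in the bottom
    image, spanning (0, pi) from top row to bottom row, and d is the
    polar-angle offset between the two views. These conventions are
    assumptions; see the paper for the exact definition.
    """
    d = np.deg2rad(np.asarray(disparity_deg, dtype=np.float64))
    height = d.shape[0]
    theta = ((np.arange(height) + 0.5) / height * np.pi)[:, None]  # (H, 1)
    depth = baseline * np.sin(theta + d) / np.maximum(np.sin(d), 1e-9)
    depth[d <= 0] = 0.0  # zero disparity marks unlabeled pixels
    return depth
```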
We evaluate the performance of multiple popular and state-of-the-art stereo matching methods, for both standard and 360° images. All models are trained on a single NVIDIA A100 GPU with the largest possible batch size to ensure comparable use of computational resources. A sketch of the generic depth metrics appears below the table.
| Method | Stereo Setting | Disp-MAE (°) | Disp-RMSE (°) | Disp-MARE | Depth-MAE (m) | Depth-RMSE (m) | Depth-MARE | Depth-LRCE (m) |
|---|---|---|---|---|---|---|---|---|
| PSMNet | conventional | 0.286 | 0.496 | 0.248 | 2.509 | 5.673 | 0.176 | 1.809 |
| 360SD-Net | omnidirectional | 0.224 | 0.419 | 0.191 | 2.122 | 5.077 | 0.152 | 0.904 |
| IGEV-Stereo | conventional | 0.225 | 0.423 | 0.172 | 1.860 | 4.447 | 0.146 | 1.203 |
| 360-IGEV-Stereo | omnidirectional | 0.188 | 0.404 | 0.146 | 1.720 | 4.297 | 0.130 | 0.388 |
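For reference, the three generic depth metrics in the table can be computed as in the minimal sketch below, assuming the sparse ground truth stores unlabeled pixels as zero. LRCE, the consistency error across the equirectangular left-right seam, is defined in the paper and not reproduced here.

```python
import numpy as np

def depth_metrics(pred, gt):
    """MAE, RMSE and mean absolute relative error over labeled pixels.

    Assumes unlabeled pixels in the sparse ground truth are stored as 0.
    LRCE (consistency across the equirectangular seam) is defined in the
    paper and omitted here.
    """
    valid = gt > 0
    err = pred[valid] - gt[valid]
    mae = np.abs(err).mean()
    rmse = np.sqrt(np.mean(err ** 2))
    mare = np.mean(np.abs(err) / gt[valid])
    return mae, rmse, mare
```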
The dataset is available on the HuggingFace Hub.
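For a programmatic download, the `huggingface_hub` client can fetch the full dataset; the `repo_id` below is a placeholder, so check the dataset card linked from the project page for the exact identifier.

```python
from huggingface_hub import snapshot_download

# The repo_id below is a placeholder; the exact identifier is on the
# dataset card linked from the project page.
snapshot_download(repo_id="<org>/helvipad", repo_type="dataset",
                  local_dir="helvipad")
```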
For more information, visualizations, and updates, visit the project page.
This dataset is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
This work was supported by the EPFL Center for Imaging through a Collaborative Imaging Grant. We thank the VITA lab members for their valuable feedback, which helped to enhance the quality of this manuscript. We also express our gratitude to Dr. Simone Schaub-Meyer and Oliver Hahn for their insightful advice during the project's final stages.
If you use the Helvipad dataset in your research, please cite our paper:

```bibtex
@inproceedings{zayene2025helvipad,
  author    = {Zayene, Mehdi and Endres, Jannik and Havolli, Albias and Corbière, Charles and Cherkaoui, Salim and Ben Ahmed Kontouli, Alexandre and Alahi, Alexandre},
  title     = {Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025}
}
```