Kashu Yamazaki
·
Taisei Hanyu
·
Khoa Vo
·
Thang Pham
·
Minh Tran
·
Gianfranco Doretto
·
Anh Nguyen
·
Ngan Le
Paper | arXiv | Project Page
TL;DR: Open-Fusion builds an open-vocabulary 3D queryable scene from a sequence of posed RGB-D images in real-time.
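Concretely, each input frame pairs an RGB-D image with a camera pose; before fusion, a depth pixel can be lifted into world coordinates with the standard pinhole model. A minimal sketch (the function name, intrinsics, and pose here are illustrative, not the project's API):

```python
import numpy as np

def backproject(u, v, depth, K, pose):
    """Lift pixel (u, v) with metric depth into world coordinates.

    K: 3x3 camera intrinsics, pose: 4x4 camera-to-world transform.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Point in camera coordinates (homogeneous).
    p_cam = np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth, 1.0])
    # Transform into world coordinates using the camera pose.
    return (pose @ p_cam)[:3]

# Example intrinsics (typical for a 640x480 sensor) and a camera shifted 1 m along z.
K = np.array([[525.0, 0.0, 319.5],
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[:3, 3] = [0.0, 0.0, 1.0]

print(backproject(319.5, 239.5, 2.0, K, pose))  # [0. 0. 3.]
```

The principal-point pixel at 2 m depth lands on the camera's optical axis, so the world point is simply the depth plus the camera's z-offset.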
- Ubuntu 20.04
- 10GB+ VRAM (~5 GB for SEEM and ~2.5 GB for TSDF); larger scenes may require more memory
- Azure Kinect, Intel T265 (for real-world data)
Please build a Docker image from the Dockerfile. Do not forget to export the following environment variables (REGISTRY_NAME and IMAGE_NAME), as they are used by the tools/*.sh scripts:
export REGISTRY_NAME=<your-registry-name>
export IMAGE_NAME=<your-image-name>
docker build -t $REGISTRY_NAME/$IMAGE_NAME -f docker/Dockerfile .
You can run the following script to download the ICL and Replica datasets:
bash tools/download.sh --data icl replica
This script will create a ./sample folder and download the datasets into it.
For ScanNet, please follow the instructions in ScanNet. Once you have downloaded the dataset, you can run the following script to prepare the data (example for scene scene0001_00):
python tools/prepare_scene.py --filename scene0001_00.sens --output_path sample/scannet/scene0001_00
Please download the pretrained weights for SEEM from here and place them at openfusion/zoo/xdecoder_seem/checkpoints/seem_focall_v1.pt.
You can run Open-Fusion using tools/run.sh as follows:
bash tools/run.sh --data $DATASET --scene $SCENE
Options:
- --data: dataset to use (e.g., icl)
- --scene: scene to use (e.g., kt0)
- --frames: number of frames to use (default: -1)
- --live: run with live monitor (default: False)
- --stream: run with data stream from camera server (default: False)
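To make the option semantics concrete, here is a hypothetical Python equivalent of the flag handling (tools/run.sh itself is a bash script; these defaults mirror the list above):

```python
import argparse

# Sketch of an argument parser mirroring the run.sh options above.
parser = argparse.ArgumentParser(description="Open-Fusion runner (sketch)")
parser.add_argument("--data", required=True, help="dataset to use, e.g. icl")
parser.add_argument("--scene", required=True, help="scene to use, e.g. kt0")
parser.add_argument("--frames", type=int, default=-1,
                    help="number of frames to use (-1 = all frames)")
parser.add_argument("--live", action="store_true",
                    help="run with a live monitor")
parser.add_argument("--stream", action="store_true",
                    help="consume frames from the camera server")

# Parse an example invocation; unset flags fall back to their defaults.
args = parser.parse_args(["--data", "icl", "--scene", "kt0"])
print(args.data, args.scene, args.frames, args.live, args.stream)
# icl kt0 -1 False False
```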
If you want to run Open-Fusion with a camera stream, please run the following command first on the machine with the Azure Kinect and Intel T265 connected:
python deploy/server.py
Please refer to this for more details.
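The actual wire protocol of deploy/server.py is not documented here; the sketch below only illustrates the general idea of streaming length-prefixed, serialized frames from a capture machine to the fusion process over TCP (all names and the frame layout are assumptions):

```python
import pickle
import socket
import struct
import threading

def send_frame(conn, frame):
    # Length-prefix the serialized frame so the receiver knows how much to read.
    payload = pickle.dumps(frame)
    conn.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(conn, n):
    # Keep reading until exactly n bytes have arrived.
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            raise ConnectionError("socket closed before full message")
        data += chunk
    return data

def recv_frame(conn):
    (length,) = struct.unpack("!I", recv_exact(conn, 4))
    return pickle.loads(recv_exact(conn, length))

def serve_one(server_sock, frame):
    conn, _ = server_sock.accept()
    with conn:
        send_frame(conn, frame)

# "Camera server" side: bind to an ephemeral local port and send one frame.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

frame = {"rgb": [[0, 0, 0]], "depth": [[1.0]], "pose": [[1.0, 0.0], [0.0, 1.0]]}
t = threading.Thread(target=serve_one, args=(server, frame))
t.start()

# "Fusion" side: connect and receive the frame.
client = socket.socket()
client.connect(("127.0.0.1", port))
received = recv_frame(client)
client.close()
t.join()
server.close()
print(received["depth"][0][0])  # 1.0
```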
- SEEM: the VLFM we used to extract region-based features
- Open3D: GPU-accelerated 3D library for the base TSDF implementation
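For readers unfamiliar with TSDF fusion, the core of what such backends do is the classic weighted-average voxel update (Curless & Levoy); the sketch below is illustrative only, with an assumed truncation distance, and is not Open-Fusion's or Open3D's implementation:

```python
TRUNC = 0.04  # truncation distance in meters (an assumed value)

def update_voxel(tsdf, weight, sdf, obs_weight=1.0):
    """Fuse one new signed-distance observation into a single voxel."""
    # Clamp the observation to the truncation band around the surface.
    d = max(-TRUNC, min(TRUNC, sdf))
    # Running weighted average of all observations seen so far.
    new_weight = weight + obs_weight
    new_tsdf = (tsdf * weight + d * obs_weight) / new_weight
    return new_tsdf, new_weight

# Fuse three observations of the same voxel; the last is truncated to 0.04.
tsdf, w = 0.0, 0.0
for sdf in (0.02, 0.03, 0.1):
    tsdf, w = update_voxel(tsdf, w, sdf)
print(round(tsdf, 3), w)  # 0.03 3.0
```

Averaging truncated distances per voxel is what lets many noisy depth frames converge to a clean implicit surface at the zero crossing.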
If you find this work helpful, please consider citing our work as:
@inproceedings{yamazaki2024open,
title={Open-fusion: Real-time open-vocabulary 3d mapping and queryable scene representation},
author={Yamazaki, Kashu and Hanyu, Taisei and Vo, Khoa and Pham, Thang and Tran, Minh and Doretto, Gianfranco and Nguyen, Anh and Le, Ngan},
booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
pages={9411--9417},
year={2024},
organization={IEEE}
}
Please create an issue on this repository for questions, comments, and bug reports. For other inquiries, please email Kashu Yamazaki.