sudo apt install libsparsehash-dev
conda env create -f environment.yaml
conda activate visfusion
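As a quick sanity check that the environment is usable, the minimal sketch below only assumes the conda environment provides PyTorch with CUDA support, which the training/testing commands further down rely on:

import torch

# Should print the installed PyTorch version and True on a machine with a working CUDA setup.
print(torch.__version__)
print(torch.cuda.is_available())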
We use the same input data structure as NeuralRecon. You can download and extract the ScanNet v2 dataset by following the instructions at http://www.scan-net.org/ or by using the scannet_wrangling_scripts provided by SimpleRecon.
Expected directory structure of ScanNet:
DATAROOT
└───scannet
│   └───scans
│   |   └───scene0000_00
│   |       └───color
│   |       │   │   0.jpg
│   |       │   │   1.jpg
│   |       │   │   ...
│   |       │   ...
│   └───scans_test
│   |   └───scene0707_00
│   |       └───color
│   |       │   │   0.jpg
│   |       │   │   1.jpg
│   |       │   │   ...
│   |       │   ...
|   └───scannetv2_test.txt
|   └───scannetv2_train.txt
|   └───scannetv2_val.txt
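As a quick check that a scene follows the layout above, a few lines of Python suffice (the scene name below is just the placeholder from the tree; substitute your own DATAROOT and scene):

import os

# Placeholder paths -- replace DATAROOT and the scene name with your own.
color_dir = os.path.join("DATAROOT", "scannet", "scans", "scene0000_00", "color")
frames = sorted(os.listdir(color_dir), key=lambda f: int(os.path.splitext(f)[0]))
print(f"{len(frames)} color frames, e.g. {frames[:3]}")  # e.g. ['0.jpg', '1.jpg', '2.jpg']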
Then generate the input fragments and the ground truth TSDFs for the training/val data split by
python tools/tsdf_fusion/generate_gt.py --data_path PATH_TO_SCANNET \
--save_name all_tsdf_9 \
--window_size 9
and for the test split by
python tools/tsdf_fusion/generate_gt.py --test \
--data_path PATH_TO_SCANNET \
--save_name all_tsdf_9 \
--window_size 9
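If you want to verify the generated data, the sketch below assumes a NeuralRecon-style output layout in which generate_gt.py writes fragment metadata as pickle files (e.g. fragments_train.pkl) under the all_tsdf_9 folder; adjust the paths and filenames if your output differs:

import os
import pickle

# Assumed (NeuralRecon-style) output location and filename -- adjust if needed.
save_dir = os.path.join("PATH_TO_SCANNET", "all_tsdf_9")
with open(os.path.join(save_dir, "fragments_train.pkl"), "rb") as f:
    fragments = pickle.load(f)

print(f"{len(fragments)} training fragments")
print(fragments[0].keys())  # per-fragment metadata such as the scene name and image ids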
We provide an example ScanNet scene (scene0785_00) to quickly try out the code. Download it from here and unzip it into the main directory of the project code.
The reconstructed meshes will be saved to PROJECT_PATH/results.
python main.py --cfg ./config/test.yaml \
SCENE scene0785_00 \
TEST.PATH ./example_data/ScanNet \
LOGDIR ./checkpoints \
LOADCKPT pretrained/model_000049.ckpt
By default, it outputs double-layer meshes (for NeuralRecon's evaluation). Set MODEL.SINGLE_LAYER_MESH=True to directly output single-layer meshes for TransformerFusion's evaluation:
python main.py --cfg ./config/test.yaml \
SCENE scene0785_00 \
TEST.PATH ./example_data/ScanNet \
LOGDIR ./checkpoints \
LOADCKPT pretrained/model_000049.ckpt \
MODEL.SINGLE_LAYER_MESH True
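To inspect a reconstructed mesh, any mesh viewer works; below is a minimal sketch using Open3D (the exact output folder and .ply filename under ./results depend on the config, scene and checkpoint, so treat the path as a placeholder):

import open3d as o3d

# Placeholder path -- the actual folder/filename under ./results depends on the run.
mesh = o3d.io.read_triangle_mesh("./results/scene_scannet_checkpoints_fusion_eval_49/scene0785_00.ply")
mesh.compute_vertex_normals()
o3d.visualization.draw_geometries([mesh])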
Change TRAIN.PATH to your own data path in config/train.yaml and start training by running ./train.sh.
train.sh:
#!/usr/bin/env bash
export CUDA_VISIBLE_DEVICES=0
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 20 MODEL.FUSION.FUSION_ON False
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 41
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 44 TRAIN.FINETUNE_LAYER 0 MODEL.PASS_LAYERS 0
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 47 TRAIN.FINETUNE_LAYER 1 MODEL.PASS_LAYERS 1
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 50 TRAIN.FINETUNE_LAYER 2 MODEL.PASS_LAYERS 2
The training is separated into five phases:
- Phase 1 (epoch 1 - 20): train on single fragments (MODEL.FUSION.FUSION_ON=False).
- Phase 2 (epoch 21 - 41): train the whole model with GRUFusion.
- Phase 3 (epoch 42 - 44): finetune the first layer with GRUFusion (TRAIN.FINETUNE_LAYER=0, MODEL.PASS_LAYERS=0).
- Phase 4 (epoch 45 - 47): finetune the second layer with GRUFusion (TRAIN.FINETUNE_LAYER=1, MODEL.PASS_LAYERS=1).
- Phase 5 (epoch 48 - 50): finetune the third layer with GRUFusion (TRAIN.FINETUNE_LAYER=2, MODEL.PASS_LAYERS=2).
Change TEST.PATH to your own data path in config/test.yaml and start testing by running:
python main.py --cfg ./config/test.yaml
We use NeuralRecon's evaluation for our main results.
python tools/evaluation.py --model ./results/scene_scannet_checkpoints_fusion_eval_49 --n_proc 16
You can print previously computed evaluation results with:
python tools/visualize_metrics.py --model ./results/scene_scannet_checkpoints_fusion_eval_49
Here are the 3D metrics on ScanNet generated by the provided checkpoint using NeuralRecon's evaluation:

| Acc ↓ | Comp ↓ | Chamfer ↓ | Prec ↑ | Recall ↑ | F-Score ↑ |
|---|---|---|---|---|---|
| 5.6 | 10.0 | 7.80 | 0.694 | 0.537 | 0.604 |
and using TransformerFusion's evaluation (set MODEL.SINGLE_LAYER_MESH=True to output single-layer meshes):

| Acc ↓ | Comp ↓ | Chamfer ↓ | Prec ↑ | Recall ↑ | F-Score ↑ |
|---|---|---|---|---|---|
| 4.10 | 8.66 | 6.38 | 0.757 | 0.588 | 0.660 |
To try it with your own data captured with ARKit, please refer to NeuralRecon's DEMO.md for more details:
python test_scene.py --cfg ./config/test_scene.yaml \
DATASET ARKit \
TEST.PATH ./example_data/ARKit_scan \
LOADCKPT pretrained/model_000049.ckpt
If you find our work useful in your research, please consider citing our paper:
@inproceedings{gao2023visfusion,
title={VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos},
author={Gao, Huiyu and Mao, Wei and Liu, Miaomiao},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={17317--17326},
year={2023}
}
This repository is partly based on the repo NeuralRecon. Many thanks to Jiaming Sun for the great code!