This repo contains the PyTorch implementation of the paper Memory-based Adapters for Online 3D Scene Perception, built on MMDetection3D. Look here for a Chinese-language explanation.
Memory-based Adapters for Online 3D Scene Perception
Xiuwei Xu*, Chong Xia*, Ziwei Wang, Linqing Zhao, Yueqi Duan, Jie Zhou, Jiwen Lu
We propose a model- and task-agnostic plug-and-play module that converts offline 3D scene perception models (which take reconstructed point clouds as input) into online perception models (which take streaming RGB-D videos as input).
- [2024/3/07]: Code released. The paper will be uploaded to arXiv next week.
- [2024/2/27]: Our paper was accepted to CVPR 2024.
Overall pipeline of our work:
Memory-based adapters can be inserted into an existing architecture with a few lines in the config:
model = dict(
    type='SingleViewModel',
    img_memory=dict(type='MultilevelImgMemory', ...),
    memory=dict(type='MultilevelMemory', ...),
    ...)
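As a rough illustration of what "a few lines in the config" amounts to, the sketch below adds the two memory entries to a plain Python dict. It is not the repository's actual config: `base_model`, `PlaceholderBackbone`, and the helper `add_memory_adapters` are hypothetical names introduced here, and only the `type` strings and the `ada_layer` / `vmp_layer` keys are taken from this README.

```python
import copy

# Hypothetical single-view baseline config. 'PlaceholderBackbone' is a
# made-up name used only to give the dict some content.
base_model = dict(
    type='SingleViewModel',
    backbone=dict(type='PlaceholderBackbone'),
)


def add_memory_adapters(model_cfg, img_layers=(0, 1, 2, 3), pts_layers=(0, 1, 2, 3)):
    """Return a copy of model_cfg with the two memory adapter entries added.

    The 'type' strings and the ada_layer / vmp_layer keys follow this README;
    any other arguments the real modules may need are not shown here.
    """
    cfg = copy.deepcopy(model_cfg)  # leave the baseline config untouched
    cfg['img_memory'] = dict(type='MultilevelImgMemory', ada_layer=tuple(img_layers))
    cfg['memory'] = dict(type='MultilevelMemory', vmp_layer=tuple(pts_layers))
    return cfg


model = add_memory_adapters(base_model)
print(sorted(model.keys()))
```

Working on a deep copy keeps the baseline config reusable, so the same single-view model can be compared with and without adapters.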
For data preparation and environment setup:
For training, evaluation, and visualization:
We provide the checkpoints for quick reproduction of the results reported in the paper.
3D semantic segmentation on ScanNet and SceneNN:
Method | Type | Dataset | mIoU | mAcc | Downloads |
---|---|---|---|---|---|
MkNet | Offline | ScanNet | 71.6 | 80.4 | - |
MkNet-SV | Online | ScanNet | 68.8 | 77.7 | model |
MkNet-SV + Ours | Online | ScanNet | 72.7 | 84.1 | model |
MkNet-SV | Online | SceneNN | 48.4 | 61.2 | model |
MkNet-SV + Ours | Online | SceneNN | 56.7 | 70.1 | model |
3D object detection on ScanNet:
Method | Type | mAP@25 | mAP@50 | Downloads |
---|---|---|---|---|
FCAF3D | Offline | 70.7 | 56.0 | - |
FCAF3D-SV | Online | 41.9 | 20.6 | model |
FCAF3D-SV + Ours | Online | 70.5 | 49.9 | model |
3D instance segmentation on ScanNet:
Method | Type | mAP@25 | mAP@50 | Downloads |
---|---|---|---|---|
TD3D | Offline | 81.3 | 71.1 | - |
TD3D-SV | Online | 53.7 | 36.8 | model |
TD3D-SV + Ours | Online | 71.3 | 60.5 | model |
Visualization results:
If your GPU resources are limited, consider:
- Remove the 2D modality (img_memory or the whole img_backbone). Note that in our 3D instance segmentation experiments, we removed img_memory to avoid out-of-memory (OOM) errors.
- Only insert adapters after high-level backbone features. We observe that the higher the feature level, the better the adapter performs, and the lower the resolution, the smaller the computational cost. For example, change:
img_memory=dict(type='MultilevelImgMemory', ada_layer=(0,1,2,3)),
memory=dict(type='MultilevelMemory', vmp_layer=(0,1,2,3)),
To:
img_memory=dict(type='MultilevelImgMemory', ada_layer=(2,3)),
memory=dict(type='MultilevelMemory', vmp_layer=(2,3)),
The image and point cloud adapters will then be inserted only after the two highest feature levels (for a four-level backbone).
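Both suggestions above can be applied programmatically to a config dict. The following is a minimal sketch on plain Python dicts, not code from this repository: the helper `slim_adapters` and its parameters are hypothetical, and it assumes the adapter levels are indexed from 0 (lowest) to `num_levels - 1` (highest), as in the example above.

```python
import copy


def slim_adapters(model_cfg, num_levels=4, keep_top=2, drop_img_memory=False):
    """Return a copy of model_cfg with adapters kept only on the top levels.

    Hypothetical helper: assumes levels are indexed 0..num_levels-1 and that
    the config uses the ada_layer / vmp_layer keys shown in this README.
    """
    if not 1 <= keep_top <= num_levels:
        raise ValueError('keep_top must be between 1 and num_levels')
    cfg = copy.deepcopy(model_cfg)
    top_levels = tuple(range(num_levels - keep_top, num_levels))
    if drop_img_memory:
        # Option 1: remove the 2D memory entirely to save GPU memory.
        cfg.pop('img_memory', None)
    elif 'img_memory' in cfg:
        cfg['img_memory']['ada_layer'] = top_levels
    if 'memory' in cfg:
        # Option 2: keep point cloud adapters only on high-level features.
        cfg['memory']['vmp_layer'] = top_levels
    return cfg


full_model = dict(
    type='SingleViewModel',
    img_memory=dict(type='MultilevelImgMemory', ada_layer=(0, 1, 2, 3)),
    memory=dict(type='MultilevelMemory', vmp_layer=(0, 1, 2, 3)),
)

light_model = slim_adapters(full_model)                           # top two levels
lighter_model = slim_adapters(full_model, drop_img_memory=True)   # no 2D memory
print(light_model['memory']['vmp_layer'], 'img_memory' in lighter_model)
```

With the defaults, `light_model` matches the reduced config shown above, while `drop_img_memory=True` mirrors the setting described for the 3D instance segmentation experiments.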
We are grateful for the flexible codebase of FCAF3D and the valuable datasets provided by ScanNet and SceneNN.
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{xu2024online,
title={Memory-based Adapters for Online 3D Scene Perception},
author={Xiuwei Xu and Chong Xia and Ziwei Wang and Linqing Zhao and Yueqi Duan and Jie Zhou and Jiwen Lu},
journal={arXiv preprint arXiv:2403.06974},
year={2024}
}