Introduction

This repository provides a ROS wrapper for lightweight yet powerful 3D object detection, using a TensorRT inference backend for real-time robotic applications.

It is effective and efficient, achieving a 5 ms runtime and 85% 3D Car mAP@R40.
We chose IA-SSD as the baseline because of its high efficiency. In addition, HAVSampler and GridBallQuery are adopted; they run 1000x faster than FPS and the original BallQuery, respectively.
We implement TensorRT plugins for NMS postprocessing and for common operators of point-based point cloud detectors, e.g., sampling, grouping, and gather.
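For context on why sampling is worth optimizing, here is a minimal pure-Python sketch of the textbook farthest point sampling (FPS) baseline. It is not HAVSampler and not this repository's CUDA code; the function name `farthest_point_sampling` and the choice of the first point as seed are assumptions made for illustration.

```python
import math


def farthest_point_sampling(points, num_samples):
    """Greedy FPS: return indices of num_samples points from a list of (x, y, z)."""
    if not 0 < num_samples <= len(points):
        raise ValueError("num_samples must be in [1, len(points)]")
    # Squared distance from every point to its nearest already-selected point.
    min_dist = [math.inf] * len(points)
    selected = [0]  # assumption: seed with the first point
    while len(selected) < num_samples:
        last = points[selected[-1]]
        best_idx, best_dist = -1, -1.0
        # Every new sample requires a full pass over all N points,
        # so drawing M samples costs O(N * M) and is inherently sequential.
        for i, p in enumerate(points):
            d = sum((a - b) ** 2 for a, b in zip(p, last))
            if d < min_dist[i]:
                min_dist[i] = d
            if min_dist[i] > best_dist:
                best_idx, best_dist = i, min_dist[i]
        selected.append(best_idx)
    return selected
```

Each sample depends on all previous ones, which is why the loop cannot simply be parallelized over samples.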

News

[2022/05/01]: We offer a faster version of HAVSampler and have rebuilt all plugins with our auto-declaration header. The updates can be found in the devel branch.
[2022/04/17]: We released the PyTorch models and the ONNX export script. You can retrain our models or modify them.
[2022/04/14]: This repository implements GridBallQuery with a computational complexity of $\mathcal{O}(NK^3)$, instead of the $\mathcal{O}(NM)$ of BallQuery.
[2022/04/08]: Added support for INT8 quantization and the Profiler.
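The complexity difference between GridBallQuery and BallQuery noted above can be illustrated with a small pure-Python sketch of a grid-accelerated ball query. This is one reading of the idea, not the repository's plugin: the names `build_grid` and `grid_ball_query`, the default cell size equal to the radius, and the sorted truncation to `max_neighbors` are all assumptions. Each query visits only a K x K x K block of cells instead of scanning all M points.

```python
import math
from collections import defaultdict


def build_grid(points, cell_size):
    """Hash each point index into the voxel cell that contains it."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(c / cell_size) for c in p)
        grid[key].append(idx)
    return grid


def grid_ball_query(points, queries, radius, max_neighbors, cell_size=None):
    """Return, per query, up to max_neighbors point indices within radius."""
    cell_size = radius if cell_size is None else cell_size
    grid = build_grid(points, cell_size)
    reach = math.ceil(radius / cell_size)  # K = 2 * reach + 1 cells per axis
    offsets = range(-reach, reach + 1)
    r2 = radius * radius
    results = []
    for q in queries:
        cx, cy, cz = (math.floor(c / cell_size) for c in q)
        found = []
        # Only K^3 neighboring cells are inspected for each query.
        for dx in offsets:
            for dy in offsets:
                for dz in offsets:
                    for idx in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        d2 = sum((a - b) ** 2 for a, b in zip(points[idx], q))
                        if d2 <= r2:
                            found.append(idx)
        # Sorted truncation keeps the output deterministic in this sketch.
        results.append(sorted(found)[:max_neighbors])
    return results
```

The per-query cost depends on K and on how many points fall into each cell, not on the total number of points, which is where the gain over a brute-force scan would come from.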

Build

We tested on the following platform:

Ubuntu 18.0 with a 2080Ti GPU
Python 3.7
PyTorch 1.12
CUDA 11.0
cuDNN 8.4
TensorRT 8.4.0

Install the dependencies above by following their official guides first, then build this package.

export CUDNN_DIR=/path/to/cudnn/root
export TENSORRT_DIR=/path/to/tensorrt/root

mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DTRT_QUANTIZE=FP16 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
make -j$(nproc)

Alternatively, build it as a normal ROS package.

Test

We tested the exported model with TensorRT on the KITTI val set and report AP_3D@R11 / AP_3D@R40 as follows:

iassd_hvcsx2_4x8_80e_kitti_3cls

Precision	Car (R11 / R40)	Pedestrian (R11 / R40)	Cyclist (R11 / R40)	Runtime
FP32	83.8752 / 84.9749	53.9177 / 53.1046	67.2500 / 67.1609	10 ms
FP16	80.2896 / 80.8535	53.0247 / 51.4732	67.8503 / 68.3627	8 ms
INT8	77.7286 / 79.3178	52.2956 / 50.7517	68.3595 / 68.3880	9 ms

Unexpectedly, the runtime in INT8 mode is higher than in FP16. This may be because we did not implement an INT8 format for the custom layers, and because the point cloud model contains few large-block computations.

We also profiled the model at different precisions; read this for details.
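To make the R11 and R40 columns easier to interpret, here is a small sketch of interpolated average precision at 11 and 40 recall positions, following the KITTI convention as I understand it (R11 samples recall at 0, 0.1, ..., 1; R40 samples at 1/40, 2/40, ..., 1). The function name `interpolated_ap` and its inputs are illustrative assumptions; this is not the evaluation code used to produce the numbers above.

```python
def interpolated_ap(recalls, precisions, num_points):
    """Average the interpolated precision over evenly spaced recall thresholds."""
    if num_points == 11:
        thresholds = [i / 10 for i in range(11)]      # 0.0, 0.1, ..., 1.0
    elif num_points == 40:
        thresholds = [i / 40 for i in range(1, 41)]   # 1/40, 2/40, ..., 1.0
    else:
        raise ValueError("num_points must be 11 or 40")
    total = 0.0
    for t in thresholds:
        # Interpolated precision: the best precision reached at any recall >= t.
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        total += max(candidates, default=0.0)
    return total / len(thresholds)
```

Because R11 includes the recall-0 position and R40 does not, the two metrics generally differ for the same precision-recall curve.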

iassd_hvcsx2_gq_4x8_80e_kitti_3cls

Precision	Car	Pedestrian	Cyclist	Runtime
FP32				6 ms
FP16				5 ms
INT8

How to use

It subscribes to sensor_msgs::PointCloud2 messages on /points and publishes visualization_msgs::MarkerArray messages on /objects.

./devel/lib/point_detection/point_detector

We also offer a utility script to publish point clouds from .bin files.

python src/pcvt.py -s bin -d topic -t /points -p /home/nrsl/Downloads/velodyne_points/data
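For readers unfamiliar with the .bin input, the sketch below reads one such file, assuming the KITTI Velodyne layout (consecutive little-endian float32 values in x, y, z, intensity order). Whether pcvt.py expects exactly this layout is an assumption based on the example path; the helper name `load_kitti_bin` is hypothetical and not part of this repository.

```python
import struct

POINT_FORMAT = "<4f"  # assumed layout: x, y, z, intensity as float32
POINT_SIZE = struct.calcsize(POINT_FORMAT)  # 16 bytes per point


def load_kitti_bin(path):
    """Read a KITTI-style .bin file into a list of (x, y, z, intensity) tuples."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % POINT_SIZE != 0:
        raise ValueError("file size is not a multiple of one 16-byte point")
    return list(struct.iter_unpack(POINT_FORMAT, data))
```

Calling `load_kitti_bin` on a single scan gives a quick check of the point count before publishing it.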

Plugins

You can easily implement a plugin using our AUTO-CODES-GENERATION header.

ONNX

We export the model with RobDet3D. Please refer to its manual to export your own ONNX model. Feel free to let me know if you have any questions.

Limitation

~~When building the engine in INT8 mode, a CUDA configuration error is thrown during calibration. Therefore, only FP32 and FP16 modes can be used.~~

TODO

Consider using CUDA graphs to reduce the latency introduced by launching too many kernels.
Use dynamic parallelism to avoid the CPU-based loop in HAVSampling.

Citation

If you find this project useful in your research, please consider citing:

@article{ouyang2023hierarchical,
  title={Hierarchical Adaptive Voxel-guided Sampling for Real-time Applications in Large-scale Point Clouds},
  author={Ouyang, Junyuan and Liu, Xiao and Chen, Haoyao},
  journal={arXiv preprint arXiv:2305.14306},
  year={2023}
}
