diff --git a/BENCHMARK.md b/BENCHMARK.md
index 78e70d5..452c2be 100644
--- a/BENCHMARK.md
+++ b/BENCHMARK.md
@@ -48,7 +48,7 @@ All benchmarks were conducted on the KITTI 00 sequence.
 ```bash
 cd small_gicp/scripts
-./run_downsampling_benchmark.sh
+./run_downsampling_benchmark.sh /path/to/kitti/velodyne
 python3 plot_downsampling.py
 ```
@@ -67,12 +67,13 @@ python3 plot_downsampling.py
 ```bash
 cd small_gicp/scripts
-./run_kdtree_benchmark.sh
+./run_kdtree_benchmark.sh /path/to/kitti/velodyne
 python3 plot_kdtree.py
 ```
 - Multi-threaded implementation (TBB and OMP) can be up to **4x faster** than the single-threaded one (All the implementations are based on nanoflann).
-- The processing speed gets faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?).
+- ~~The processing speed gets faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?)~~.
+- The new KdTree implementation shows a good scalability thanks to its well balanced task assignment.
 - This benchmark only compares the construction time (query time is not included).
 ![kdtree_time](docs/assets/kdtree_time.png)
@@ -81,7 +82,7 @@ python3 plot_kdtree.py
 ```bash
 cd small_gicp/scripts
-./run_odometry_benchmark.sh
+./run_odometry_benchmark.sh /path/to/kitti/velodyne
 python3 plot_odometry.py
 ```
diff --git a/README.md b/README.md
index 5cd3eb0..4e2c058 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,11 @@
 **small_gicp** is a header-only C++ library that offers efficient and parallelized algorithms for fine point cloud registration (ICP, Point-to-Plane ICP, GICP, VGICP, etc.). It is a refined and optimized version of its predecessor, [fast_gicp](https://github.com/SMRT-AIST/fast_gicp), re-written from scratch with the following features.
-- **Highly Optimized** : The implementation of the core registration algorithm is further optimized from that in fast_gicp. It enables up to **2x speed gain** compared to fast_gicp.
-- **All parallerized** : small_gicp offers parallelized implementations of several preprocessing algorithms to make the entire registration process parallelized (Downsampling, KdTree construction, Normal/covariance estimation). As a parallelism backend, either (or both) [OpenMP](https://www.openmp.org/) and [Intel TBB](https://github.com/oneapi-src/oneTBB) can be used.
-- **Minimum dependency** : Only [Eigen](https://eigen.tuxfamily.org/) (and bundled [nanoflann](https://github.com/jlblancoc/nanoflann) and [Sophus](https://github.com/strasdat/Sophus)) are required at a minimum. Optionally, it provides the [PCL](https://pointclouds.org/) registration interface so that it can be used as a drop-in replacement in many systems.
+- **Highly Optimized** : The implementation of the core registration algorithm is further optimized from that in fast_gicp. It enables up to **2x speed gain**.
+- **All parallerized** : small_gicp offers parallel implementations of several preprocessing algorithms to make the entire registration process parallelized (Downsampling, KdTree construction, Normal/covariance estimation). As a parallelism backend, either (or both) [OpenMP](https://www.openmp.org/) and [Intel TBB](https://github.com/oneapi-src/oneTBB) can be used.
+- **Minimum dependency** : Only [Eigen](https://eigen.tuxfamily.org/) (and bundled [nanoflann](https://github.com/jlblancoc/nanoflann) and [Sophus](https://github.com/strasdat/Sophus)) are required at a minimum. Optionally, it provides the [PCL](https://pointclouds.org/) registration interface so that it can be used as a drop-in replacement.
 - **Customizable** : small_gicp allows feeding any custom point cloud class to the registration algorithm via traits. Furthermore, the template-based implementation enables customizing the registration process with your original correspondence estimator and registration factors.
-- **Python bindings** : The isolation from PCL makes small_gicp's python bindings more portable and connectable to other libraries (e.g., Open3D) without problems.
+- **Python bindings** : The isolation from PCL makes small_gicp's python bindings more portable and usable with other libraries (e.g., Open3D) without problems.
 Note that GPU-based implementations are NOT included in this package.
@@ -22,7 +22,7 @@ This library uses some C++17 features. The PCL interface is not compatible with
 ## Dependencies
 - [Mandatory] [Eigen](https://eigen.tuxfamily.org/), [nanoflann](https://github.com/jlblancoc/nanoflann) ([bundled](include/small_gicp/ann/kdtree.hpp)), [Sophus](https://github.com/strasdat/Sophus) ([bundled](include/small_gicp/util/lie.hpp))
-- [Optional] [OpenMP](https://www.openmp.org/), [Intel TBB](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html), [PCL](https://pointclouds.org/), [Iridescence](https://github.com/koide3/iridescence)
+- [Optional] [OpenMP](https://www.openmp.org/), [Intel TBB](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html), [PCL](https://pointclouds.org/)
 ## Installation
@@ -344,6 +344,11 @@ open3d.visualization.draw_geometries([target_o3d, source_o3d])
+
+### Cookbook
+
+- [Scan-to-scan and scan-to-model GICP matching odometry on KITTI](src/example/kitti_odometry.py)
+
 ## [Benchmark](BENCHMARK.md)
 Processing speed comparison between small_gicp and Open3D ([youtube]((https://youtu.be/LNESzGXPr4c?feature=shared))).
@@ -360,8 +365,9 @@ Processing speed comparison between small_gicp and Open3D ([youtube]((https://yo
 ### KdTree construction
-- Multi-threaded implementation (TBB and OMP) can be up to **4x faster** than the single-threaded one (All the implementations are based on nanoflann).
-- The processing speed gets faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?).
+- Multi-threaded implementation (TBB and OMP) can be up to **6x faster** than the single-threaded one. The single-thread version shows almost equivalent performance with nanoflann.
+- ~~The processing speed gets faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?)~~.
+- The new KdTree implementation shows a good scalability thanks to its well balanced task assignment.
 - This benchmark only compares the construction time (query time is not included).
 ![kdtree_time](docs/assets/kdtree_time.png)
diff --git a/docs/assets/kdtree_time.png b/docs/assets/kdtree_time.png
index aa16c94..2f3b532 100644
Binary files a/docs/assets/kdtree_time.png and b/docs/assets/kdtree_time.png differ
diff --git a/include/small_gicp/ann/kdtree.hpp b/include/small_gicp/ann/kdtree.hpp
index df0240f..2141f70 100644
--- a/include/small_gicp/ann/kdtree.hpp
+++ b/include/small_gicp/ann/kdtree.hpp
@@ -96,11 +96,11 @@ struct KdTreeBuilder {
   NodeIndexType create_node(KdTree& kdtree, size_t& node_count, const PointCloud& points, IndexConstIterator global_first, IndexConstIterator first, IndexConstIterator last) const {
     const size_t N = std::distance(first, last);
+    const NodeIndexType node_index = node_count++;
+    auto& node = kdtree.nodes[node_index];
+
     // Create a leaf node.
     if (N <= max_leaf_size) {
-      const NodeIndexType node_index = node_count++;
-      auto& node = kdtree.nodes[node_index];
-
       // std::sort(first, last);
       node.node_type.lr.first = std::distance(global_first, first);
       node.node_type.lr.last = std::distance(global_first, last);
@@ -115,8 +115,6 @@ struct KdTreeBuilder {
     std::nth_element(first, median_itr, last, [&](size_t i, size_t j) { return proj(traits::point(points, i)) < proj(traits::point(points, j)); });
     // Create a non-leaf node.
-    const NodeIndexType node_index = node_count++;
-    auto& node = kdtree.nodes[node_index];
     node.node_type.sub.proj = proj;
     node.node_type.sub.thresh = proj(traits::point(points, *median_itr));
diff --git a/include/small_gicp/ann/kdtree_omp.hpp b/include/small_gicp/ann/kdtree_omp.hpp
index e1e8d96..2968a5e 100644
--- a/include/small_gicp/ann/kdtree_omp.hpp
+++ b/include/small_gicp/ann/kdtree_omp.hpp
@@ -55,11 +55,11 @@ struct KdTreeBuilderOMP {
     IndexConstIterator first, IndexConstIterator last) const {
     const size_t N = std::distance(first, last);
+    const NodeIndexType node_index = node_count++;
+    auto& node = kdtree.nodes[node_index];
+
     // Create a leaf node.
     if (N <= max_leaf_size) {
-      const NodeIndexType node_index = node_count++;
-      auto& node = kdtree.nodes[node_index];
-
       // std::sort(first, last);
       node.node_type.lr.first = std::distance(global_first, first);
       node.node_type.lr.last = std::distance(global_first, last);
@@ -74,8 +74,6 @@ struct KdTreeBuilderOMP {
     std::nth_element(first, median_itr, last, [&](size_t i, size_t j) { return proj(traits::point(points, i)) < proj(traits::point(points, j)); });
     // Create a non-leaf node.
-    const NodeIndexType node_index = node_count++;
-    auto& node = kdtree.nodes[node_index];
     node.node_type.sub.proj = proj;
     node.node_type.sub.thresh = proj(traits::point(points, *median_itr));
diff --git a/include/small_gicp/ann/kdtree_tbb.hpp b/include/small_gicp/ann/kdtree_tbb.hpp
index d6a71a7..890fbb8 100644
--- a/include/small_gicp/ann/kdtree_tbb.hpp
+++ b/include/small_gicp/ann/kdtree_tbb.hpp
@@ -37,11 +37,11 @@ struct KdTreeBuilderTBB {
     IndexConstIterator first, IndexConstIterator last) const {
     const size_t N = std::distance(first, last);
+    const NodeIndexType node_index = node_count++;
+    auto& node = kdtree.nodes[node_index];
+
     // Create a leaf node.
     if (N <= max_leaf_size) {
-      const NodeIndexType node_index = node_count++;
-      auto& node = kdtree.nodes[node_index];
-
       // std::sort(first, last);
       node.node_type.lr.first = std::distance(global_first, first);
       node.node_type.lr.last = std::distance(global_first, last);
@@ -56,8 +56,6 @@ struct KdTreeBuilderTBB {
     std::nth_element(first, median_itr, last, [&](size_t i, size_t j) { return proj(traits::point(points, i)) < proj(traits::point(points, j)); });
     // Create a non-leaf node.
-    const NodeIndexType node_index = node_count++;
-    auto& node = kdtree.nodes[node_index];
     node.node_type.sub.proj = proj;
     node.node_type.sub.thresh = proj(traits::point(points, *median_itr));
diff --git a/scripts/plot_kdtree.py b/scripts/plot_kdtree.py
index a30d9c1..7ead743 100644
--- a/scripts/plot_kdtree.py
+++ b/scripts/plot_kdtree.py
@@ -54,14 +54,14 @@ def main():
   fig, axes = pyplot.subplots(1, 2, figsize=(12, 3))
   num_threads = [1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128]
-  axes[0].plot(num_points, results['small_1'], label='kdtree (nanoflann)', marker='o', linestyle='--')
-  for idx in [1, 3, 5, 7, 8]:
+  axes[0].plot(num_points, results['small_1'], label='kdtree (single-thread)', marker='o', linestyle='--')
+  for idx in [1, 2, 3, 5, 7, 8, 9]:
     N = num_threads[idx]
     axes[0].plot(num_points, results['omp_{}'.format(N)], label='kdtree_omp (%d threads)' % N, marker='s')
-    axes[0].plot(num_points, results['tbb_{}'.format(N)], label='kdtree_tbb (%d threads)' % N, marker='^')
+    # axes[0].plot(num_points, results['tbb_{}'.format(N)], label='kdtree_tbb (%d threads)' % N, marker='^')
   baseline = numpy.array(results['small_1'])
-  axes[1].plot([num_threads[0], num_threads[-1]], [1.0, 1.0], label='kdtree (nanoflann)', linestyle='--')
+  axes[1].plot([num_threads[0], num_threads[-1]], [1.0, 1.0], label='kdtree (single-thread)', linestyle='--')
   for idx in [5]:
     threads = num_threads[idx]
     N = num_points[idx]
diff --git a/scripts/run_kdtree_benchmark.sh b/scripts/run_kdtree_benchmark.sh
index 79fc832..cf88254 100755
--- a/scripts/run_kdtree_benchmark.sh
+++ b/scripts/run_kdtree_benchmark.sh
@@ -3,7 +3,7 @@ dataset_path=$1
 exe_path=../build/kdtree_benchmark
 mkdir results
-num_threads=(1 2 3 4 5 6 7 8 16 32 64 128)
+num_threads=(1 2 3 4 5 6 7 8 16 32 64 92 128)
 $exe_path $dataset_path --num_threads 1 --num_trials 1000 --method small | tee results/kdtree_benchmark_small_$N.txt
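
The "up to **6x faster**" figure in the README hunk is a ratio against the single-thread baseline, which `plot_kdtree.py` draws as the flat 1.0 line. Below is a minimal sketch of how such a ratio could be computed, assuming speed-up is defined as baseline time divided by multi-thread time. The `speedup` helper is a hypothetical name, the keys only mirror the `small_1` / `omp_N` naming used in the script, and the timings are made-up placeholders rather than benchmark results.

```python
def speedup(baseline_times, parallel_times):
    """Return per-measurement speed-up of a parallel run over the baseline.

    Assumption: speed-up = baseline time / parallel time, so a value of 1.0
    means "no faster than single-thread".
    """
    if len(baseline_times) != len(parallel_times):
        raise ValueError('timing lists must have the same length')
    return [b / p for b, p in zip(baseline_times, parallel_times)]


# Placeholder timings in milliseconds (illustrative only, not measurements).
results = {
    'small_1': [10.0, 20.0, 40.0],  # single-thread baseline
    'omp_4': [4.0, 8.0, 10.0],      # hypothetical 4-thread OpenMP run
}

ratios = speedup(results['small_1'], results['omp_4'])
print(ratios)
```

Dividing by the baseline rather than reporting raw times makes runs on differently sized point clouds comparable on a single axis.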