benchmarking the new KdTree
koide3 committed May 2, 2024
1 parent 24083cc commit 4ff47c3
Showing 8 changed files with 32 additions and 31 deletions.
9 changes: 5 additions & 4 deletions BENCHMARK.md
@@ -48,7 +48,7 @@ All benchmarks were conducted on the KITTI 00 sequence.

 ```bash
 cd small_gicp/scripts
-./run_downsampling_benchmark.sh
+./run_downsampling_benchmark.sh /path/to/kitti/velodyne
 python3 plot_downsampling.py
 ```

@@ -67,12 +67,13 @@ python3 plot_downsampling.py

 ```bash
 cd small_gicp/scripts
-./run_kdtree_benchmark.sh
+./run_kdtree_benchmark.sh /path/to/kitti/velodyne
 python3 plot_kdtree.py
 ```

 - Multi-threaded implementation (TBB and OMP) can be up to **4x faster** than the single-threaded one (All the implementations are based on nanoflann).
-- The processing speed gets faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?).
+- ~~The processing speed gets faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?)~~.
+- The new KdTree implementation shows a good scalability thanks to its well balanced task assignment.
 - This benchmark only compares the construction time (query time is not included).

 ![kdtree_time](docs/assets/kdtree_time.png)
@@ -81,7 +82,7 @@ python3 plot_kdtree.py

 ```bash
 cd small_gicp/scripts
-./run_odometry_benchmark.sh
+./run_odometry_benchmark.sh /path/to/kitti/velodyne
 python3 plot_odometry.py
 ```

20 changes: 13 additions & 7 deletions README.md
@@ -2,11 +2,11 @@

 **small_gicp** is a header-only C++ library that offers efficient and parallelized algorithms for fine point cloud registration (ICP, Point-to-Plane ICP, GICP, VGICP, etc.). It is a refined and optimized version of its predecessor, [fast_gicp](https://github.com/SMRT-AIST/fast_gicp), re-written from scratch with the following features.

-- **Highly Optimized** : The implementation of the core registration algorithm is further optimized from that in fast_gicp. It enables up to **2x speed gain** compared to fast_gicp.
-- **All parallerized** : small_gicp offers parallelized implementations of several preprocessing algorithms to make the entire registration process parallelized (Downsampling, KdTree construction, Normal/covariance estimation). As a parallelism backend, either (or both) [OpenMP](https://www.openmp.org/) and [Intel TBB](https://github.com/oneapi-src/oneTBB) can be used.
-- **Minimum dependency** : Only [Eigen](https://eigen.tuxfamily.org/) (and bundled [nanoflann](https://github.com/jlblancoc/nanoflann) and [Sophus](https://github.com/strasdat/Sophus)) are required at a minimum. Optionally, it provides the [PCL](https://pointclouds.org/) registration interface so that it can be used as a drop-in replacement in many systems.
+- **Highly Optimized** : The implementation of the core registration algorithm is further optimized from that in fast_gicp. It enables up to **2x speed gain**.
+- **All parallerized** : small_gicp offers parallel implementations of several preprocessing algorithms to make the entire registration process parallelized (Downsampling, KdTree construction, Normal/covariance estimation). As a parallelism backend, either (or both) [OpenMP](https://www.openmp.org/) and [Intel TBB](https://github.com/oneapi-src/oneTBB) can be used.
+- **Minimum dependency** : Only [Eigen](https://eigen.tuxfamily.org/) (and bundled [nanoflann](https://github.com/jlblancoc/nanoflann) and [Sophus](https://github.com/strasdat/Sophus)) are required at a minimum. Optionally, it provides the [PCL](https://pointclouds.org/) registration interface so that it can be used as a drop-in replacement.
 - **Customizable** : small_gicp allows feeding any custom point cloud class to the registration algorithm via traits. Furthermore, the template-based implementation enables customizing the registration process with your original correspondence estimator and registration factors.
-- **Python bindings** : The isolation from PCL makes small_gicp's python bindings more portable and connectable to other libraries (e.g., Open3D) without problems.
+- **Python bindings** : The isolation from PCL makes small_gicp's python bindings more portable and usable with other libraries (e.g., Open3D) without problems.

 Note that GPU-based implementations are NOT included in this package.

@@ -22,7 +22,7 @@ This library uses some C++17 features. The PCL interface is not compatible with
 ## Dependencies

 - [Mandatory] [Eigen](https://eigen.tuxfamily.org/), [nanoflann](https://github.com/jlblancoc/nanoflann) ([bundled](include/small_gicp/ann/kdtree.hpp)), [Sophus](https://github.com/strasdat/Sophus) ([bundled](include/small_gicp/util/lie.hpp))
-- [Optional] [OpenMP](https://www.openmp.org/), [Intel TBB](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html), [PCL](https://pointclouds.org/), [Iridescence](https://github.com/koide3/iridescence)
+- [Optional] [OpenMP](https://www.openmp.org/), [Intel TBB](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html), [PCL](https://pointclouds.org/)

 ## Installation

@@ -344,6 +344,11 @@ open3d.visualization.draw_geometries([target_o3d, source_o3d])

 </details>

+
+### Cookbook
+
+- [Scan-to-scan and scan-to-model GICP matching odometry on KITTI](src/example/kitti_odometry.py)
+
 ## [Benchmark](BENCHMARK.md)

 Processing speed comparison between small_gicp and Open3D ([youtube]((https://youtu.be/LNESzGXPr4c?feature=shared))).
@@ -360,8 +365,9 @@ Processing speed comparison between small_gicp and Open3D ([youtube]((https://yo

 ### KdTree construction

-- Multi-threaded implementation (TBB and OMP) can be up to **4x faster** than the single-threaded one (All the implementations are based on nanoflann).
-- The processing speed gets faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?).
+- Multi-threaded implementation (TBB and OMP) can be up to **6x faster** than the single-threaded one. The single-thread version shows almost equivalent performance with nanoflann.
+- ~~The processing speed gets faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?)~~.
+- The new KdTree implementation shows a good scalability thanks to its well balanced task assignment.
 - This benchmark only compares the construction time (query time is not included).

 ![kdtree_time](docs/assets/kdtree_time.png)

Binary file modified docs/assets/kdtree_time.png
8 changes: 3 additions & 5 deletions include/small_gicp/ann/kdtree.hpp
@@ -96,11 +96,11 @@ struct KdTreeBuilder {
   NodeIndexType create_node(KdTree& kdtree, size_t& node_count, const PointCloud& points, IndexConstIterator global_first, IndexConstIterator first, IndexConstIterator last)
     const {
     const size_t N = std::distance(first, last);
+    const NodeIndexType node_index = node_count++;
+    auto& node = kdtree.nodes[node_index];
+
     // Create a leaf node.
     if (N <= max_leaf_size) {
-      const NodeIndexType node_index = node_count++;
-      auto& node = kdtree.nodes[node_index];
-
       // std::sort(first, last);
       node.node_type.lr.first = std::distance(global_first, first);
       node.node_type.lr.last = std::distance(global_first, last);
@@ -115,8 +115,6 @@ struct KdTreeBuilder {
     std::nth_element(first, median_itr, last, [&](size_t i, size_t j) { return proj(traits::point(points, i)) < proj(traits::point(points, j)); });

     // Create a non-leaf node.
-    const NodeIndexType node_index = node_count++;
-    auto& node = kdtree.nodes[node_index];
     node.node_type.sub.proj = proj;
     node.node_type.sub.thresh = proj(traits::point(points, *median_itr));

8 changes: 3 additions & 5 deletions include/small_gicp/ann/kdtree_omp.hpp
@@ -55,11 +55,11 @@ struct KdTreeBuilderOMP {
     IndexConstIterator first,
     IndexConstIterator last) const {
     const size_t N = std::distance(first, last);
+    const NodeIndexType node_index = node_count++;
+    auto& node = kdtree.nodes[node_index];
+
     // Create a leaf node.
     if (N <= max_leaf_size) {
-      const NodeIndexType node_index = node_count++;
-      auto& node = kdtree.nodes[node_index];
-
       // std::sort(first, last);
       node.node_type.lr.first = std::distance(global_first, first);
       node.node_type.lr.last = std::distance(global_first, last);
@@ -74,8 +74,6 @@ struct KdTreeBuilderOMP {
     std::nth_element(first, median_itr, last, [&](size_t i, size_t j) { return proj(traits::point(points, i)) < proj(traits::point(points, j)); });

     // Create a non-leaf node.
-    const NodeIndexType node_index = node_count++;
-    auto& node = kdtree.nodes[node_index];
     node.node_type.sub.proj = proj;
     node.node_type.sub.thresh = proj(traits::point(points, *median_itr));

8 changes: 3 additions & 5 deletions include/small_gicp/ann/kdtree_tbb.hpp
@@ -37,11 +37,11 @@ struct KdTreeBuilderTBB {
     IndexConstIterator first,
     IndexConstIterator last) const {
     const size_t N = std::distance(first, last);
+    const NodeIndexType node_index = node_count++;
+    auto& node = kdtree.nodes[node_index];
+
     // Create a leaf node.
     if (N <= max_leaf_size) {
-      const NodeIndexType node_index = node_count++;
-      auto& node = kdtree.nodes[node_index];
-
       // std::sort(first, last);
       node.node_type.lr.first = std::distance(global_first, first);
       node.node_type.lr.last = std::distance(global_first, last);
@@ -56,8 +56,6 @@ struct KdTreeBuilderTBB {
     std::nth_element(first, median_itr, last, [&](size_t i, size_t j) { return proj(traits::point(points, i)) < proj(traits::point(points, j)); });

     // Create a non-leaf node.
-    const NodeIndexType node_index = node_count++;
-    auto& node = kdtree.nodes[node_index];
     node.node_type.sub.proj = proj;
     node.node_type.sub.thresh = proj(traits::point(points, *median_itr));

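Reviewer note: the `create_node` hunks in kdtree.hpp, kdtree_omp.hpp, and kdtree_tbb.hpp all make the same change. The node slot is reserved once at function entry instead of separately in the leaf and non-leaf branches. The sketch below illustrates that control flow in a single-threaded setting; it is not the library's code. `Node`, `Tree`, `build`, the 1-D `double` points, and the `2 * n + 1` preallocation bound are assumptions made for illustration. The real builders project 3-D points onto a split axis and use their own node layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical node layout (the library uses its own node type).
struct Node {
  bool is_leaf = false;
  std::size_t first = 0, last = 0;     // leaf: range [first, last) into Tree::indices
  double thresh = 0.0;                 // non-leaf: split threshold
  std::uint32_t left = 0, right = 0;   // non-leaf: child node indices
};

struct Tree {
  std::vector<std::size_t> indices;  // point indices, permuted during construction
  std::vector<Node> nodes;           // preallocated node pool
};

using IndexIt = std::vector<std::size_t>::iterator;

std::uint32_t create_node(Tree& tree, std::size_t& node_count, const std::vector<double>& points,
                          IndexIt global_first, IndexIt first, IndexIt last, std::size_t max_leaf_size) {
  const auto N = static_cast<std::size_t>(std::distance(first, last));

  // Reserve the node slot once, before branching (the pattern introduced by the hunks).
  // The pool is never resized during recursion, so this reference stays valid.
  const auto node_index = static_cast<std::uint32_t>(node_count++);
  Node& node = tree.nodes[node_index];

  // Leaf node: store the index range it covers.
  if (N <= max_leaf_size) {
    node.is_leaf = true;
    node.first = static_cast<std::size_t>(std::distance(global_first, first));
    node.last = static_cast<std::size_t>(std::distance(global_first, last));
    return node_index;
  }

  // Non-leaf node: partition the indices around the median value.
  const IndexIt median = first + static_cast<std::ptrdiff_t>(N / 2);
  std::nth_element(first, median, last, [&](std::size_t i, std::size_t j) { return points[i] < points[j]; });
  node.thresh = points[*median];
  node.left = create_node(tree, node_count, points, global_first, first, median, max_leaf_size);
  node.right = create_node(tree, node_count, points, global_first, median, last, max_leaf_size);
  return node_index;
}

Tree build(const std::vector<double>& points, std::size_t max_leaf_size) {
  assert(max_leaf_size >= 1);
  Tree tree;
  tree.indices.resize(points.size());
  std::iota(tree.indices.begin(), tree.indices.end(), std::size_t{0});
  // Every leaf holds at least one point, so at most 2n - 1 nodes are needed (1 if n == 0).
  tree.nodes.resize(2 * points.size() + 1);
  std::size_t node_count = 0;
  create_node(tree, node_count, points, tree.indices.begin(), tree.indices.begin(), tree.indices.end(), max_leaf_size);
  tree.nodes.resize(node_count);
  return tree;
}
```

With these assumptions, `build({5.0, 1.0, 4.0, 2.0, 3.0, 0.0, 7.0, 6.0}, 2)` produces seven nodes numbered in pre-order: the root splits at 4, its children split at 2 and 6, and the four leaves each cover two indices.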
8 changes: 4 additions & 4 deletions scripts/plot_kdtree.py
@@ -54,14 +54,14 @@ def main():
     fig, axes = pyplot.subplots(1, 2, figsize=(12, 3))

     num_threads = [1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128]
-    axes[0].plot(num_points, results['small_1'], label='kdtree (nanoflann)', marker='o', linestyle='--')
-    for idx in [1, 3, 5, 7, 8]:
+    axes[0].plot(num_points, results['small_1'], label='kdtree (single-thread)', marker='o', linestyle='--')
+    for idx in [1, 2, 3, 5, 7, 8, 9]:
         N = num_threads[idx]
         axes[0].plot(num_points, results['omp_{}'.format(N)], label='kdtree_omp (%d threads)' % N, marker='s')
-        axes[0].plot(num_points, results['tbb_{}'.format(N)], label='kdtree_tbb (%d threads)' % N, marker='^')
+        # axes[0].plot(num_points, results['tbb_{}'.format(N)], label='kdtree_tbb (%d threads)' % N, marker='^')

     baseline = numpy.array(results['small_1'])
-    axes[1].plot([num_threads[0], num_threads[-1]], [1.0, 1.0], label='kdtree (nanoflann)', linestyle='--')
+    axes[1].plot([num_threads[0], num_threads[-1]], [1.0, 1.0], label='kdtree (single-thread)', linestyle='--')
     for idx in [5]:
         threads = num_threads[idx]
         N = num_points[idx]

2 changes: 1 addition & 1 deletion scripts/run_kdtree_benchmark.sh
@@ -3,7 +3,7 @@ dataset_path=$1
 exe_path=../build/kdtree_benchmark

 mkdir results
-num_threads=(1 2 3 4 5 6 7 8 16 32 64 128)
+num_threads=(1 2 3 4 5 6 7 8 16 32 64 92 128)

 $exe_path $dataset_path --num_threads 1 --num_trials 1000 --method small | tee results/kdtree_benchmark_small_$N.txt

