benchmarking the new KdTree #45

Merged
merged 1 commit into from
May 4, 2024

BENCHMARK.md: 9 changes (5 additions, 4 deletions)

@@ -48,7 +48,7 @@ All benchmarks were conducted on the KITTI 00 sequence.
 
 ```bash
 cd small_gicp/scripts
-./run_downsampling_benchmark.sh
+./run_downsampling_benchmark.sh /path/to/kitti/velodyne
 python3 plot_downsampling.py
 ```
 
@@ -67,12 +67,13 @@ python3 plot_downsampling.py
 
 ```bash
 cd small_gicp/scripts
-./run_kdtree_benchmark.sh
+./run_kdtree_benchmark.sh /path/to/kitti/velodyne
 python3 plot_kdtree.py
 ```
 
 - Multi-threaded implementation (TBB and OMP) can be up to **4x faster** than the single-threaded one (All the implementations are based on nanoflann).
-- The processing speed gets faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?).
+- ~~The processing speed gets faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?)~~.
+- The new KdTree implementation shows a good scalability thanks to its well balanced task assignment.
 - This benchmark only compares the construction time (query time is not included).
 
 ![kdtree_time](docs/assets/kdtree_time.png)
@@ -81,7 +82,7 @@ python3 plot_kdtree.py
 
 ```bash
 cd small_gicp/scripts
-./run_odometry_benchmark.sh
+./run_odometry_benchmark.sh /path/to/kitti/velodyne
 python3 plot_odometry.py
 ```
 

README.md: 20 changes (13 additions, 7 deletions)

@@ -2,11 +2,11 @@
 
 **small_gicp** is a header-only C++ library that offers efficient and parallelized algorithms for fine point cloud registration (ICP, Point-to-Plane ICP, GICP, VGICP, etc.). It is a refined and optimized version of its predecessor, [fast_gicp](https://github.com/SMRT-AIST/fast_gicp), re-written from scratch with the following features.
 
-- **Highly Optimized** : The implementation of the core registration algorithm is further optimized from that in fast_gicp. It enables up to **2x speed gain** compared to fast_gicp.
-- **All parallerized** : small_gicp offers parallelized implementations of several preprocessing algorithms to make the entire registration process parallelized (Downsampling, KdTree construction, Normal/covariance estimation). As a parallelism backend, either (or both) [OpenMP](https://www.openmp.org/) and [Intel TBB](https://github.com/oneapi-src/oneTBB) can be used.
-- **Minimum dependency** : Only [Eigen](https://eigen.tuxfamily.org/) (and bundled [nanoflann](https://github.com/jlblancoc/nanoflann) and [Sophus](https://github.com/strasdat/Sophus)) are required at a minimum. Optionally, it provides the [PCL](https://pointclouds.org/) registration interface so that it can be used as a drop-in replacement in many systems.
+- **Highly Optimized** : The implementation of the core registration algorithm is further optimized from that in fast_gicp. It enables up to **2x speed gain**.
+- **All parallerized** : small_gicp offers parallel implementations of several preprocessing algorithms to make the entire registration process parallelized (Downsampling, KdTree construction, Normal/covariance estimation). As a parallelism backend, either (or both) [OpenMP](https://www.openmp.org/) and [Intel TBB](https://github.com/oneapi-src/oneTBB) can be used.
+- **Minimum dependency** : Only [Eigen](https://eigen.tuxfamily.org/) (and bundled [nanoflann](https://github.com/jlblancoc/nanoflann) and [Sophus](https://github.com/strasdat/Sophus)) are required at a minimum. Optionally, it provides the [PCL](https://pointclouds.org/) registration interface so that it can be used as a drop-in replacement.
 - **Customizable** : small_gicp allows feeding any custom point cloud class to the registration algorithm via traits. Furthermore, the template-based implementation enables customizing the registration process with your original correspondence estimator and registration factors.
-- **Python bindings** : The isolation from PCL makes small_gicp's python bindings more portable and connectable to other libraries (e.g., Open3D) without problems.
+- **Python bindings** : The isolation from PCL makes small_gicp's python bindings more portable and usable with other libraries (e.g., Open3D) without problems.
 
 Note that GPU-based implementations are NOT included in this package.
 
@@ -22,7 +22,7 @@ This library uses some C++17 features. The PCL interface is not compatible with
 ## Dependencies
 
 - [Mandatory] [Eigen](https://eigen.tuxfamily.org/), [nanoflann](https://github.com/jlblancoc/nanoflann) ([bundled](include/small_gicp/ann/kdtree.hpp)), [Sophus](https://github.com/strasdat/Sophus) ([bundled](include/small_gicp/util/lie.hpp))
-- [Optional] [OpenMP](https://www.openmp.org/), [Intel TBB](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html), [PCL](https://pointclouds.org/), [Iridescence](https://github.com/koide3/iridescence)
+- [Optional] [OpenMP](https://www.openmp.org/), [Intel TBB](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html), [PCL](https://pointclouds.org/)
 
 ## Installation
 
@@ -344,6 +344,11 @@ open3d.visualization.draw_geometries([target_o3d, source_o3d])
 
 </details>
 
+
+### Cookbook
+
+- [Scan-to-scan and scan-to-model GICP matching odometry on KITTI](src/example/kitti_odometry.py)
+
 ## [Benchmark](BENCHMARK.md)
 
 Processing speed comparison between small_gicp and Open3D ([youtube]((https://youtu.be/LNESzGXPr4c?feature=shared))).
@@ -360,8 +365,9 @@ Processing speed comparison between small_gicp and Open3D ([youtube]((https://yo
 
 ### KdTree construction
 
-- Multi-threaded implementation (TBB and OMP) can be up to **4x faster** than the single-threaded one (All the implementations are based on nanoflann).
-- The processing speed gets faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?).
+- Multi-threaded implementation (TBB and OMP) can be up to **6x faster** than the single-threaded one. The single-thread version shows almost equivalent performance with nanoflann.
+- ~~The processing speed gets faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?)~~.
+- The new KdTree implementation shows a good scalability thanks to its well balanced task assignment.
 - This benchmark only compares the construction time (query time is not included).
 
 ![kdtree_time](docs/assets/kdtree_time.png)
Binary file modified docs/assets/kdtree_time.png

include/small_gicp/ann/kdtree.hpp: 8 changes (3 additions, 5 deletions)

@@ -96,11 +96,11 @@ struct KdTreeBuilder {
 NodeIndexType create_node(KdTree& kdtree, size_t& node_count, const PointCloud& points, IndexConstIterator global_first, IndexConstIterator first, IndexConstIterator last)
 const {
 const size_t N = std::distance(first, last);
+const NodeIndexType node_index = node_count++;
+auto& node = kdtree.nodes[node_index];
+
 // Create a leaf node.
 if (N <= max_leaf_size) {
-const NodeIndexType node_index = node_count++;
-auto& node = kdtree.nodes[node_index];
-
 // std::sort(first, last);
 node.node_type.lr.first = std::distance(global_first, first);
 node.node_type.lr.last = std::distance(global_first, last);
@@ -115,8 +115,6 @@ struct KdTreeBuilder {
 std::nth_element(first, median_itr, last, [&](size_t i, size_t j) { return proj(traits::point(points, i)) < proj(traits::point(points, j)); });
 
 // Create a non-leaf node.
-const NodeIndexType node_index = node_count++;
-auto& node = kdtree.nodes[node_index];
 node.node_type.sub.proj = proj;
 node.node_type.sub.thresh = proj(traits::point(points, *median_itr));
 
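
This hunk (and the matching hunks in kdtree_omp.hpp and kdtree_tbb.hpp) takes `node_index` from `node_count` once, at the top of `create_node`, instead of separately in the leaf and non-leaf branches. A minimal single-threaded sketch of that pattern follows. It is not small_gicp's code: `Point`, `Node`, `SketchBuilder`, and `build_kdtree` are hypothetical names, the split axis is picked by largest extent instead of the library's `proj`, and the node array is pre-sized so slots stay valid during recursion.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

using Point = std::array<double, 3>;
using IndexIter = std::vector<std::size_t>::iterator;

// Hypothetical node layout for this sketch (not small_gicp's KdTree node type).
struct Node {
  bool leaf = false;
  std::size_t first = 0, last = 0;  // leaf: [first, last) range in the permuted index array
  int axis = 0;                     // non-leaf: split axis
  double thresh = 0.0;              // non-leaf: split value on that axis
  std::size_t left = 0, right = 0;  // non-leaf: child node indices
};

struct SketchBuilder {
  const std::vector<Point>& points;
  std::vector<Node>& nodes;  // pre-sized, so no reallocation happens during recursion
  std::size_t max_leaf_size;
  std::size_t node_count = 0;

  std::size_t create_node(IndexIter global_first, IndexIter first, IndexIter last) {
    const std::size_t N = static_cast<std::size_t>(std::distance(first, last));
    // Reserve this node's slot before branching: one allocation site for leaf and non-leaf nodes.
    const std::size_t node_index = node_count++;

    if (N <= max_leaf_size) {
      Node& node = nodes[node_index];
      node.leaf = true;
      node.first = static_cast<std::size_t>(std::distance(global_first, first));
      node.last = static_cast<std::size_t>(std::distance(global_first, last));
      return node_index;
    }

    // Split along the axis with the largest extent (a stand-in heuristic for this sketch).
    Point lo = points[*first], hi = points[*first];
    for (auto it = first; it != last; ++it) {
      for (int k = 0; k < 3; ++k) {
        lo[k] = std::min(lo[k], points[*it][k]);
        hi[k] = std::max(hi[k], points[*it][k]);
      }
    }
    int axis = 0;
    for (int k = 1; k < 3; ++k) {
      if (hi[k] - lo[k] > hi[axis] - lo[axis]) axis = k;
    }

    // Median partition with nth_element, as in the diff.
    const IndexIter median = first + N / 2;
    std::nth_element(first, median, last, [&](std::size_t i, std::size_t j) { return points[i][axis] < points[j][axis]; });

    nodes[node_index].axis = axis;
    nodes[node_index].thresh = points[*median][axis];
    const std::size_t left = create_node(global_first, first, median);
    const std::size_t right = create_node(global_first, median, last);
    nodes[node_index].left = left;
    nodes[node_index].right = right;
    return node_index;
  }
};

// Builds the tree; `indices` receives the permuted point order that leaf ranges refer to.
std::vector<Node> build_kdtree(const std::vector<Point>& points, std::vector<std::size_t>& indices, std::size_t max_leaf_size) {
  indices.resize(points.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  // With max_leaf_size >= 1 every leaf holds at least one point, so 2 * N slots suffice.
  std::vector<Node> nodes(std::max<std::size_t>(1, 2 * points.size()));
  SketchBuilder builder{points, nodes, std::max<std::size_t>(1, max_leaf_size)};
  builder.create_node(indices.begin(), indices.begin(), indices.end());
  nodes.resize(builder.node_count);
  return nodes;
}
```

Because the slot is reserved before recursing, every child gets a larger index than its parent; in this serial version the left child is always `node_index + 1`. A call looks like `std::vector<std::size_t> indices; auto nodes = build_kdtree(points, indices, 20);`.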

include/small_gicp/ann/kdtree_omp.hpp: 8 changes (3 additions, 5 deletions)

@@ -55,11 +55,11 @@ struct KdTreeBuilderOMP {
 IndexConstIterator first,
 IndexConstIterator last) const {
 const size_t N = std::distance(first, last);
+const NodeIndexType node_index = node_count++;
+auto& node = kdtree.nodes[node_index];
+
 // Create a leaf node.
 if (N <= max_leaf_size) {
-const NodeIndexType node_index = node_count++;
-auto& node = kdtree.nodes[node_index];
-
 // std::sort(first, last);
 node.node_type.lr.first = std::distance(global_first, first);
 node.node_type.lr.last = std::distance(global_first, last);
@@ -74,8 +74,6 @@ struct KdTreeBuilderOMP {
 std::nth_element(first, median_itr, last, [&](size_t i, size_t j) { return proj(traits::point(points, i)) < proj(traits::point(points, j)); });
 
 // Create a non-leaf node.
-const NodeIndexType node_index = node_count++;
-auto& node = kdtree.nodes[node_index];
 node.node_type.sub.proj = proj;
 node.node_type.sub.thresh = proj(traits::point(points, *median_itr));
 

include/small_gicp/ann/kdtree_tbb.hpp: 8 changes (3 additions, 5 deletions)

@@ -37,11 +37,11 @@ struct KdTreeBuilderTBB {
 IndexConstIterator first,
 IndexConstIterator last) const {
 const size_t N = std::distance(first, last);
+const NodeIndexType node_index = node_count++;
+auto& node = kdtree.nodes[node_index];
+
 // Create a leaf node.
 if (N <= max_leaf_size) {
-const NodeIndexType node_index = node_count++;
-auto& node = kdtree.nodes[node_index];
-
 // std::sort(first, last);
 node.node_type.lr.first = std::distance(global_first, first);
 node.node_type.lr.last = std::distance(global_first, last);
@@ -56,8 +56,6 @@ struct KdTreeBuilderTBB {
 std::nth_element(first, median_itr, last, [&](size_t i, size_t j) { return proj(traits::point(points, i)) < proj(traits::point(points, j)); });
 
 // Create a non-leaf node.
-const NodeIndexType node_index = node_count++;
-auto& node = kdtree.nodes[node_index];
 node.node_type.sub.proj = proj;
 node.node_type.sub.thresh = proj(traits::point(points, *median_itr));
 
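
In the OMP and TBB builders, subtrees are presumably built concurrently, so `node_count++` must hand out unique slots across threads. The counter's declaration is outside these hunks, so the `std::atomic` counter below is an assumption. This sketch uses `std::async` rather than OpenMP tasks or TBB, spawns a task only for the top `spawn_depth` levels (a hypothetical knob), and cycles the split axis by depth for brevity; `ParallelSketchBuilder` and `build_kdtree_parallel` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <numeric>
#include <vector>

using Point = std::array<double, 3>;
using IndexIter = std::vector<std::size_t>::iterator;

// Hypothetical node layout for this sketch only.
struct Node {
  bool leaf = false;
  std::size_t first = 0, last = 0;  // leaf: [first, last) in the permuted index array
  int axis = 0;                     // non-leaf: split axis
  double thresh = 0.0;              // non-leaf: split value
  std::size_t left = 0, right = 0;  // non-leaf: child node indices
};

struct ParallelSketchBuilder {
  const std::vector<Point>& points;
  std::vector<Node>& nodes;  // pre-sized; each task writes only to slots it allocated
  std::size_t max_leaf_size;
  int spawn_depth;           // hypothetical: spawn async tasks only above this depth
  std::atomic<std::size_t> node_count{0};

  std::size_t create_node(IndexIter global_first, IndexIter first, IndexIter last, int depth) {
    const std::size_t N = static_cast<std::size_t>(std::distance(first, last));
    // fetch_add gives every concurrent caller a distinct slot.
    const std::size_t node_index = node_count.fetch_add(1);

    if (N <= max_leaf_size) {
      Node& node = nodes[node_index];
      node.leaf = true;
      node.first = static_cast<std::size_t>(std::distance(global_first, first));
      node.last = static_cast<std::size_t>(std::distance(global_first, last));
      return node_index;
    }

    const int axis = depth % 3;  // cycle x, y, z by depth (simplification)
    const IndexIter median = first + N / 2;
    std::nth_element(first, median, last, [&](std::size_t i, std::size_t j) { return points[i][axis] < points[j][axis]; });
    nodes[node_index].axis = axis;
    nodes[node_index].thresh = points[*median][axis];

    std::size_t left = 0, right = 0;
    if (depth < spawn_depth) {
      // Build the left half on another thread while this thread builds the right half.
      auto left_task = std::async(std::launch::async, [&] { return create_node(global_first, first, median, depth + 1); });
      right = create_node(global_first, median, last, depth + 1);
      left = left_task.get();
    } else {
      left = create_node(global_first, first, median, depth + 1);
      right = create_node(global_first, median, last, depth + 1);
    }
    nodes[node_index].left = left;
    nodes[node_index].right = right;
    return node_index;
  }
};

std::vector<Node> build_kdtree_parallel(const std::vector<Point>& points, std::vector<std::size_t>& indices, std::size_t max_leaf_size, int spawn_depth) {
  indices.resize(points.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  std::vector<Node> nodes(std::max<std::size_t>(1, 2 * points.size()));
  ParallelSketchBuilder builder{points, nodes, std::max<std::size_t>(1, max_leaf_size), spawn_depth};
  builder.create_node(indices.begin(), indices.begin(), indices.end(), 0);
  nodes.resize(builder.node_count.load());
  return nodes;
}
```

The halves `[first, median)` and `[median, last)` are disjoint, so concurrent `nth_element` calls never touch the same indices. Slots stay unique, and a parent's `fetch_add` happens before its children's, so children still receive larger indices. The numbering is no longer strict pre-order, however.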

scripts/plot_kdtree.py: 8 changes (4 additions, 4 deletions)

@@ -54,14 +54,14 @@ def main():
 fig, axes = pyplot.subplots(1, 2, figsize=(12, 3))
 
 num_threads = [1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128]
-axes[0].plot(num_points, results['small_1'], label='kdtree (nanoflann)', marker='o', linestyle='--')
-for idx in [1, 3, 5, 7, 8]:
+axes[0].plot(num_points, results['small_1'], label='kdtree (single-thread)', marker='o', linestyle='--')
+for idx in [1, 2, 3, 5, 7, 8, 9]:
 N = num_threads[idx]
 axes[0].plot(num_points, results['omp_{}'.format(N)], label='kdtree_omp (%d threads)' % N, marker='s')
-axes[0].plot(num_points, results['tbb_{}'.format(N)], label='kdtree_tbb (%d threads)' % N, marker='^')
+# axes[0].plot(num_points, results['tbb_{}'.format(N)], label='kdtree_tbb (%d threads)' % N, marker='^')
 
 baseline = numpy.array(results['small_1'])
-axes[1].plot([num_threads[0], num_threads[-1]], [1.0, 1.0], label='kdtree (nanoflann)', linestyle='--')
+axes[1].plot([num_threads[0], num_threads[-1]], [1.0, 1.0], label='kdtree (single-thread)', linestyle='--')
 for idx in [5]:
 threads = num_threads[idx]
 N = num_points[idx]
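
The right-hand panel of plot_kdtree.py appears to normalize against the single-thread baseline (`results['small_1']`, drawn as the dashed line at 1.0); the rest of that computation lies outside this hunk, so the exact formula is an assumption. A minimal C++ sketch of the usual definitions, with hypothetical function names: speedup is T_1 / T_n, and parallel efficiency is speedup divided by the thread count.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Speedup of each measurement relative to a single-thread baseline: T_1 / T_n.
// A value of 1.0 corresponds to the dashed single-thread reference line.
std::vector<double> speedups(double baseline_time, const std::vector<double>& times) {
  std::vector<double> result;
  result.reserve(times.size());
  for (double t : times) {
    result.push_back(baseline_time / t);
  }
  return result;
}

// Parallel efficiency: speedup divided by thread count (1.0 means perfect linear scaling).
std::vector<double> efficiencies(const std::vector<double>& speedup, const std::vector<int>& num_threads) {
  if (speedup.size() != num_threads.size()) {
    throw std::invalid_argument("speedup and num_threads must have the same length");
  }
  std::vector<double> result;
  result.reserve(speedup.size());
  for (std::size_t i = 0; i < speedup.size(); ++i) {
    result.push_back(speedup[i] / num_threads[i]);
  }
  return result;
}
```

Both helpers take timings in any consistent unit, since the unit cancels in the ratio.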

scripts/run_kdtree_benchmark.sh: 2 changes (1 addition, 1 deletion)

@@ -3,7 +3,7 @@ dataset_path=$1
 exe_path=../build/kdtree_benchmark
 
 mkdir results
-num_threads=(1 2 3 4 5 6 7 8 16 32 64 128)
+num_threads=(1 2 3 4 5 6 7 8 16 32 64 92 128)
 
 $exe_path $dataset_path --num_threads 1 --num_trials 1000 --method small | tee results/kdtree_benchmark_small_$N.txt
 