[feat CudaRansac] Parallelize final mask computation. Use shared variables instead of global variables #30

true-real-michael · 2024-04-05T17:45:18Z

This PR speeds up RANSAC by 1) parallelising the result mask computation, and 2) keeping the inlier counter and the mutex in block-level shared memory instead of global memory.

Currently, the result of the kernel (the result mask) is written by a single thread. The mask may be large, so this serial write can be slow. In this PR, the thread that found the best plane writes that plane to shared memory, and then all threads of the block compute the mask in parallel from the distance of each point to the plane.
Global CUDA memory is also slower than shared memory, which is shared only between threads of the same block. The variables mask_mutex and max_inliers_number_cuda are therefore no longer allocated in global memory and passed to the kernel; they are replaced with variables in shared memory.
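
The two changes can be illustrated with a minimal, self-contained sketch. It is not the kernel from this PR: the names `ransacMaskSketch`, `Plane`, `bestInliers`, `winnerLock` and `bestPlane` are hypothetical, the candidate planes are passed in (one per thread) instead of being sampled randomly, and a single block processes a single cloud.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical plane representation: a*x + b*y + c*z + d = 0 with a unit normal (a, b, c).
struct Plane { float a, b, c, d; };

__device__ float pointPlaneDistance(const Plane& p, const float3& q) {
    return fabsf(p.a * q.x + p.b * q.y + p.c * q.z + p.d);
}

// Assumption: launched with one block; thread t evaluates candidates[t].
__global__ void ransacMaskSketch(const float3* points, int numPoints,
                                 const Plane* candidates, float threshold,
                                 bool* mask) {
    __shared__ int bestInliers;  // block-local counter instead of a global one
    __shared__ int winnerLock;   // block-local lock instead of a global mutex
    __shared__ Plane bestPlane;  // plane published by the winning thread

    if (threadIdx.x == 0) { bestInliers = 0; winnerLock = 0; }
    __syncthreads();

    // Each thread counts the inliers of its own candidate plane.
    const Plane mine = candidates[threadIdx.x];
    int inliers = 0;
    for (int i = 0; i < numPoints; ++i)
        if (pointPlaneDistance(mine, points[i]) < threshold) ++inliers;

    atomicMax(&bestInliers, inliers);
    __syncthreads();

    // Exactly one thread holding the maximum takes the lock and publishes its plane.
    if (inliers == bestInliers && atomicCAS(&winnerLock, 0, 1) == 0)
        bestPlane = mine;
    __syncthreads();

    // All threads of the block write the mask together, striding over the points.
    for (int i = threadIdx.x; i < numPoints; i += blockDim.x)
        mask[i] = pointPlaneDistance(bestPlane, points[i]) < threshold;
}

int main() {
    const int n = 8, threads = 4;
    float3 hPoints[n];
    for (int i = 0; i < n; ++i)  // six points on z = 0, two on z = 5
        hPoints[i] = make_float3((float)i, (float)(i % 2), i < 6 ? 0.0f : 5.0f);
    // Candidate planes: x = 0, y = 0, z = 0 and z = 5.
    Plane hCand[threads] = {{1.f, 0.f, 0.f, 0.f}, {0.f, 1.f, 0.f, 0.f},
                            {0.f, 0.f, 1.f, 0.f}, {0.f, 0.f, 1.f, -5.f}};
    bool hMask[n];

    float3* dPoints; Plane* dCand; bool* dMask;
    cudaMalloc(&dPoints, sizeof(hPoints));
    cudaMalloc(&dCand, sizeof(hCand));
    cudaMalloc(&dMask, sizeof(hMask));
    cudaMemcpy(dPoints, hPoints, sizeof(hPoints), cudaMemcpyHostToDevice);
    cudaMemcpy(dCand, hCand, sizeof(hCand), cudaMemcpyHostToDevice);

    ransacMaskSketch<<<1, threads>>>(dPoints, n, dCand, 0.1f, dMask);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(hMask, dMask, sizeof(hMask), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) std::printf("%d ", hMask[i] ? 1 : 0);
    std::printf("\n");  // the plane z = 0 wins with 6 inliers: 1 1 1 1 1 1 0 0

    cudaFree(dPoints); cudaFree(dCand); cudaFree(dMask);
    return 0;
}
```

In this sketch the only global-memory traffic after scoring is the mask itself, and each thread writes roughly `numPoints / blockDim.x` entries instead of one thread writing all of them.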

Processing time comparison (the reduction grows from about 7% at 10 clouds to about 21% at 100 clouds):

| Clouds | Current (s) | New (s)  |
|--------|-------------|----------|
| 10     | 1.205917    | 1.126165 |
| 20     | 2.098319    | 1.914480 |
| 30     | 2.969012    | 2.602244 |
| 50     | 4.742269    | 3.968544 |
| 80     | 7.056089    | 5.925835 |
| 100    | 9.202390    | 7.302907 |

parallelize final mask computation among threads of a given block

af36b99

true-real-michael changed the title ~~[feat CudaRansac] Parallelize final mask computation among the threads of a block~~ [feat CudaRansac] Parallelize final mask computation. Use shared variables instead of global variables Apr 5, 2024

true-real-michael added 2 commits April 5, 2024 21:17

use shared variables for max_inliers_number and mutex

0f094cb

fix comment

99ddc5a

true-real-michael self-assigned this Apr 5, 2024

true-real-michael requested a review from pmokeev April 5, 2024 18:34

pmokeev approved these changes Apr 6, 2024

true-real-michael merged commit 37d554a into main Apr 6, 2024
4 checks passed

true-real-michael deleted the cuda-ransac/parallel-distance-calculation branch April 6, 2024 19:24
