A CUDA implementation of keypoint detection and descriptor extraction
This project implements keypoint detection and descriptor extraction in CUDA.
It is motivated by iago-suarez/efficient-descriptors and provides the following features.
- Provides a CUDA implementation of multi-scale FAST corner detection based on OpenCV's cv::ORB::detect
- Adds functionality for controlling the spatial distribution of keypoints
The figure below compares keypoint detection results from OpenCV's implementation and from this project.
Left: OpenCV's implementation detected about 20,000 keypoints, but most of them were concentrated in the leaves of the trees.
Right: With this project's nonmaxRadius set to 15, the keypoints are distributed evenly throughout the image.
- Provides CUDA implementations of the BAD and HashSIFT descriptors proposed by Suarez et al. [1][2]
- Provides APIs in the same format as OpenCV's cv::Feature2D: detect, compute, detectAndCompute
- Also provides asynchronous APIs for concurrent execution: detectAsync, computeAsync, detectAndComputeAsync
- [1] Suarez, Iago, Jose M. Buenaposada, and Luis Baumela. "Revisiting binary local image description for resource limited devices." IEEE Robotics and Automation Letters 6.4 (2021): 8317-8324.
- [2] https://github.com/iago-suarez/efficient-descriptors
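The effect of a setting like nonmaxRadius can be illustrated with a small CPU sketch. This README does not describe how the project implements it, so the code below is only an assumption about the general idea: a greedy radius-based non-maximum suppression that keeps the strongest keypoint and drops weaker ones within a given pixel radius. The `Keypoint` struct and `suppressByRadius` function are hypothetical names used for illustration and are not part of this project's API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical keypoint type for illustration; not the project's data structure.
struct Keypoint {
    float x, y;      // pixel position
    float response;  // corner strength (higher is stronger)
};

// Greedy radius-based suppression (an assumed, simplified scheme):
// visit keypoints from strongest to weakest and keep one only if no
// already-kept keypoint lies within `radius` pixels of it.
std::vector<Keypoint> suppressByRadius(std::vector<Keypoint> kps, float radius) {
    assert(radius >= 0.0f);
    std::sort(kps.begin(), kps.end(),
              [](const Keypoint& a, const Keypoint& b) { return a.response > b.response; });

    const float r2 = radius * radius;
    std::vector<Keypoint> kept;
    for (const Keypoint& k : kps) {
        bool suppressed = false;
        for (const Keypoint& q : kept) {
            const float dx = k.x - q.x;
            const float dy = k.y - q.y;
            if (dx * dx + dy * dy <= r2) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(k);
    }
    return kept;
}
```

With a radius of 15, a dense cluster of corners (such as the tree leaves in the figure) collapses to its strongest member, while isolated corners elsewhere in the image survive. This quadratic loop is meant only to show the behavior; a GPU implementation would use a different strategy.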
Using sample_benchmark, we measured the processing time for each API below.
- detect: keypoint detection only
- compute: descriptor extraction only
- detectAndCompute: keypoint detection and descriptor extraction
Each processing time below is the average over 11 images from SceauxCastle. All times are in milliseconds.
With the default parameters, we measured the processing time of keypoint detection at three image sizes: FHD (1920x1080), 4K (3840x2160), and 8K (7680x4320).
Device | FHD | 4K | 8K |
---|---|---|---|
RTX 3060 Ti | 1.6 | 2.9 | 5.5 |
Jetson Xavier | 5.6 | 12.1 | 27.5 |
For each descriptor, we measured the processing time of descriptor extraction for 40,000 keypoints.
Device | BAD256 | BAD512 | HashSIFT256 | HashSIFT512 |
---|---|---|---|---|
RTX 3060 Ti | 1.5 | 2.7 | 3.5 | 3.9 |
Jetson Xavier | 19.1 | 28.2 | 21.9 | 24.8 |
For each descriptor, we measured the processing time when executing both keypoint detection and descriptor extraction for 40,000 keypoints.
Device | BAD256 | BAD512 | HashSIFT256 | HashSIFT512 |
---|---|---|---|---|
RTX 3060 Ti | 7.2 | 8.2 | 8.5 | 8.9 |
Jetson Xavier | 41.7 | 48.8 | 46.2 | 49.2 |
Package Name | Minimum Requirements | Note |
---|---|---|
CMake | version >= 3.18 | |
CUDA Toolkit | compute capability >= 6.0 | |
OpenCV | version >= 4.6.0 | |
OpenCV CUDA module | version >= 4.6.0 | included in opencv/opencv_contrib |
$ git clone https://github.com/fixstars/cuda-efficient-features.git
$ cd cuda-efficient-features
$ git submodule update --init # needed if BUILD_TESTS is ON
$ mkdir build
$ cd build
$ cmake ../ # Several options available (e.g. -DBUILD_TESTS=ON -DCUDA_ARCHS=86)
$ make
Option | Description | Default |
---|---|---|
BUILD_SAMPLES | Build samples | ON |
BUILD_TESTS | Build tests | OFF |
CUDA_ARCHS | List of architectures to generate device code for | 52;61;72;75;86 |
Command | Description |
---|---|
./samples/sample_feature_extraction input-image [options] | Feature detection and description |
./samples/sample_feature_matching first-image second-image [options] | Feature matching on an image pair |
./samples/sample_image_sequence image-format [options] | Feature matching on an image sequence |
./samples/sample_benchmark input-image [options] | Performance benchmarking |
./samples/hpatches_description hpatchs-dir [options] | Feature description on HPatches dataset for hpatches-benchmark |
Use the --help or -h option for detailed information.
./samples/sample_feature_extraction -h
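For readers new to binary descriptors, the matching step can be sketched on the CPU as follows. This is a minimal illustration, not the code used by sample_feature_matching. It assumes that BAD256 and HashSIFT256 denote 256-bit binary descriptors packed into 32 bytes, and that descriptors are compared by Hamming distance, which is the conventional choice for binary descriptors. `Descriptor256`, `hammingDistance`, and `findBestMatch` are hypothetical names.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// 256-bit binary descriptor stored as 32 bytes (assumed layout for illustration).
using Descriptor256 = std::array<std::uint8_t, 32>;

// Hamming distance: the number of bit positions where two descriptors differ.
int hammingDistance(const Descriptor256& a, const Descriptor256& b) {
    int dist = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dist += static_cast<int>(std::bitset<8>(a[i] ^ b[i]).count());
    }
    return dist;
}

// Brute-force nearest neighbour: returns the index of the train descriptor
// closest to `query`, or -1 if `train` is empty. Ties go to the lowest index.
int findBestMatch(const Descriptor256& query, const std::vector<Descriptor256>& train) {
    assert(train.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    int bestIndex = -1;
    int bestDist = std::numeric_limits<int>::max();
    for (std::size_t i = 0; i < train.size(); ++i) {
        const int d = hammingDistance(query, train[i]);
        if (d < bestDist) {
            bestDist = d;
            bestIndex = static_cast<int>(i);
        }
    }
    return bestIndex;
}
```

In practice a matcher would also reject ambiguous matches, for example with a distance threshold or a ratio test, and would run the search on the GPU.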
To run the tests (built when BUILD_TESTS is ON), run the following command.
./tests/tests
The "adaskit Team"
adaskit is an open-source project created by Fixstars Corporation and its subsidiary companies, including Fixstars Autonomous Technologies. It aims to contribute to the ADAS industry by developing high-performance implementations of algorithms with high computational cost.
Apache License 2.0