This is a ROS implementation of a fairly common image processing pipeline using CUDA and NPP:
- Debayer
- Undistort (Rectify)
- Resize
- The Debayer and Resize classes are built on the NPP library.
- The code structure and functionality of the Undistort class are closely based on dusty-nv/jetson-utils.
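To make the Resize stage concrete, here is a minimal CPU reference sketch of a nearest-neighbour resize. It is not the node's code: the node delegates this step to NPP on the GPU, and both the `Image` struct and `resizeNearest` are hypothetical names introduced for illustration. The choice of nearest-neighbour interpolation is also an assumption, since the interpolation mode used by the node is not stated here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container (not from this repo): interleaved 8-bit image,
// row-major, `channels` values per pixel.
struct Image {
    int width = 0;
    int height = 0;
    int channels = 0;
    std::vector<std::uint8_t> data;  // size = width * height * channels
};

// Nearest-neighbour resize on the CPU. Each destination pixel copies the
// source pixel at floor(dst_index * src_size / dst_size) along each axis.
Image resizeNearest(const Image& src, int dst_width, int dst_height) {
    Image dst;
    dst.width = dst_width;
    dst.height = dst_height;
    dst.channels = src.channels;
    dst.data.resize(static_cast<std::size_t>(dst_width) * dst_height * src.channels);

    for (int y = 0; y < dst_height; ++y) {
        // Source row for this destination row.
        const int sy = static_cast<int>(static_cast<long long>(y) * src.height / dst_height);
        for (int x = 0; x < dst_width; ++x) {
            // Source column for this destination column.
            const int sx = static_cast<int>(static_cast<long long>(x) * src.width / dst_width);
            const std::size_t dst_idx =
                (static_cast<std::size_t>(y) * dst_width + x) * src.channels;
            const std::size_t src_idx =
                (static_cast<std::size_t>(sy) * src.width + sx) * src.channels;
            for (int c = 0; c < src.channels; ++c) {
                dst.data[dst_idx + c] = src.data[src_idx + c];
            }
        }
    }
    return dst;
}
```

A reference like this can be handy for checking the GPU output on small test images, keeping in mind that results will only match exactly if the GPU path uses the same interpolation mode.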
- Go to the `src` directory of your ROS workspace and clone this repo:
cd catkin_ws/src
git clone https://github.com/yucedagonurcan/ImageCudaPrepRosNode.git
- Build the node:
catkin build cuda_img_processing --cmake-args -DCMAKE_BUILD_TYPE=Release
- I don't think this pipeline alone can make a drastic difference in terms of throughput, and the measurements agree:
- Delay per image (CPU): 0.055
- Delay per image (GPU): 0.050
- The most important metric in this project is arguably how much of the CPU is freed by moving this pipeline to the GPU:
- CPU usage (CPU): 26.5%
- CPU usage (GPU): 11.5%
- I am pretty sure this code can be optimized further.