Based on
- Can be used on its own to speed up image processing operations.
- Combined with TensorRT, it speeds up inference further.
- Here we compare TensorRT inference speed before and after moving preprocessing to CUDA.
- For the version of the image preprocessing code that does not use CUDA, refer to my other TensorRT project.
| C++ image preprocess | CUDA image preprocess |
| :---: | :---: |
| 25 ms | 19 ms |
Int8 quantization:
| C++ image preprocess | CUDA image preprocess |
| :---: | :---: |
| 10 ms | 3 ms |
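The figures above are per-image latencies. How they were measured is not stated here, so the following is only a minimal sketch of one way to obtain such an average; `average_ms` is a hypothetical helper, not part of this project.

```cpp
#include <chrono>

// Hypothetical helper: run `work` `iterations` times and return the mean
// wall-clock time per call in milliseconds.
// For GPU work, `work` must block until the device has finished (for example
// by synchronising the stream); otherwise only launch overhead is measured.
template <typename Fn>
double average_ms(Fn&& work, int iterations) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / iterations;
}
```

Timing the same callable once with C++ preprocessing and once with CUDA preprocessing, over the same set of images, gives numbers that can be compared like the ones in the tables.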
project dir
├── bgr2rgb # CUDA code for BGR-to-RGB conversion
| ├── Makefile
| └── bgr2rgb.cu
├── bilinear # CUDA code for bilinear resize
| ├── Makefile
| └── resize.cu
├── hwc2chw # CUDA code for HWC-to-CHW layout conversion, like np.transpose((2, 0, 1))
| ├── Makefile
| └── transpose.cu
├── normalize # CUDA code for image data normalization
| ├── Makefile
| └── normal.cu
├── preprocess # combines the steps above (not simply chained together) into one common image preprocessing routine
| ├── Makefile
| └── preprocess.cu
├── union_tensorrt # example of combining with TensorRT to speed up Deeplabv3+ inference
| ├── Makefile
| ├── preprocess.cu
| ├── preprocess.h
| └── trt_infer.cpp
└── lena.jpg # test image
- For the directories bgr2rgb, bilinear, hwc2chw, and normalize:
cd <dir name>
./<bin file> <image path>
# For example:
cd bgr2rgb
./bgr2rgb ../lena.jpg
# The output is lena.jpg with its R and B channels swapped; it is saved in the current directory
Note: If your CUDA or OpenCV installation directory differs from the one in the Makefile, change it to your own.
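For orientation, here is what two of these standalone steps compute, written as a plain C++ CPU reference. This is a sketch for checking results, not the CUDA kernels in bgr2rgb.cu or transpose.cu; the function names `bgr_to_rgb` and `hwc_to_chw` are hypothetical, and the image is assumed to be 8-bit with interleaved channels.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Swap the B and R channels of an interleaved 3-channel 8-bit image in place.
void bgr_to_rgb(std::vector<std::uint8_t>& pixels) {
    for (std::size_t i = 0; i + 2 < pixels.size(); i += 3) {
        std::swap(pixels[i], pixels[i + 2]);
    }
}

// Reorder an interleaved HWC image into planar CHW,
// equivalent to np.transpose((2, 0, 1)) on an (H, W, C) array.
std::vector<std::uint8_t> hwc_to_chw(const std::vector<std::uint8_t>& hwc,
                                     int height, int width, int channels) {
    std::vector<std::uint8_t> chw(hwc.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            for (int c = 0; c < channels; ++c) {
                const std::size_t src = (static_cast<std::size_t>(y) * width + x) * channels + c;
                const std::size_t dst = (static_cast<std::size_t>(c) * height + y) * width + x;
                chw[dst] = hwc[src];
            }
        }
    }
    return chw;
}
```

Each output element depends on exactly one input element, which is why both steps map naturally onto one GPU thread per pixel.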
- Before model inference, images usually need to be resized, converted from BGR to RGB, transposed from HWC to CHW, and normalized.
- You can run this whole process with the following commands:
cd preprocess
./preprocess ../lena.jpg
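The four steps can be fused so that each output value is computed in a single pass instead of writing an intermediate image after every step. The following C++ CPU reference is a sketch of that idea under stated assumptions: it is not the code in preprocess.cu, the `NormParams` struct and `preprocess` signature are hypothetical, the resize uses the half-pixel bilinear convention, and normalization is assumed to be scaling to [0, 1] followed by per-channel mean and standard deviation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parameters: per-channel mean and standard deviation in RGB
// order, applied after scaling pixel values to [0, 1].
struct NormParams {
    float mean[3];
    float stddev[3];
};

// CPU reference: bilinear resize + BGR->RGB + HWC->CHW + normalize in one pass.
// src is interleaved 8-bit BGR of size src_h x src_w.
// Returns planar RGB floats of size 3 x dst_h x dst_w.
std::vector<float> preprocess(const std::vector<std::uint8_t>& src,
                              int src_h, int src_w, int dst_h, int dst_w,
                              const NormParams& norm) {
    std::vector<float> dst(static_cast<std::size_t>(3) * dst_h * dst_w);
    const float scale_y = static_cast<float>(src_h) / dst_h;
    const float scale_x = static_cast<float>(src_w) / dst_w;
    for (int y = 0; y < dst_h; ++y) {
        // Map the output pixel centre back into the source image.
        const float fy = std::max((y + 0.5f) * scale_y - 0.5f, 0.0f);
        const int y0 = std::min(static_cast<int>(fy), src_h - 1);
        const int y1 = std::min(y0 + 1, src_h - 1);
        const float wy = fy - y0;
        for (int x = 0; x < dst_w; ++x) {
            const float fx = std::max((x + 0.5f) * scale_x - 0.5f, 0.0f);
            const int x0 = std::min(static_cast<int>(fx), src_w - 1);
            const int x1 = std::min(x0 + 1, src_w - 1);
            const float wx = fx - x0;
            for (int c = 0; c < 3; ++c) {
                const int sc = 2 - c;  // output channel c (RGB) reads source channel 2 - c (BGR)
                auto at = [&](int yy, int xx) {
                    return static_cast<float>(
                        src[(static_cast<std::size_t>(yy) * src_w + xx) * 3 + sc]);
                };
                // Interpolate horizontally on both rows, then vertically.
                const float top = at(y0, x0) * (1.0f - wx) + at(y0, x1) * wx;
                const float bottom = at(y1, x0) * (1.0f - wx) + at(y1, x1) * wx;
                const float value = (top * (1.0f - wy) + bottom * wy) / 255.0f;
                // Write straight into the planar CHW layout.
                dst[(static_cast<std::size_t>(c) * dst_h + y) * dst_w + x] =
                    (value - norm.mean[c]) / norm.stddev[c];
            }
        }
    }
    return dst;
}
```

In a CUDA version, the two outer loops would typically become the thread grid, with one thread producing the three channel values of one output pixel.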
1) Following my other TensorRT project, set up the environment, download the datasets, and train the Deeplabv3+ network.
2) Enter the directory Deeplabv3+/TensorRT/C++/api_model/
3) Copy the files from this project's union_tensorrt directory into that directory, replacing the original files where they exist.
4) Run the following commands in sequence to perform TensorRT inference:
python pth2wts.py
5) Output like the following indicates that the run succeeded; the segmentation result image is generated in the same directory:
Loading weights: ./para.wts
Succeeded building backbone!
Succeeded building aspp!
Succeeded building decoder!
Succeeded building total network!
Succeeded building serialized engine!
Succeeded building engine!
Succeeded saving .plan file!
Total image num is: 8 inference total cost is: 105ms average cost is: 19ms