TIM-VX is a software integration module provided by VeriSilicon to facilitate deployment of neural networks on OpenVX-enabled ML accelerators.
Tengine Lite supports integration with VeriSilicon's TIM-VX library to run CNN inference on the Khadas VIM3 (Amlogic A311D).
Currently, the following steps are supported only on the Khadas VIM3 or the x86_64 simulator.
$ git clone https://github.com/VeriSilicon/TIM-VX.git
$ git clone https://github.com/OAID/Tengine.git tengine-lite
$ cd tengine-lite
non-cross-compilation
$ cd <tengine-lite-root-dir>
$ mkdir -p ./3rdparty/tim-vx/lib/x86_64
$ mkdir -p ./3rdparty/tim-vx/include
$ cp -rf ../TIM-VX/include/* ./3rdparty/tim-vx/include/
$ cp -rf ../TIM-VX/src ./source/device/tim-vx/
$ cp -rf ../TIM-VX/prebuilt-sdk/x86_64_linux/include/* ./3rdparty/tim-vx/include/
$ cp -rf ../TIM-VX/prebuilt-sdk/x86_64_linux/lib/* ./3rdparty/tim-vx/lib/x86_64/
$ rm ./source/device/tim-vx/src/tim/vx/*_test.cc
Build Tengine
$ export LD_LIBRARY_PATH=<tengine-lite-root-dir>/3rdparty/tim-vx/lib/x86_64
$ cd <tengine-lite-root-dir>
$ mkdir build && cd build
$ cmake -DTENGINE_ENABLE_TIM_VX=ON ..
$ make -j4
Prepare the prebuilt SDK for VIM3:
$ wget -c https://github.com/VeriSilicon/TIM-VX/releases/download/v1.1.28/aarch64_A311D_D312513_A294074_R311680_T312233_O312045.tgz
$ tar zxvf aarch64_A311D_D312513_A294074_R311680_T312233_O312045.tgz
$ mv aarch64_A311D_D312513_A294074_R311680_T312233_O312045 prebuild-sdk-a311d
$ cd <tengine-lite-root-dir>
$ mkdir -p ./3rdparty/tim-vx/lib/aarch64
$ mkdir -p ./3rdparty/tim-vx/include
$ cp -rf ../TIM-VX/include/* ./3rdparty/tim-vx/include/
$ cp -rf ../TIM-VX/src ./source/device/tim-vx/
$ cp -rf ../prebuild-sdk-a311d/include/* ./3rdparty/tim-vx/include/
$ cp -rf ../prebuild-sdk-a311d/lib/* ./3rdparty/tim-vx/lib/aarch64/
$ rm ./source/device/tim-vx/src/tim/vx/*_test.cc
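The x86_64 and VIM3 preparation steps differ only in where the prebuilt SDK comes from and which library subdirectory receives it. As a minimal sketch, the shared steps could be wrapped in one function. `prepare_timvx` and its argument order are hypothetical and not part of Tengine or TIM-VX; it assumes `source/device/tim-vx/` already exists in the Tengine Lite tree.

```shell
# Hypothetical helper (not part of Tengine): stage the TIM-VX sources and a
# prebuilt SDK into the Tengine Lite tree.
#   $1 = Tengine Lite root      $2 = TIM-VX checkout
#   $3 = prebuilt SDK directory $4 = library subdirectory (x86_64 or aarch64)
prepare_timvx() {
    tengine_root=$1; timvx_root=$2; sdk_dir=$3; arch=$4

    mkdir -p "$tengine_root/3rdparty/tim-vx/lib/$arch" \
             "$tengine_root/3rdparty/tim-vx/include" || return 1

    # TIM-VX public headers and sources
    cp -rf "$timvx_root"/include/* "$tengine_root/3rdparty/tim-vx/include/" || return 1
    cp -rf "$timvx_root/src" "$tengine_root/source/device/tim-vx/" || return 1

    # Prebuilt SDK headers and libraries
    cp -rf "$sdk_dir"/include/* "$tengine_root/3rdparty/tim-vx/include/" || return 1
    cp -rf "$sdk_dir"/lib/* "$tengine_root/3rdparty/tim-vx/lib/$arch/" || return 1

    # Remove the TIM-VX unit tests, as in the manual steps
    rm -f "$tengine_root"/source/device/tim-vx/src/tim/vx/*_test.cc
}
```

From the Tengine Lite root, the two cases would then be `prepare_timvx . ../TIM-VX ../TIM-VX/prebuilt-sdk/x86_64_linux x86_64` and `prepare_timvx . ../TIM-VX ../prebuild-sdk-a311d aarch64`.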
2.2.1 cross-compilation
The TOOLCHAIN_FILE is located in <tengine-lite-root-dir>/toolchains
$ export LD_LIBRARY_PATH=<tengine-lite-root-dir>/3rdparty/tim-vx/lib/aarch64
$ cd <tengine-lite-root-dir>
$ mkdir build && cd build
$ cmake -DCMAKE_TOOLCHAIN_FILE=../toolchains/aarch64-linux-gnu.toolchain.cmake -DTENGINE_ENABLE_TIM_VX=ON ..
$ make -j4
2.2.2 non-cross-compilation
Check for galcore:
$ sudo dmesg | grep Galcore
If the Galcore version is lower than 6.4.3.p0.286725, reload the kernel module (root privileges are required):
$ rmmod galcore
$ insmod galcore.ko
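The version check can also be scripted. The sketch below assumes GNU `sort -V` is available on the board. The `version_lt` helper is hypothetical, and the `installed` value is a placeholder that would in practice be copied from the `dmesg` output.

```shell
# version_lt A B: succeed (exit 0) when version A sorts strictly before B.
# Relies on `sort -V` (GNU coreutils version sort), an assumption about the host.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

required=6.4.3.p0.286725
installed=6.4.2   # hypothetical value; take the real one from dmesg

if version_lt "$installed" "$required"; then
    echo "galcore $installed is older than $required: reload galcore.ko"
else
    echo "galcore $installed is new enough"
fi
```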
Check for libOpenVX.so*:
$ sudo find / -name "libOpenVX.so*"
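If the libOpenVX.so in /usr/lib is older than libOpenVX.so.1.3.0, back it up and replace it with the one from the prebuilt SDK (root privileges are required to modify /usr/lib):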
$ cd <tengine-lite-root-dir>
$ mkdir -p Backup
$ mv /usr/lib/libOpenVX.so* ./Backup
$ cp -rf ../prebuild-sdk-a311d/lib/libOpenVX.so* /usr/lib
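To read the installed version from the file names reported by `find`, a small filter along these lines may help. It assumes the library follows the usual `libOpenVX.so.X.Y.Z` naming and that GNU `sort -V` is available. Both function names are hypothetical, and the sample paths are illustrative only.

```shell
# openvx_version PATH: print the version suffix of a libOpenVX.so.X.Y.Z file
# name; print nothing when the name carries no version suffix.
openvx_version() {
    name=${1##*/}   # strip the directory part
    case $name in
        libOpenVX.so.*) printf '%s\n' "${name#libOpenVX.so.}" ;;
    esac
}

# newest_openvx_version: read paths on stdin, print the highest version found.
newest_openvx_version() {
    while IFS= read -r path; do
        openvx_version "$path"
    done | sort -V | tail -n 1
}

# Illustrative input; on a real board, pipe in the output of
#   find /usr/lib -name 'libOpenVX.so*'
printf '%s\n' /usr/lib/libOpenVX.so /usr/lib/libOpenVX.so.1 \
    /usr/lib/libOpenVX.so.1.2.0 | newest_openvx_version
```

If the printed version is lower than 1.3.0, the backup and replacement steps above apply.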
Build Tengine
$ cd <tengine-lite-root-dir>
$ mkdir build && cd build
$ cmake -DTENGINE_ENABLE_TIM_VX=ON ..
$ make -j4
non-cross-compilation
$ cd <tengine-lite-root-dir>
$ mkdir build && cd build
$ cmake -DTENGINE_ENABLE_TIM_VX=ON ..
$ make -j4
build-tim-vx-arm64/install/lib/
└── libtengine-lite.so
On the Khadas VIM3, the corresponding libraries in /lib/ need to be replaced with these.
The TIM-VX library requires a uint8 network model.
/* set runtime options */
struct options opt;
opt.num_thread = num_thread;
opt.cluster = TENGINE_CLUSTER_ALL;
opt.precision = TENGINE_MODE_UINT8;
opt.affinity = 0;
[khadas@Khadas tengine-lite]# ./example/tm_classification_timvx -m squeezenet_uint8.tmfile -i cat.jpg -s 0.017,0.017,0.017 -r 10
Tengine plugin allocator TIMVX is registered.
Image height not specified, use default 227
Image width not specified, use default 227
Mean value not specified, use default 104.0, 116.7, 122.7
tengine-lite library version: 1.2-dev
TIM-VX prerun.
model file : squeezenet_uint8.tmfile
image file : cat.jpg
img_h, img_w, scale[3], mean[3] : 227 227 , 0.017 0.017 0.017, 104.0 116.7 122.7
Repeat 10 times, thread 1, avg time 2.95 ms, max_time 3.42 ms, min_time 2.76 ms
--------------------------------------
34.786182, 278
33.942883, 287
33.732056, 280
32.045452, 277
30.780502, 282
| Vendor | Devices |
|---|---|
| Amlogic | A311D, S905D3 |
| NXP | i.MX 8M Plus |
| JLQ | JA310 |
| X86-64 | Simulator |
The TIM-VX NPU backend requires a uint8 tmfile as its input model file. A float32 tmfile can be quantized to uint8 with the Tengine quantization tool.
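As a rough illustration of what a uint8 model implies, the sketch below applies the common affine quantization rule q = clamp(round(x / scale + zero_point), 0, 255) to a single value. The rule, the helper name `quantize_uint8`, and the sample scale and zero point are assumptions for illustration. The source does not describe the scheme the Tengine quantization tool actually uses.

```shell
# quantize_uint8 VALUE SCALE ZERO_POINT
# Hypothetical helper: map one float to uint8 with an affine rule (assumed).
quantize_uint8() {
    awk -v x="$1" -v s="$2" -v z="$3" 'BEGIN {
        q = x / s + z
        q = (q >= 0) ? int(q + 0.5) : 0   # round to nearest; negatives clamp to 0
        if (q > 255) q = 255              # clamp to the uint8 upper bound
        print q
    }'
}

quantize_uint8 1.0 0.5 128     # 1.0 / 0.5 + 128 = 130
quantize_uint8 -100 0.5 128    # below the range, clamps to 0
quantize_uint8 100 0.5 128     # above the range, clamps to 255
```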