-
If I'm not mistaken, most implementations automatically add both an input quantize and an output dequantize node. For clarity, where and how do you apply the quantization: in your machine learning library (TensorFlow, PyTorch) or in nntool?
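For reference, here is a minimal TensorFlow Lite sketch of 8-bit post-training quantization in which the converter is left at its float32 input/output defaults, so a quantize op at the input and a dequantize op at the output are inserted automatically. The model path, input shape, and calibration data are placeholders, not taken from this thread:

```python
import numpy as np
import tensorflow as tf

# Load a trained Keras model (placeholder path).
model = tf.keras.models.load_model("model.h5")

def representative_dataset():
    # Placeholder calibration data; match your model's input shape.
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Leaving inference_input_type / inference_output_type at their float32
# defaults makes the converter insert a Quantize op at the input and a
# Dequantize op at the output, so the caller works in float throughout.

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```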
-
Your answer is very helpful. I would also like to know whether GAP8 supports mixed-precision quantization.
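Not an answer about GAP8 itself, but if the quantized graph comes from TensorFlow Lite, one way to check whether precision actually varies across layers is to list every tensor's dtype and quantization parameters with the standard interpreter API (the model filename below is a placeholder):

```python
import tensorflow as tf

# Load the quantized model and inspect each tensor's type and its
# (scale, zero_point) quantization parameters.
interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    print(detail["name"], detail["dtype"], detail["quantization"])
```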
-
Hi, I am trying to deploy AI models on the AI-deck to perform tasks. To improve inference speed, I have applied 8-bit post-training quantization to the model, but I want to get the dequantized result by default. Is this possible, and how should I implement it?
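In case it helps while waiting for an nntool-specific answer: dequantization itself is just the affine mapping real = scale * (q - zero_point), so if the deployed network hands back raw int8 outputs you can apply it on the application side. The scale and zero point below are made-up example values; in practice they come from your model's quantization record:

```python
import numpy as np

def dequantize(q_values, scale, zero_point):
    """Map int8 quantized values back to real numbers:
    real = scale * (q - zero_point)."""
    return scale * (q_values.astype(np.float32) - zero_point)

# Example with made-up parameters: suppose the output tensor was
# quantized with scale 0.05 and zero point -3.
q_out = np.array([-128, -3, 17, 127], dtype=np.int8)
print(dequantize(q_out, scale=0.05, zero_point=-3))
# -> [-6.25  0.    1.    6.5 ]
```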