
trtexec: ONNX to TensorRT conversion fails with --fp16, reporting a segmentation fault #4111

Closed
demuxin opened this issue Sep 5, 2024 · 20 comments
Labels
triaged Issue has been triaged by maintainers

Comments

demuxin commented Sep 5, 2024

Description

I'm building an engine from the model with the TensorRT C++ API, but it reports a segmentation fault.


Then I used the trtexec command and the same error was reported:

time trtexec --onnx=model.onnx --fp16

But the engine builds successfully when I omit the --fp16 flag.

What can be done to troubleshoot this problem?

Environment

TensorRT Version: 9.3 / 8.6

NVIDIA GPU: RTX 3090

NVIDIA Driver Version: 535.183.01

CUDA Version: 11.8

Operating System: Ubuntu 22

lix19937 commented Sep 5, 2024

Add --verbose to get more info.
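For example:

    trtexec --onnx=model.onnx --fp16 --verbose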

demuxin commented Sep 6, 2024

Hi @lix19937, I added --verbose, but there's still nothing useful for debugging.


This is the complete log file:

trtexec.log

lix19937 commented Sep 6, 2024

Can you upload the ONNX file here?

demuxin commented Sep 6, 2024

Yes, but the ONNX file is quite large (about 1.5 GB). Can you use Baidu Cloud?

moraxu added the Precision: FP16 and triaged labels Sep 7, 2024
lix19937 commented Sep 7, 2024

@demuxin Can you upload it to Google Drive?

demuxin commented Sep 7, 2024

Hi @lix19937, here is the Google Drive link for the ONNX model:

https://drive.google.com/file/d/1YyBqO0GbskV-_3Wc2ljHh5s84bH9ldV_/view?usp=sharing

lix19937 commented Sep 7, 2024

@demuxin I can run past your breakpoint:

[09/07/2024-23:34:26] [V] [TRT] Tactic Name: sm80_xmma_fprop_implicit_gemm_indexed_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x64_stage3_warpsize2x2x1_g1_tensor16x8x16 Tactic: 0x866e7a5f6401b67f Time: 1.84978
[09/07/2024-23:34:26] [V] [TRT] Conv_13151 (CaskConvolution[0x80000009]) profiling completed in 0.896959 seconds. Fastest Tactic: 0xa9177bbe4e767df8 Time: 1.61924
[09/07/2024-23:34:26] [V] [TRT] --------------- Timing Runner: Conv_13151 (CaskFlattenConvolution[0x80000036])
[09/07/2024-23:34:26] [V] [TRT] CaskFlattenConvolution has no valid tactics for this config, skipping
[09/07/2024-23:34:26] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: 0xa9177bbe4e767df8
[09/07/2024-23:34:26] [V] [TRT] =============== Computing costs for {ForeignNode[onnx::Cast_19359...Slice_16473]}
[09/07/2024-23:34:26] [V] [TRT] *************** Autotuning format combination: Half(1198080,4680,90,1), Half(299520,1170,45,1), Half(4792320,18720,180,1), Half(76544,299,23,1), Half(19169280,74880,360,1) -> Bool(99749,1,1), Float(25535744,256,1), Float(6000,4,1), Float(30000,20,4,1), Float(96000,64,1), Float(96000,64,1), Float(96000,64,1), Float(96000,64,1), Float(96000,64,1), Float(96000,64,1), Float(96000,64,1), Float(96000,64,1) ***************
[09/07/2024-23:34:26] [V] [TRT] --------------- Timing Runner: {ForeignNode[onnx::Cast_19359...Slice_16473]} (Myelin[0x80000023])
[09/07/2024-23:34:44] [V] [TRT]  (foreignNode) Set user's cuda kernel library
[09/07/2024-23:34:44] [V] [TRT]  (foreignNode) Pass fuse_conv_padding is currently skipped for dynamic shapes
[09/07/2024-23:34:44] [V] [TRT]  (foreignNode) Pass pad_conv_channel is currently skipped for dynamic shapes
[09/07/2024-23:34:44] [V] [TRT]  (foreignNode) Padding large gemms
[09/07/2024-23:34:50] [V] [TRT] Tactic: 0x0000000000000000 Time: 622.841
[09/07/2024-23:34:50] [V] [TRT] {ForeignNode[onnx::Cast_19359...Slice_16473]} (Myelin[0x80000023]) profiling completed in 24.2853 seconds. Fastest Tactic: 0x0000000000000000 Time: 622.841
[09/07/2024-23:34:50] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Myelin Tactic: 0x0000000000000000
[09/07/2024-23:34:50] [V] [TRT] =============== Computing costs for PWN(Sin_16474)
[09/07/2024-23:34:50] [V] [TRT] *************** Autotuning format combination: Float(96000,64,1) -> Float(96000,64,1) ***************
[09/07/2024-23:34:50] [V] [TRT] --------------- Timing Runner: PWN(Sin_16474) (PointWiseV2[0x80000028])
[09/07/2024-23:34:50] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.00653436
[09/07/2024-23:34:50] [V] [TRT] Tactic: 0x0000000000000001 Time: 0.0166034
[09/07/2024-23:34:50] [V] [TRT] Tactic: 0x0000000000000002 Time: 0.00403073
[09/07/2024-23:34:51] [V] [TRT] Tactic: 0x0000000000000003 Time: 0.00662068
[09/07/2024-23:34:51] [V] [TRT] Tactic: 0x0000000000000004 Time: 0.0042763
[09/07/2024-23:34:51] [V] [TRT] Tactic: 0x0000000000000005 Time: 0.00272336
[09/07/2024-23:34:51] [V] [TRT] Tactic: 0x0000000000000006 Time: 0.00682754
[09/07/2024-23:34:51] [V] [TRT] Tactic: 0x0000000000000007 Time: 0.00331127
[09/07/2024-23:34:51] [V] [TRT] Tactic: 0x0000000000000008 Time: 0.00405803
[09/07/2024-23:34:52] [V] [TRT] Tactic: 0x0000000000000009 Time: 0.009352
[09/07/2024-23:34:52] [V] [TRT] Tactic: 0x000000000000001c Time: 0.0106684
[09/07/2024-23:34:52] [V] [TRT] PWN(Sin_16474) (PointWiseV2[0x80000028]) profiling completed in 1.90085 seconds. Fastest Tactic: 0x0000000000000005 Time: 0.00272336
[09/07/2024-23:34:52] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: PointWiseV2 Tactic: 0x0000000000000005
[09/07/2024-23:34:52] [V] [TRT] *************** Autotuning format combination: Float(1,64,1) -> Float(1,64,1) ***************
[09/07/2024-23:34:52] [V] [TRT] --------------- Timing Runner: PWN(Sin_16474) (PointWiseV2[0x80000028])

But when the build continues, it raises the following error:

[09/07/2024-23:36:08] [V] [TRT] Adding reformat layer: Reformatted Input Tensor 3 to {ForeignNode[(Unnamed Layer* 6844) [ElementWise]...Concat_20562]} (onnx::Expand_24379) from Half(2,1) to Float(2,1)
[09/07/2024-23:36:08] [V] [TRT] Formats and tactics selection completed in 404.959 seconds.
[09/07/2024-23:36:08] [V] [TRT] After reformat layers: 369 layers
[09/07/2024-23:36:08] [V] [TRT] Total number of blocks in pre-optimized block assignment: 463
[09/07/2024-23:36:08] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[09/07/2024-23:36:11] [V] [TRT] Deleting timing cache: 2775 entries, served 8732 hits since creation.
[09/07/2024-23:36:11] [E] Error[1]: Unexpected exception vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)
[09/07/2024-23:36:11] [E] Engine could not be created from network
[09/07/2024-23:36:11] [E] Building engine failed
[09/07/2024-23:36:11] [E] Failed to create engine from model or file.
[09/07/2024-23:36:11] [E] Engine set up failed

My TensorRT version is v8.6.0.1. About the "Unexpected exception vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)" error, I think it is a bug in TRT; TRT v10 may have fixed it.

Your segmentation fault may be due to insufficient memory; you could close other user processes/tasks while building the engine.
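
You can watch GPU memory usage during the build with, for example:

    watch -n 1 nvidia-smi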

demuxin commented Sep 8, 2024

Thank you for your prompt reply.

I've confirmed that I'm building the engine on an RTX 3090 with plenty of memory left, so it shouldn't be an out-of-memory issue. It might also be a TensorRT bug.

I tried again with TensorRT 10.3 and it can build the engine successfully, but the model outputs are completely different from FP32 mode.

Do you have any better suggestions for troubleshooting this problem?

lix19937 commented Sep 8, 2024

Do you have any better suggestions for troubleshooting this problem?

Can you upload the log?

demuxin commented Sep 8, 2024

Inference via the TensorRT C++ API runs successfully, without any error message.

What log do you want me to upload?

lix19937 commented Sep 8, 2024

I tried again with TensorRT 10.3 and it can build the engine successfully, but the model outputs are completely different from FP32 mode.

The log from the trtexec --fp16 --verbose build.

demuxin commented Sep 8, 2024

This is the complete trtexec log file on TensorRT 10.3:

trtexec_fp16_v103.log

lix19937 commented Sep 8, 2024

Your model has a LayerNorm layer after self-attention, which overflows in FP16, so you should run LayerNorm in FP32.
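
For intuition (the standard FP16 argument, not something specific to your log): LayerNorm computes y = (x - mean) / sqrt(var + eps), and the variance accumulates x^2 terms. FP16 tops out at 65504, so any activation with |x| > 256 already pushes x^2 out of the representable range, and post-attention activations can easily get that large.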

demuxin commented Sep 8, 2024

Thanks for your hard work.

I'm using Netron to visualize the ONNX model and can't find any LayerNorm layer. Do you know what's going on?

Also, do you know how to set the LayerNorm layers individually to FP32 using the TensorRT C++ API and trtexec?

lix19937 commented Sep 9, 2024

You can refer to trtexec --help for setting LayerNorm layers individually to FP32:

  --precisionConstraints=spec Control precision constraint setting. (default = none)
                                  Precision Constraints: spec ::= "none" | "obey" | "prefer"
                                  none = no constraints
                                  prefer = meet precision constraints set by --layerPrecisions/--layerOutputTypes if possible
                                  obey = meet precision constraints set by --layerPrecisions/--layerOutputTypes or fail
                                         otherwise
  --layerPrecisions=spec      Control per-layer precision constraints. Effective only when precisionConstraints is set to
                              "obey" or "prefer". (default = none)
                              The specs are read left-to-right, and later ones override earlier ones. "*" can be used as a
                              layerName to specify the default precision for all the unspecified layers.
                              Per-layer precision spec ::= layerPrecision[","spec]
                                                  layerPrecision ::= layerName":"precision
                                                  precision ::= "fp32"|"fp16"|"int32"|"int8"
  --layerOutputTypes=spec 

I'm using netron to visualize onnx model and no layernorm layer is found, do you know what's going on?

The ONNX model contains LayerNorm as a decomposed subgraph of primitive ops, and TRT fuses those nodes at build time; that's why Netron shows no explicit LayerNorm layer.
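
For the C++ API part of your question, a minimal sketch (assuming network and config are the nvinfer1::INetworkDefinition* and nvinfer1::IBuilderConfig* you already build with; the LayerType::kNORMALIZATION check only matches once LayerNorm is imported as a single INormalizationLayer, i.e. opset >= 17):

    #include "NvInfer.h"

    // Force LayerNorm layers to run in FP32 while the rest of the
    // engine is built in FP16.
    void forceLayerNormFp32(nvinfer1::INetworkDefinition* network,
                            nvinfer1::IBuilderConfig* config)
    {
        config->setFlag(nvinfer1::BuilderFlag::kFP16);
        // Without this flag, setPrecision()/setOutputType() are only hints.
        config->setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);

        for (int32_t i = 0; i < network->getNbLayers(); ++i)
        {
            nvinfer1::ILayer* layer = network->getLayer(i);
            if (layer->getType() == nvinfer1::LayerType::kNORMALIZATION)
            {
                layer->setPrecision(nvinfer1::DataType::kFLOAT);
                layer->setOutputType(0, nvinfer1::DataType::kFLOAT);
            }
        }
    }

The two setFlag calls are the C++ equivalents of --fp16 and --precisionConstraints=obey, and the per-layer calls correspond to --layerPrecisions/--layerOutputTypes. Note that the layerName in those trtexec specs must match real layer names from the verbose log; a literal "layernorm" will not match anything.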

demuxin commented Sep 10, 2024

Hi @lix19937, I used this command to build the TensorRT engine:

trtexec --onnx=codetr_sim.onnx --fp16 --verbose \
    --precisionConstraints=obey \
    --layerPrecisions=layernorm:fp32 \
    --layerOutputTypes=layernorm:fp32

But there is still the following warning:

[09/10/2024-03:23:17] [W] [TRT] Detected layernorm nodes in FP16
[09/10/2024-03:23:17] [W] [TRT] Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.

This is the build log file:
trtexec_fp16_v103.log

It seems the setting did not take effect. How should I set it?

Also, do you know how to set the LayerNorm layers individually to FP32 using the TensorRT C++ API?

demuxin commented Sep 11, 2024

I updated torch to 1.13 and exported the ONNX model with opset 17; that solved the issue.

demuxin closed this as completed Sep 11, 2024
jinhonglu commented

I had successfully converted the ONNX model to an FP32 engine.

I tried the suggestion above and added --verbose for the FP16 conversion.

The log shows no overflow messages, yet the results differ from the FP32 engine.

Any suggestions for further investigation?

trt_fp16.log

lix19937 commented

Try the following to see where the FP32 and FP16 results diverge:

    polygraphy run $spec_onnx --onnxrt --trt

    polygraphy run $spec_onnx --onnxrt --trt --fp16

jinhonglu commented Jan 23, 2025

Running polygraphy run $spec_onnx --onnxrt --trt:

[I]         onnxrt-runner-N0-01/23/25-16:11:45: output | Stats: mean=0.0016378, std-dev=0.0089018, var=7.9242e-05, median=0.00059201, min=-0.2853 at (0, 0, 267, 988, 0), max=0.4366 at (0, 0, 268, 968, 0), avg-magnitude=0.0031753, p90=0.004987, p95=0.004987, p99=0.034112
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-0.285 , -0.213 ) |          7 | 
                (-0.213 , -0.141 ) |         82 | 
                (-0.141 , -0.0687) |       2291 | 
                (-0.0687, 0.00353) |    3642282 | ########################################
                (0.00353, 0.0757 ) |     585707 | ######
                (0.0757 , 0.148  ) |       8049 | 
                (0.148  , 0.22   ) |        865 | 
                (0.22   , 0.292  ) |        104 | 
                (0.292  , 0.365  ) |          9 | 
                (0.365  , 0.437  ) |          4 | 
[I]         trt-runner-N0-01/23/25-16:11:45: output | Stats: mean=0.0016377, std-dev=0.0089008, var=7.9224e-05, median=0.00059199, min=-0.28508 at (0, 0, 267, 988, 0), max=0.43679 at (0, 0, 268, 968, 0), avg-magnitude=0.0031751, p90=0.0049865, p95=0.0049865, p99=0.034105
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-0.285 , -0.213 ) |          7 | 
                (-0.213 , -0.141 ) |         80 | 
                (-0.141 , -0.0687) |       2288 | 
                (-0.0687, 0.00353) |    3642283 | ########################################
                (0.00353, 0.0757 ) |     585718 | ######
                (0.0757 , 0.148  ) |       8044 | 
                (0.148  , 0.22   ) |        864 | 
                (0.22   , 0.292  ) |        103 | 
                (0.292  , 0.365  ) |          9 | 
                (0.365  , 0.437  ) |          4 | 

Running polygraphy run $spec_onnx --onnxrt --trt --fp16:

[I]         onnxrt-runner-N0-01/23/25-16:07:05: output | Stats: mean=0.0016378, std-dev=0.0089018, var=7.9242e-05, median=0.00059201, min=-0.2853 at (0, 0, 267, 988, 0), max=0.4366 at (0, 0, 268, 968, 0), avg-magnitude=0.0031753, p90=0.004987, p95=0.004987, p99=0.034112
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-0.285 , -0.14  ) |         90 | 
                (-0.14  , 0.00457) |    3776629 | ########################################
                (0.00457, 0.15   ) |     461750 | ####
                (0.15   , 0.294  ) |        919 | 
                (0.294  , 0.439  ) |         12 | 
                (0.439  , 0.584  ) |          0 | 
                (0.584  , 0.729  ) |          0 | 
                (0.729  , 0.874  ) |          0 | 
                (0.874  , 1.02   ) |          0 | 
                (1.02   , 1.16   ) |          0 | 
[I]         trt-runner-N0-01/23/25-16:07:05: output | Stats: mean=0.38736, std-dev=0.38967, var=0.15184, median=0.2944, min=-0.07782 at (0, 0, 1261, 271, 1), max=1.1641 at (0, 0, 2048, 887, 0), avg-magnitude=0.39337, p90=0.81445, p95=0.81445, p99=0.85742
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-0.285 , -0.14  ) |          0 | 
                (-0.14  , 0.00457) |    1405988 | ################################
                (0.00457, 0.15   ) |     713712 | ################
                (0.15   , 0.294  ) |          0 | 
                (0.294  , 0.439  ) |          0 | 
                (0.439  , 0.584  ) |       2068 | 
                (0.584  , 0.729  ) |     382453 | ########
                (0.729  , 0.874  ) |    1722631 | ########################################
                (0.874  , 1.02   ) |       9447 | 
                (1.02   , 1.16   ) |       3101 | 

I still can't tell which layer produces the differing output.

I tried --trt-outputs mark all --onnx-outputs mark all, but Polygraphy throws an error: Mismatched type for tensor 'ONNXTRT_Broadcast_3477_output', f16 vs. expected type: f32.
