
Polygraphy [HostToDeviceCopy] requires bool I/O but node cannot be handled by Myelin #1698

Closed
Slyne opened this issue Dec 30, 2021 · 12 comments
Labels
triaged Issue has been triaged by maintainers

Comments

@Slyne

Slyne commented Dec 30, 2021

Description

polygraphy run encoder.onnx --trt --onnxrt --trt-outputs mark all --onnx-outputs mark all --fail-fast

Environment

TensorRT Version: 8.2.1
NVIDIA GPU: V100
NVIDIA Driver Version:
CUDA Version: 11.4
CUDNN Version:
Operating System: ubuntu
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.10
Baremetal or Container (if so, version):

Relevant Files

Model: https://nvidia-my.sharepoint.com/:u:/p/slyned/ESC-DizzoztLhNsFrQAW_F4BRwGMj7YZAYwteGhvOxxi3A?e=A2Gsb1
Please use an NVIDIA account to access it.

Steps To Reproduce

polygraphy run encoder.onnx --trt --onnxrt --trt-outputs mark all --onnx-outputs mark all --fail-fast
@zerollzeng
Collaborator

Can you run the model with trtexec and attach the log? A log with --verbose is preferred.

@Slyne
Author

Slyne commented Jan 4, 2022

@zerollzeng
log.txt

@zerollzeng
Collaborator

Are you getting errors when running with Polygraphy? Your trtexec log seems to run successfully. Try running with the same config as you did with trtexec, and don't mark any outputs.

@Slyne
Author

Slyne commented Jan 5, 2022

Are you getting errors when running with Polygraphy? Your trtexec log seems to run successfully. Try running with the same config as you did with trtexec, and don't mark any outputs.

Yes, I only get errors when running with Polygraphy. Running it without marking outputs is fine, but I need to find the tensors whose values don't match the ONNX model.

@zerollzeng
Collaborator

zerollzeng commented Jan 5, 2022

Marking all tensors as outputs will break layer fusion and hence may affect the final output. I recommend marking outputs via bisection until you find the problematic layer.
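The bisection zerollzeng suggests can be sketched in plain Python. `first_bad_tensor` and `tensor_matches` are hypothetical names: `tensor_matches(name)` stands in for one Polygraphy comparison run with only that single tensor marked as an output, and the sketch assumes divergence is monotonic (once one tensor's values go wrong, every later tensor fails too):

```python
def first_bad_tensor(tensor_names, tensor_matches):
    """Binary-search for the first tensor (in topological order) whose
    comparison fails, assuming all tensors after a failing one also fail.
    Returns None if every probe passes."""
    lo, hi = 0, len(tensor_names) - 1
    first_bad = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if tensor_matches(tensor_names[mid]):
            lo = mid + 1            # mismatch starts later in the graph
        else:
            first_bad = tensor_names[mid]
            hi = mid - 1            # mismatch starts here or earlier
    return first_bad
```

In practice each probe would be one comparison run in the style of the command from this issue (`polygraphy run encoder.onnx --trt --onnxrt --trt-outputs <name> --onnx-outputs <name>`), so only O(log n) engines are built instead of one per tensor, and fusion is disturbed as little as possible.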

@ttyio ttyio added Topic: Myelin triaged Issue has been triaged by maintainers labels Jan 24, 2022
@Slyne
Author

Slyne commented Jan 25, 2022

Marking all tensors as outputs will break layer fusion and hence may affect the final output. I recommend marking outputs via bisection until you find the problematic layer.

Thank you! I finally found the problematic layers. I will file another bug for the accuracy problem.

@Slyne Slyne closed this as completed Jan 25, 2022
@hhhhhanxu

Marking all tensors as outputs will break layer fusion and hence may affect the final output. I recommend marking outputs via bisection until you find the problematic layer.

Hello, I want to know how to mark outputs with bisection. Can I use Polygraphy directly, or should I use onnx-graphsurgeon to modify the .onnx?

@safehumeng

Marking all tensors as outputs will break layer fusion and hence may affect the final output. I recommend marking outputs via bisection until you find the problematic layer.

Hello, I want to know how to mark outputs with bisection. Can I use Polygraphy directly, or should I use onnx-graphsurgeon to modify the .onnx?

I found this approach via Google:
polygraphy debug precision net_bs8.onnx --fp16 --tactic-sources cublas --check polygraphy run polygraphy_debug.engine --trt --load-outputs onnx_res.json --abs 1e-1
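For completeness: the reference file passed to `--load-outputs` has to be produced first from ONNX-Runtime. A sketch of the two-step workflow, reusing the file names from the command above (assuming `--save-outputs` as the counterpart flag for writing results):

```shell
# Step 1: run the ONNX model under ONNX-Runtime and save the golden
# outputs that every later iteration will be compared against.
polygraphy run net_bs8.onnx --onnxrt --save-outputs onnx_res.json

# Step 2: bisect layer precisions; each iteration rebuilds the engine
# as polygraphy_debug.engine and re-runs the check command against
# the saved reference.
polygraphy debug precision net_bs8.onnx --fp16 \
    --check polygraphy run polygraphy_debug.engine --trt \
    --load-outputs onnx_res.json --abs 1e-1
```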

@hhhhhanxu

hhhhhanxu commented Sep 27, 2022 via email

@proevgenii

proevgenii commented Oct 23, 2023

@zerollzeng, how can I get the difference for all layers?
When I try to compare layer by layer with the command

polygraphy debug precision \
    --mode bisect \
    /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx \
    --fp16 \
    --verbose \
    -p float16 \
    --check polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json \
    --abs 0

It detects a big difference in the output layer, which is logical and understandable, but why doesn't it go on to check the other layers?

Log
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] RUNNING | Command: /usr/local/bin/polygraphy debug precision --mode bisect /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx --fp16 --verbose -p float16 --check polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[V] Loaded Module: polygraphy | Version: 0.47.1 | Path: ['/usr/local/lib/python3.10/dist-packages/polygraphy']
[V] Loaded Module: tensorrt | Version: 8.6.1 | Path: ['/usr/local/lib/python3.10/dist-packages/tensorrt']
[V] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 24, GPU 1926 (MiB)
[V] [MemUsageChange] Init builder kernel library: CPU +888, GPU +174, now: CPU 989, GPU 2100 (MiB)
[V] ----------------------------------------------------------------
[V] Input filename:   /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx
[V] ONNX IR version:  0.0.8
[V] Opset version:    16
[V] Producer name:    pytorch
[V] Producer version: 2.1.0
[V] Domain:           
[V] Model version:    0
[V] Doc string:       
[V] ----------------------------------------------------------------
[W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] Using float16 as the higher precision, but float16 is also the lowest precision available. Did you mean to set --int8 as well?
[I] Using DataType.HALF as higher precision
[I]     RUNNING | Iteration 1 | Approximately 11 iteration(s) remaining
[I]     Selecting first 1430 layer(s) to run in higher precision
[V]     Loaded Module: polygraphy.backend.trt.util
[V]     Loaded Module: numpy | Version: 1.22.2 | Path: ['/usr/local/lib/python3.10/dist-packages/numpy']
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 725, 726, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 741, 742, 744, 745, 751, 752, 754, 755, 761, 762, 768, 772, 776, 780, 784, 788, 789, 790, 791, 792, 793, 794, 796, 797, 800, 802, 803, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 818, 819, 821, 822, 824, 825, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 838, 839, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 854, 855, 857, 858, 864, 865, 867, 868, 874, 875, 881, 885, 889, 893, 897, 901, 902, 903, 904, 905, 906, 907, 909, 910, 913, 915, 916, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 931, 932, 934, 935, 937, 938, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 951, 952, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 968, 970, 971, 977, 978, 980, 981, 987, 988, 994, 998, 1002, 1006, 1010, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1022, 1023, 1026, 1028, 1029, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1044, 1045, 1047, 1048, 1050, 1051, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1064, 1065, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1080, 1081, 1083, 1084, 1090, 1091, 1093, 1094, 1100, 1101, 1107, 1111, 1115, 1119, 1123, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1135, 1136, 1139, 1141, 1142, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1157, 1158, 1160, 1161, 1163, 1164, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1177, 1178, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1193, 1194, 1196, 1197, 1203, 1204, 1206, 1207, 1213, 1214, 1220, 1224, 1228, 1232, 1236, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1248, 1249, 1252, 1254, 1255, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1270, 1271, 1273, 1274, 1276, 1277, 1279, 1280, 1281, 1282, 
1283, 1284, 1285, 1286, 1287, 1288, 1290, 1291, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1306, 1307, 1309, 1310, 1316, 1317, 1319, 1320, 1326, 1327, 1333, 1337, 1341, 1345, 1349, 1353, 1354, 1355, 1356, 1357, 1358, 1359, 1361, 1362, 1365, 1367, 1368, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1383, 1384, 1386, 1387, 1389, 1390, 1392, 1393, 1394, 1395, 1396, 1397, 1398, 1399, 1400, 1401, 1403, 1404, 1406, 1407, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417, 1419, 1420, 1422, 1423, 1424, 1426, 1428, 1429} to run in DataType.HALF precision
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[W]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | No shapes provided; Will use shape: [1, 3, 224, 224] for min/opt/max in profile.
[W]         This will cause the tensor to have a static shape. If this is incorrect, please set the range of shapes for this input tensor.
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [FP16, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[V]     Graph optimization time: 0.118371 seconds.
[V]     Global timing cache in use. Profiling results in this builder pass will be stored.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 5328
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 1478656
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 166 MiB, GPU 166 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 3 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 0.036472ms to assign 2 blocks to 3 nodes requiring 1553920 bytes.
[V]     Total Activation Memory: 1553920
[W]     TensorRT encountered issues when converting weights between types and that could affect accuracy.
[W]     If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[W]     Check verbose logs for the list of affected weights.
[W]     - 85 weights are affected by this issue: Detected subnormal FP16 values.
[W]     - 49 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[V]     [MemUsageChange] TensorRT-managed allocation in building engine: CPU +4, GPU +175, now: CPU 4, GPU 175 (MiB)
[I]     Finished engine building in 51.889 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[I]     ========== CAPTURED STDOUT ==========
        [W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
        [I] RUNNING | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
        [I] trt-runner-N0-10/23/23-16:32:19     | Activating and starting inference
        [I] Loading bytes from /workspace/img_clf/notebooks/polygraphy_debug.engine
        [I] trt-runner-N0-10/23/23-16:32:19    
            ---- Inference Input(s) ----
            {input [dtype=float32, shape=(1, 3, 224, 224)]}
        [I] trt-runner-N0-10/23/23-16:32:19    
            ---- Inference Output(s) ----
            {output [dtype=float32, shape=(1, 8)]}
        [I] trt-runner-N0-10/23/23-16:32:19     | Completed 1 iteration(s) in 5.605 ms | Average inference time: 5.605 ms.
        [I] Loading inference results from /workspace/img_clf/data/layerwise_golden.json
        [I] Accuracy Comparison | trt-runner-N0-10/23/23-16:32:19 vs. onnxrt-runner-N0-10/23/23-15:45:37
        [I]     Comparing Output: 'output' (dtype=float32, shape=(1, 8)) with 'output' (dtype=float32, shape=(1, 8))
        [I]         Tolerance: [abs=0, rel=1e-05] | Checking elemwise error
        [I]         trt-runner-N0-10/23/23-16:32:19: output | Stats: mean=-0.98755, std-dev=0.94736, var=0.89749, median=-0.8186, min=-2.4531 at (0, 4), max=0.92334 at (0, 2), avg-magnitude=1.2184
        [I]             ---- Values ----
                            [[-0.6977539  -0.72753906  0.92333984 -1.3388672  -2.453125   -1.9960938
                              -0.7006836  -0.90966797]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          0 | 
                        (-7.63 , -6.65 ) |          0 | 
                        (-6.65 , -5.66 ) |          0 | 
                        (-5.66 , -4.67 ) |          0 | 
                        (-4.67 , -3.68 ) |          0 | 
                        (-3.68 , -2.69 ) |          0 | 
                        (-2.69 , -1.7  ) |          2 | ##########################
                        (-1.7  , -0.714) |          3 | ########################################
                        (-0.714, 0.274 ) |          2 | ##########################
                        (0.274 , 1.26  ) |          1 | #############
        [I]         onnxrt-runner-N0-10/23/23-15:45:37: output | Stats: mean=-4.5612, std-dev=3.0662, var=9.4016, median=-5.3557, min=-8.6229 at (0, 1), max=1.2628 at (0, 0), avg-magnitude=4.8769
        [I]             ---- Values ----
                            [[ 1.2627729 -8.622935  -1.5289186 -4.499915  -7.0470834 -6.714871
                              -3.1271095 -6.211514 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          1 | ####################
                        (-7.63 , -6.65 ) |          2 | ########################################
                        (-6.65 , -5.66 ) |          1 | ####################
                        (-5.66 , -4.67 ) |          0 | 
                        (-4.67 , -3.68 ) |          1 | ####################
                        (-3.68 , -2.69 ) |          1 | ####################
                        (-2.69 , -1.7  ) |          0 | 
                        (-1.7  , -0.714) |          1 | ####################
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | ####################
        [I]         Error Metrics: output
        [I]             Minimum Required Tolerance: elemwise error | [abs=7.8954] OR [rel=1.6039] (requirements may be lower if both abs/rel tolerances are set)
        [I]             Absolute Difference | Stats: mean=4.0638, std-dev=1.8521, var=3.4301, median=3.8775, min=1.9605 at (0, 0), max=7.8954 at (0, 1), avg-magnitude=4.0638
        [I]                 ---- Values ----
                                [[1.9605268 7.895396  2.4522586 3.161048  4.5939584 4.718777  2.426426
                                  5.301846 ]]
        [I]                 ---- Histogram ----
                            Bin Range    |  Num Elems | Visualization
                            (1.96, 2.55) |          3 | ########################################
                            (2.55, 3.15) |          0 | 
                            (3.15, 3.74) |          1 | #############
                            (3.74, 4.33) |          0 | 
                            (4.33, 4.93) |          2 | ##########################
                            (4.93, 5.52) |          1 | #############
                            (5.52, 6.11) |          0 | 
                            (6.11, 6.71) |          0 | 
                            (6.71, 7.3 ) |          0 | 
                            (7.3 , 7.9 ) |          1 | #############
        [I]             Relative Difference | Stats: mean=0.96984, std-dev=0.36049, var=0.12995, median=0.81474, min=0.65189 at (0, 4), max=1.6039 at (0, 2), avg-magnitude=0.96984
        [I]                 ---- Values ----
                                [[1.5525569  0.9156274  1.603917   0.70246834 0.651895   0.70273536
                                  0.7759325  0.8535513 ]]
        [I]                 ---- Histogram ----
                            Bin Range      |  Num Elems | Visualization
                            (0.652, 0.747) |          3 | ########################################
                            (0.747, 0.842) |          1 | #############
                            (0.842, 0.938) |          2 | ##########################
                            (0.938, 1.03 ) |          0 | 
                            (1.03 , 1.13 ) |          0 | 
                            (1.13 , 1.22 ) |          0 | 
                            (1.22 , 1.32 ) |          0 | 
                            (1.32 , 1.41 ) |          0 | 
                            (1.41 , 1.51 ) |          0 | 
                            (1.51 , 1.6  ) |          2 | ##########################
        [E]         FAILED | Output: 'output' | Difference exceeds tolerance (rel=1e-05, abs=0)
        [E]     FAILED | Mismatched outputs: ['output']
        [E] Accuracy Summary | trt-runner-N0-10/23/23-16:32:19 vs. onnxrt-runner-N0-10/23/23-15:45:37 | Passed: 0/1 iterations | Pass Rate: 0.0%
        [E] FAILED | Runtime: 3.012s | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[E]     ========== CAPTURED STDERR ==========
[I]     Saving debug replay to polygraphy_debug_replay.json
[E]     FAILED | Iteration 1 | Duration 55.53689908981323s
[E]     Could not find a configuration that satisfied accuracy requirements.
[I] Finished 1 iteration(s) | Passed: 0/1 | Pass Rate: 0.0%
[I] PASSED | Runtime: 66.191s | Command: /usr/local/bin/polygraphy debug precision --mode bisect /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx --fp16 --verbose -p float16 --check polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0

When I set the precision to int8, I also get a diff only for the 'output' layer, and I don't understand why it reports the precision of both as float32:
[I] Comparing Output: 'output' (dtype=float32, shape=(1, 8)) with 'output' (dtype=float32, shape=(1, 8))

Command:

!polygraphy debug precision \
    --mode bisect \
    /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx \
    --int8 \
    --verbose \
    -p float32 \
    --check polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json \
    --abs 0
Log
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] RUNNING | Command: /usr/local/bin/polygraphy debug precision --mode bisect /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx --int8 --verbose -p float32 --check polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[V] Loaded Module: polygraphy | Version: 0.47.1 | Path: ['/usr/local/lib/python3.10/dist-packages/polygraphy']
[V] Loaded Module: tensorrt | Version: 8.6.1 | Path: ['/usr/local/lib/python3.10/dist-packages/tensorrt']
[V] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 24, GPU 1926 (MiB)
[V] [MemUsageChange] Init builder kernel library: CPU +888, GPU +174, now: CPU 989, GPU 2100 (MiB)
[V] ----------------------------------------------------------------
[V] Input filename:   /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx
[V] ONNX IR version:  0.0.8
[V] Opset version:    16
[V] Producer name:    pytorch
[V] Producer version: 2.1.0
[V] Domain:           
[V] Model version:    0
[V] Doc string:       
[V] ----------------------------------------------------------------
[W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I] Using DataType.FLOAT as higher precision
[I]     RUNNING | Iteration 1 | Approximately 11 iteration(s) remaining
[I]     Selecting first 1430 layer(s) to run in higher precision
[V]     Loaded Module: polygraphy.backend.trt.util
[V]     Loaded Module: numpy | Version: 1.22.2 | Path: ['/usr/local/lib/python3.10/dist-packages/numpy']
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 725, 726, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 741, 742, 744, 745, 751, 752, 754, 755, 761, 762, 768, 772, 776, 780, 784, 788, 789, 790, 791, 792, 793, 794, 796, 797, 800, 802, 803, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 818, 819, 821, 822, 824, 825, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 838, 839, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 854, 855, 857, 858, 864, 865, 867, 868, 874, 875, 881, 885, 889, 893, 897, 901, 902, 903, 904, 905, 906, 907, 909, 910, 913, 915, 916, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 931, 932, 934, 935, 937, 938, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 951, 952, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 968, 970, 971, 977, 978, 980, 981, 987, 988, 994, 998, 1002, 1006, 1010, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1022, 1023, 1026, 1028, 1029, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1044, 1045, 1047, 1048, 1050, 1051, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1064, 1065, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1080, 1081, 1083, 1084, 1090, 1091, 1093, 1094, 1100, 1101, 1107, 1111, 1115, 1119, 1123, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1135, 1136, 1139, 1141, 1142, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1157, 1158, 1160, 1161, 1163, 1164, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1177, 1178, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1193, 1194, 1196, 1197, 1203, 1204, 1206, 1207, 1213, 1214, 1220, 1224, 1228, 1232, 1236, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1248, 1249, 1252, 1254, 1255, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1270, 1271, 1273, 1274, 1276, 1277, 1279, 1280, 1281, 1282, 
1283, 1284, 1285, 1286, 1287, 1288, 1290, 1291, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1306, 1307, 1309, 1310, 1316, 1317, 1319, 1320, 1326, 1327, 1333, 1337, 1341, 1345, 1349, 1353, 1354, 1355, 1356, 1357, 1358, 1359, 1361, 1362, 1365, 1367, 1368, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1383, 1384, 1386, 1387, 1389, 1390, 1392, 1393, 1394, 1395, 1396, 1397, 1398, 1399, 1400, 1401, 1403, 1404, 1406, 1407, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417, 1419, 1420, 1422, 1423, 1424, 1426, 1428, 1429} to run in DataType.FLOAT precision
[W]     Int8 Calibration is using randomly generated input data.
        This could negatively impact accuracy if the inference-time input data is dissimilar to the randomly generated calibration data.
        You may want to consider providing real data via the --data-loader-script option.
[V]     Created calibrator [cache=None]
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[W]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | No shapes provided; Will use shape: [1, 3, 224, 224] for min/opt/max in profile.
[W]         This will cause the tensor to have a static shape. If this is incorrect, please set the range of shapes for this input tensor.
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [INT8, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
        Calibrator             | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[V]     Graph optimization time: 0.0457923 seconds.
[W]     BuilderFlag::kENABLE_TACTIC_HEURISTIC has been ignored in this builder run. This feature is only supported on Ampere and beyond.
[V]     Timing cache disabled. Turning it on will improve builder speed.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 342640
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 512
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 343 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 888 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 1574.7ms to assign 251 blocks to 888 nodes requiring 341193728 bytes.
[V]     Total Activation Memory: 341193728
[V]     [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +325, now: CPU 0, GPU 659 (MiB)
[V]     Starting Calibration.
[V]     Generating data using numpy seed: 1
[V]     Input tensor: input | Generating input data in range: [0.0, 1.0]
[V]     Found candidate CUDA libraries: ['/usr/local/cuda/lib64/libcudart.so', '/usr/local/cuda/lib64/libcudart.so.12', '/usr/local/cuda/lib64/libcudart.so.12.2.128']
[V]     Allocated: DeviceArray[(dtype=float32, shape=(1, 3, 224, 224)), ptr=0x7f1496604400]
[V]       Calibrated batch 0 in 0.162968 seconds.
[V]       Post Processing Calibration data in 150.214 seconds.
[V]     Calibration completed in 155.589 seconds.
[V]     Writing Calibration Cache for calibrator: TRT-8601-EntropyCalibration2
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 8) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 29) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 33) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 109) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.0/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 148) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 152) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 156) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.1/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.2/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.3/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.4/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.5/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.6/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.7/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.8/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.9/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.10/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.11/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[V]     Graph optimization time: 1.36354 seconds.
[V]     Global timing cache in use. Profiling results in this builder pass will be stored.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 368320
[V]     Total Device Persistent Memory: 328704
[V]     Total Scratch Memory: 154112
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 333 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 480 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 15.7391ms to assign 7 blocks to 480 nodes requiring 1690624 bytes.
[V]     Total Activation Memory: 1689600
[V]     [MemUsageChange] TensorRT-managed allocation in building engine: CPU +324, GPU +335, now: CPU 324, GPU 335 (MiB)
[I]     Finished engine building in 196.464 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[I]     Saving debug replay to polygraphy_debug_replay.json
[I]     PASSED | Iteration 1 | Duration 200.7176752090454s
[I]     RUNNING | Iteration 2 | Approximately 11 iteration(s) remaining
[I]     Selecting first 715 layer(s) to run in higher precision
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714} to run in DataType.FLOAT precision
[W]     Int8 Calibration is using randomly generated input data.
        This could negatively impact accuracy if the inference-time input data is dissimilar to the randomly generated calibration data.
        You may want to consider providing real data via the --data-loader-script option.
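(Editor's note: the warning above suggests supplying real calibration data. A minimal sketch of such a script follows — Polygraphy's `--data-loader-script` expects a Python file defining a generator function named `load_data` that yields feed_dicts mapping input names to numpy arrays. The file name `data_loader.py` is hypothetical; the input name `input` and shape `(1, 3, 224, 224)` are taken from the log above. Replace the random arrays with real preprocessed images for meaningful INT8 calibration.)

```python
# data_loader.py (hypothetical file name) -- pass via:
#   polygraphy ... --data-loader-script data_loader.py
import numpy as np

def load_data():
    """Yield calibration batches as {input_name: array} feed_dicts."""
    for _ in range(4):  # number of calibration batches; placeholder random data
        yield {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
```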
[V]     Created calibrator [cache=None]
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [INT8, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
        Calibrator             | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[V]     Graph optimization time: 0.0477532 seconds.
[V]     Timing cache disabled. Turning it on will improve builder speed.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 342640
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 512
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 333 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 888 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 1629.1ms to assign 251 blocks to 888 nodes requiring 341193728 bytes.
[V]     Total Activation Memory: 341193728
[V]     [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +325, now: CPU 0, GPU 659 (MiB)
[V]     Starting Calibration.
[V]     Generating data using numpy seed: 1
[V]     Allocated: DeviceArray[(dtype=float32, shape=(1, 3, 224, 224)), ptr=0x7f1490604400]
[V]       Calibrated batch 0 in 0.164016 seconds.
[V]       Post Processing Calibration data in 156.685 seconds.
[V]     Calibration completed in 161.148 seconds.
[V]     Writing Calibration Cache for calibrator: TRT-8601-EntropyCalibration2
[V]     Graph optimization time: 1.53557 seconds.
[V]     Global timing cache in use. Profiling results in this builder pass will be stored.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 378800
[V]     Total Device Persistent Memory: 184320
[V]     Total Scratch Memory: 154112
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 511 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 24.498ms to assign 7 blocks to 511 nodes requiring 1690624 bytes.
[V]     Total Activation Memory: 1689600
[V]     [MemUsageChange] TensorRT-managed allocation in building engine: CPU +213, GPU +224, now: CPU 213, GPU 224 (MiB)
[I]     Finished engine building in 288.108 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[I]     ========== CAPTURED STDOUT ==========
        [W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
        [I] RUNNING | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
        [I] trt-runner-N0-10/23/23-17:01:58     | Activating and starting inference
        [I] Loading bytes from /workspace/img_clf/notebooks/polygraphy_debug.engine
        [I] trt-runner-N0-10/23/23-17:01:58    
            ---- Inference Input(s) ----
            {input [dtype=float32, shape=(1, 3, 224, 224)]}
        [I] trt-runner-N0-10/23/23-17:01:58    
            ---- Inference Output(s) ----
            {output [dtype=float32, shape=(1, 8)]}
        [I] trt-runner-N0-10/23/23-17:01:58     | Completed 1 iteration(s) in 665.5 ms | Average inference time: 665.5 ms.
        [I] Loading inference results from /workspace/img_clf/data/layerwise_golden.json
        [I] Accuracy Comparison | trt-runner-N0-10/23/23-17:01:58 vs. onnxrt-runner-N0-10/23/23-15:45:37
        [I]     Comparing Output: 'output' (dtype=float32, shape=(1, 8)) with 'output' (dtype=float32, shape=(1, 8))
        [I]         Tolerance: [abs=0, rel=1e-05] | Checking elemwise error
        [I]         trt-runner-N0-10/23/23-17:01:58: output | Stats: mean=-3.7136, std-dev=2.2259, var=4.9544, median=-4.0574, min=-6.6405 at (0, 1), max=1.0144 at (0, 0), avg-magnitude=3.9673
        [I]             ---- Values ----
                            [[ 1.014436  -6.640457  -3.1188476 -4.164866  -5.4636207 -3.9500134
                              -2.105751  -5.2800465]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          0 | 
                        (-7.63 , -6.65 ) |          0 | 
                        (-6.65 , -5.66 ) |          1 | ####################
                        (-5.66 , -4.67 ) |          2 | ########################################
                        (-4.67 , -3.68 ) |          2 | ########################################
                        (-3.68 , -2.69 ) |          1 | ####################
                        (-2.69 , -1.7  ) |          1 | ####################
                        (-1.7  , -0.714) |          0 | 
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | ####################
        [I]         onnxrt-runner-N0-10/23/23-15:45:37: output | Stats: mean=-4.5612, std-dev=3.0662, var=9.4016, median=-5.3557, min=-8.6229 at (0, 1), max=1.2628 at (0, 0), avg-magnitude=4.8769
        [I]             ---- Values ----
                            [[ 1.2627729 -8.622935  -1.5289186 -4.499915  -7.0470834 -6.714871
                              -3.1271095 -6.211514 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          1 | ####################
                        (-7.63 , -6.65 ) |          2 | ########################################
                        (-6.65 , -5.66 ) |          1 | ####################
                        (-5.66 , -4.67 ) |          0 | 
                        (-4.67 , -3.68 ) |          1 | ####################
                        (-3.68 , -2.69 ) |          1 | ####################
                        (-2.69 , -1.7  ) |          0 | 
                        (-1.7  , -0.714) |          1 | ####################
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | ####################
        [I]         Error Metrics: output
        [I]             Minimum Required Tolerance: elemwise error | [abs=2.7649] OR [rel=1.0399] (requirements may be lower if both abs/rel tolerances are set)
        [I]             Absolute Difference | Stats: mean=1.3071, std-dev=0.79264, var=0.62827, median=1.3024, min=0.24834 at (0, 0), max=2.7649 at (0, 5), avg-magnitude=1.3071
        [I]                 ---- Values ----
                                [[0.24833691 1.9824781  1.589929   0.33504915 1.5834627  2.7648575
                                  1.0213585  0.93146753]]
        [I]                 ---- Histogram ----
                            Bin Range      |  Num Elems | Visualization
                            (0.248, 0.5  ) |          2 | ########################################
                            (0.5  , 0.752) |          0 | 
                            (0.752, 1    ) |          1 | ####################
                            (1    , 1.25 ) |          1 | ####################
                            (1.25 , 1.51 ) |          0 | 
                            (1.51 , 1.76 ) |          2 | ########################################
                            (1.76 , 2.01 ) |          1 | ####################
                            (2.01 , 2.26 ) |          0 | 
                            (2.26 , 2.51 ) |          0 | 
                            (2.51 , 2.76 ) |          1 | ####################
        [I]             Relative Difference | Stats: mean=0.33174, std-dev=0.28444, var=0.080905, median=0.2273, min=0.074457 at (0, 3), max=1.0399 at (0, 2), avg-magnitude=0.33174
        [I]                 ---- Values ----
                                [[0.19666    0.22990757 1.0399042  0.07445677 0.2246976  0.4117514
                                  0.32661423 0.14995821]]
        [I]                 ---- Histogram ----
                            Bin Range       |  Num Elems | Visualization
                            (0.0745, 0.171) |          2 | ##########################
                            (0.171 , 0.268) |          3 | ########################################
                            (0.268 , 0.364) |          1 | #############
                            (0.364 , 0.461) |          1 | #############
                            (0.461 , 0.557) |          0 | 
                            (0.557 , 0.654) |          0 | 
                            (0.654 , 0.75 ) |          0 | 
                            (0.75  , 0.847) |          0 | 
                            (0.847 , 0.943) |          0 | 
                            (0.943 , 1.04 ) |          1 | #############
        [E]         FAILED | Output: 'output' | Difference exceeds tolerance (rel=1e-05, abs=0)
        [E]     FAILED | Mismatched outputs: ['output']
        [E] Accuracy Summary | trt-runner-N0-10/23/23-17:01:58 vs. onnxrt-runner-N0-10/23/23-15:45:37 | Passed: 0/1 iterations | Pass Rate: 0.0%
        [E] FAILED | Runtime: 3.511s | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[E]     ========== CAPTURED STDERR ==========
[I]     Saving debug replay to polygraphy_debug_replay.json
[E]     FAILED | Iteration 2 | Duration 292.11703515052795s
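(Editor's note: the error metrics reported for Iteration 2 can be reproduced from the printed outputs. A minimal numpy sketch, using the values copied from the log above — Polygraphy's elementwise check computes the absolute difference and divides by the magnitude of the golden (here, onnxrt) output to get the relative difference:)

```python
import numpy as np

# Output values copied from the Iteration 2 comparison above.
trt = np.array([1.014436, -6.640457, -3.1188476, -4.164866,
                -5.4636207, -3.9500134, -2.105751, -5.2800465])
onnxrt = np.array([1.2627729, -8.622935, -1.5289186, -4.499915,
                   -7.0470834, -6.714871, -3.1271095, -6.211514])

abs_diff = np.abs(trt - onnxrt)
rel_diff = abs_diff / np.abs(onnxrt)  # relative error vs. the golden output

print(abs_diff.max())  # ~2.7649, at index 5, matching the log
print(rel_diff.max())  # ~1.0399, at index 2, matching the log
```

With `--abs 0` and the default `rel=1e-05`, every element must satisfy the tolerance, so a maximum relative error near 1.0 fails decisively.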
[I]     RUNNING | Iteration 3 | Approximately 10 iteration(s) remaining
[I]     Selecting first 1073 layer(s) to run in higher precision
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 725, 726, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 741, 742, 744, 745, 751, 752, 754, 755, 761, 762, 768, 772, 776, 780, 784, 788, 789, 790, 791, 792, 793, 794, 796, 797, 800, 802, 803, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 818, 819, 821, 822, 824, 825, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 838, 839, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 854, 855, 857, 858, 864, 865, 867, 868, 874, 875, 881, 885, 889, 893, 897, 901, 902, 903, 904, 905, 906, 907, 909, 910, 913, 915, 916, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 931, 932, 934, 935, 937, 938, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 951, 952, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 968, 970, 971, 977, 978, 980, 981, 987, 988, 994, 998, 1002, 1006, 1010, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1022, 1023, 1026, 1028, 1029, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1044, 1045, 1047, 1048, 1050, 1051, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1064, 1065, 1067, 1068, 1069, 1070, 1071, 1072} to run in DataType.FLOAT precision
[W]     Int8 Calibration is using randomly generated input data.
        This could negatively impact accuracy if the inference-time input data is dissimilar to the randomly generated calibration data.
        You may want to consider providing real data via the --data-loader-script option.
[V]     Created calibrator [cache=None]
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [INT8, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
        Calibrator             | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[V]     Graph optimization time: 0.0484175 seconds.
[V]     Timing cache disabled. Turning it on will improve builder speed.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 342640
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 512
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 888 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 1640.19ms to assign 251 blocks to 888 nodes requiring 341193728 bytes.
[V]     Total Activation Memory: 341193728
[V]     [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +325, now: CPU 0, GPU 659 (MiB)
[V]     Starting Calibration.
[V]     Generating data using numpy seed: 1
[V]     Allocated: DeviceArray[(dtype=float32, shape=(1, 3, 224, 224)), ptr=0x7f1490604400]
[V]       Calibrated batch 0 in 0.166882 seconds.
[V]       Post Processing Calibration data in 163.59 seconds.
[V]     Calibration completed in 168.106 seconds.
[V]     Writing Calibration Cache for calibrator: TRT-8601-EntropyCalibration2
[V]     Graph optimization time: 1.48535 seconds.
[V]     Global timing cache in use. Profiling results in this builder pass will be stored.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 371264
[V]     Total Device Persistent Memory: 176640
[V]     Total Scratch Memory: 154112
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 511 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 19.0428ms to assign 7 blocks to 511 nodes requiring 1690624 bytes.
[V]     Total Activation Memory: 1689600
[V]     [MemUsageChange] TensorRT-managed allocation in building engine: CPU +268, GPU +279, now: CPU 268, GPU 279 (MiB)
[I]     Finished engine building in 195.951 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[I]     ========== CAPTURED STDOUT ==========
        [W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
        [I] RUNNING | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
        [I] trt-runner-N0-10/23/23-17:05:18     | Activating and starting inference
        [I] Loading bytes from /workspace/img_clf/notebooks/polygraphy_debug.engine
        [I] trt-runner-N0-10/23/23-17:05:18    
            ---- Inference Input(s) ----
            {input [dtype=float32, shape=(1, 3, 224, 224)]}
        [I] trt-runner-N0-10/23/23-17:05:18    
            ---- Inference Output(s) ----
            {output [dtype=float32, shape=(1, 8)]}
        [I] trt-runner-N0-10/23/23-17:05:18     | Completed 1 iteration(s) in 618.5 ms | Average inference time: 618.5 ms.
        [I] Loading inference results from /workspace/img_clf/data/layerwise_golden.json
        [I] Accuracy Comparison | trt-runner-N0-10/23/23-17:05:18 vs. onnxrt-runner-N0-10/23/23-15:45:37
        [I]     Comparing Output: 'output' (dtype=float32, shape=(1, 8)) with 'output' (dtype=float32, shape=(1, 8))
        [I]         Tolerance: [abs=0, rel=1e-05] | Checking elemwise error
        [I]         trt-runner-N0-10/23/23-17:05:18: output | Stats: mean=-4.0761, std-dev=2.6041, var=6.7812, median=-5.0699, min=-7.2015 at (0, 1), max=0.57395 at (0, 0), avg-magnitude=4.2196
        [I]             ---- Values ----
                            [[ 0.5739483 -7.2014675 -1.2806717 -4.6683197 -6.3941054 -6.0034494
                              -2.1633415 -5.4715405]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          0 | 
                        (-7.63 , -6.65 ) |          1 | ####################
                        (-6.65 , -5.66 ) |          2 | ########################################
                        (-5.66 , -4.67 ) |          1 | ####################
                        (-4.67 , -3.68 ) |          1 | ####################
                        (-3.68 , -2.69 ) |          0 | 
                        (-2.69 , -1.7  ) |          1 | ####################
                        (-1.7  , -0.714) |          1 | ####################
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | ####################
        [I]         onnxrt-runner-N0-10/23/23-15:45:37: output | Stats: mean=-4.5612, std-dev=3.0662, var=9.4016, median=-5.3557, min=-8.6229 at (0, 1), max=1.2628 at (0, 0), avg-magnitude=4.8769
        [I]             ---- Values ----
                            [[ 1.2627729 -8.622935  -1.5289186 -4.499915  -7.0470834 -6.714871
                              -3.1271095 -6.211514 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          1 | ####################
                        (-7.63 , -6.65 ) |          2 | ########################################
                        (-6.65 , -5.66 ) |          1 | ####################
                        (-5.66 , -4.67 ) |          0 | 
                        (-4.67 , -3.68 ) |          1 | ####################
                        (-3.68 , -2.69 ) |          1 | ####################
                        (-2.69 , -1.7  ) |          0 | 
                        (-1.7  , -0.714) |          1 | ####################
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | ####################
        [I]         Error Metrics: output
        [I]             Minimum Required Tolerance: elemwise error | [abs=1.4215] OR [rel=0.54549] (requirements may be lower if both abs/rel tolerances are set)
        [I]             Absolute Difference | Stats: mean=0.69939, std-dev=0.36756, var=0.1351, median=0.70012, min=0.1684 at (0, 3), max=1.4215 at (0, 1), avg-magnitude=0.69939
        [I]                 ---- Values ----
                                [[0.6888246  1.4214678  0.24824691 0.16840458 0.65297794 0.7114215
                                  0.963768   0.73997355]]
        [I]                 ---- Histogram ----
                            Bin Range      |  Num Elems | Visualization
                            (0.168, 0.294) |          2 | ##########################
                            (0.294, 0.419) |          0 | 
                            (0.419, 0.544) |          0 | 
                            (0.544, 0.67 ) |          1 | #############
                            (0.67 , 0.795) |          3 | ########################################
                            (0.795, 0.92 ) |          0 | 
                            (0.92 , 1.05 ) |          1 | #############
                            (1.05 , 1.17 ) |          0 | 
                            (1.17 , 1.3  ) |          0 | 
                            (1.3  , 1.42 ) |          1 | #############
        [I]             Relative Difference | Stats: mean=0.19201, std-dev=0.1527, var=0.023318, median=0.14075, min=0.037424 at (0, 3), max=0.54549 at (0, 0), avg-magnitude=0.19201
        [I]                 ---- Values ----
                                [[0.54548573 0.16484731 0.16236764 0.03742395 0.09265932 0.10594716
                                  0.3081977  0.11912934]]
        [I]                 ---- Histogram ----
                            Bin Range        |  Num Elems | Visualization
                            (0.0374, 0.0882) |          1 | #############
                            (0.0882, 0.139 ) |          3 | ########################################
                            (0.139 , 0.19  ) |          2 | ##########################
                            (0.19  , 0.241 ) |          0 | 
                            (0.241 , 0.291 ) |          0 | 
                            (0.291 , 0.342 ) |          1 | #############
                            (0.342 , 0.393 ) |          0 | 
                            (0.393 , 0.444 ) |          0 | 
                            (0.444 , 0.495 ) |          0 | 
                            (0.495 , 0.545 ) |          1 | #############
        [E]         FAILED | Output: 'output' | Difference exceeds tolerance (rel=1e-05, abs=0)
        [E]     FAILED | Mismatched outputs: ['output']
        [E] Accuracy Summary | trt-runner-N0-10/23/23-17:05:18 vs. onnxrt-runner-N0-10/23/23-15:45:37 | Passed: 0/1 iterations | Pass Rate: 0.0%
        [E] FAILED | Runtime: 3.556s | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[E]     ========== CAPTURED STDERR ==========
[I]     Saving debug replay to polygraphy_debug_replay.json
[E]     FAILED | Iteration 3 | Duration 200.0889549255371s
[I]     RUNNING | Iteration 4 | Approximately 9 iteration(s) remaining
[I]     Selecting first 1252 layer(s) to run in higher precision
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 725, 726, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 741, 742, 744, 745, 751, 752, 754, 755, 761, 762, 768, 772, 776, 780, 784, 788, 789, 790, 791, 792, 793, 794, 796, 797, 800, 802, 803, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 818, 819, 821, 822, 824, 825, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 838, 839, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 854, 855, 857, 858, 864, 865, 867, 868, 874, 875, 881, 885, 889, 893, 897, 901, 902, 903, 904, 905, 906, 907, 909, 910, 913, 915, 916, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 931, 932, 934, 935, 937, 938, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 951, 952, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 968, 970, 971, 977, 978, 980, 981, 987, 988, 994, 998, 1002, 1006, 1010, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1022, 1023, 1026, 1028, 1029, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1044, 1045, 1047, 1048, 1050, 1051, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1064, 1065, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1080, 1081, 1083, 1084, 1090, 1091, 1093, 1094, 1100, 1101, 1107, 1111, 1115, 1119, 1123, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1135, 1136, 1139, 1141, 1142, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1157, 1158, 1160, 1161, 1163, 1164, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1177, 1178, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1193, 1194, 1196, 1197, 1203, 1204, 1206, 1207, 1213, 1214, 1220, 1224, 1228, 1232, 1236, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1248, 1249} to run in DataType.FLOAT precision
[W]     Int8 Calibration is using randomly generated input data.
        This could negatively impact accuracy if the inference-time input data is dissimilar to the randomly generated calibration data.
        You may want to consider providing real data via the --data-loader-script option.
[V]     Created calibrator [cache=None]
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [INT8, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
        Calibrator             | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[V]     Graph optimization time: 0.0523377 seconds.
[V]     Timing cache disabled. Turning it on will improve builder speed.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 342640
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 512
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 888 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 1642.67ms to assign 251 blocks to 888 nodes requiring 341193728 bytes.
[V]     Total Activation Memory: 341193728
[V]     [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +325, now: CPU 0, GPU 659 (MiB)
[V]     Starting Calibration.
[V]     Generating data using numpy seed: 1
[V]     Allocated: DeviceArray[(dtype=float32, shape=(1, 3, 224, 224)), ptr=0x7f1490604400]
[V]       Calibrated batch 0 in 0.166838 seconds.
[V]       Post Processing Calibration data in 165.977 seconds.
[V]     Calibration completed in 170.505 seconds.
[V]     Writing Calibration Cache for calibrator: TRT-8601-EntropyCalibration2
[V]     Graph optimization time: 1.50964 seconds.
[V]     Global timing cache in use. Profiling results in this builder pass will be stored.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 372656
[V]     Total Device Persistent Memory: 285696
[V]     Total Scratch Memory: 154112
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 485 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 18.2183ms to assign 7 blocks to 485 nodes requiring 1690624 bytes.
[V]     Total Activation Memory: 1689600
[V]     [MemUsageChange] TensorRT-managed allocation in building engine: CPU +288, GPU +300, now: CPU 288, GPU 300 (MiB)
[I]     Finished engine building in 197.210 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[I]     ========== CAPTURED STDOUT ==========
        [W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
        [I] RUNNING | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
        [I] trt-runner-N0-10/23/23-17:08:40     | Activating and starting inference
        [I] Loading bytes from /workspace/img_clf/notebooks/polygraphy_debug.engine
        [I] trt-runner-N0-10/23/23-17:08:40    
            ---- Inference Input(s) ----
            {input [dtype=float32, shape=(1, 3, 224, 224)]}
        [I] trt-runner-N0-10/23/23-17:08:40    
            ---- Inference Output(s) ----
            {output [dtype=float32, shape=(1, 8)]}
        [I] trt-runner-N0-10/23/23-17:08:40     | Completed 1 iteration(s) in 653.9 ms | Average inference time: 653.9 ms.
        [I] Loading inference results from /workspace/img_clf/data/layerwise_golden.json
        [I] Accuracy Comparison | trt-runner-N0-10/23/23-17:08:40 vs. onnxrt-runner-N0-10/23/23-15:45:37
        [I]     Comparing Output: 'output' (dtype=float32, shape=(1, 8)) with 'output' (dtype=float32, shape=(1, 8))
        [I]         Tolerance: [abs=0, rel=1e-05] | Checking elemwise error
        [I]         trt-runner-N0-10/23/23-17:08:40: output | Stats: mean=-4.832, std-dev=3.0892, var=9.543, median=-5.8418, min=-8.3838 at (0, 1), max=1.2505 at (0, 0), avg-magnitude=5.1446
        [I]             ---- Values ----
                            [[ 1.250505  -8.383767  -1.8844577 -5.0319767 -7.564127  -6.651675
                              -3.329008  -7.061149 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          1 | #############
                        (-7.63 , -6.65 ) |          3 | ########################################
                        (-6.65 , -5.66 ) |          0 | 
                        (-5.66 , -4.67 ) |          1 | #############
                        (-4.67 , -3.68 ) |          0 | 
                        (-3.68 , -2.69 ) |          1 | #############
                        (-2.69 , -1.7  ) |          1 | #############
                        (-1.7  , -0.714) |          0 | 
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | #############
        [I]         onnxrt-runner-N0-10/23/23-15:45:37: output | Stats: mean=-4.5612, std-dev=3.0662, var=9.4016, median=-5.3557, min=-8.6229 at (0, 1), max=1.2628 at (0, 0), avg-magnitude=4.8769
        [I]             ---- Values ----
                            [[ 1.2627729 -8.622935  -1.5289186 -4.499915  -7.0470834 -6.714871
                              -3.1271095 -6.211514 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          1 | ####################
                        (-7.63 , -6.65 ) |          2 | ########################################
                        (-6.65 , -5.66 ) |          1 | ####################
                        (-5.66 , -4.67 ) |          0 | 
                        (-4.67 , -3.68 ) |          1 | ####################
                        (-3.68 , -2.69 ) |          1 | ####################
                        (-2.69 , -1.7  ) |          0 | 
                        (-1.7  , -0.714) |          1 | ####################
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | ####################
        [I]         Error Metrics: output
        [I]             Minimum Required Tolerance: elemwise error | [abs=0.84964] OR [rel=0.23254] (requirements may be lower if both abs/rel tolerances are set)
        [I]             Absolute Difference | Stats: mean=0.34635, std-dev=0.26008, var=0.067643, median=0.29735, min=0.012268 at (0, 0), max=0.84964 at (0, 7), avg-magnitude=0.34635
        [I]                 ---- Values ----
                                [[0.01226795 0.23916817 0.35553908 0.5320616  0.5170436  0.06319571
                                  0.20189857 0.8496351 ]]
        [I]                 ---- Histogram ----
                            Bin Range       |  Num Elems | Visualization
                            (0.0123, 0.096) |          2 | ########################################
                            (0.096 , 0.18 ) |          0 | 
                            (0.18  , 0.263) |          2 | ########################################
                            (0.263 , 0.347) |          0 | 
                            (0.347 , 0.431) |          1 | ####################
                            (0.431 , 0.515) |          0 | 
                            (0.515 , 0.598) |          2 | ########################################
                            (0.598 , 0.682) |          0 | 
                            (0.682 , 0.766) |          0 | 
                            (0.766 , 0.85 ) |          1 | ####################
        [I]             Relative Difference | Stats: mean=0.084045, std-dev=0.071381, var=0.0050952, median=0.068967, min=0.0094113 at (0, 5), max=0.23254 at (0, 2), avg-magnitude=0.084045
        [I]                 ---- Values ----
                                [[0.00971509 0.02773628 0.23254284 0.11823814 0.07336987 0.00941131
                                  0.06456396 0.1367839 ]]
        [I]                 ---- Histogram ----
                            Bin Range         |  Num Elems | Visualization
                            (0.00941, 0.0317) |          3 | ########################################
                            (0.0317 , 0.054 ) |          0 | 
                            (0.054  , 0.0764) |          2 | ##########################
                            (0.0764 , 0.0987) |          0 | 
                            (0.0987 , 0.121 ) |          1 | #############
                            (0.121  , 0.143 ) |          1 | #############
                            (0.143  , 0.166 ) |          0 | 
                            (0.166  , 0.188 ) |          0 | 
                            (0.188  , 0.21  ) |          0 | 
                            (0.21   , 0.233 ) |          1 | #############
        [E]         FAILED | Output: 'output' | Difference exceeds tolerance (rel=1e-05, abs=0)
        [E]     FAILED | Mismatched outputs: ['output']
        [E] Accuracy Summary | trt-runner-N0-10/23/23-17:08:40 vs. onnxrt-runner-N0-10/23/23-15:45:37 | Passed: 0/1 iterations | Pass Rate: 0.0%
        [E] FAILED | Runtime: 3.592s | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[E]     ========== CAPTURED STDERR ==========
[I]     Saving debug replay to polygraphy_debug_replay.json
[E]     FAILED | Iteration 4 | Duration 201.41086626052856s
[I]     RUNNING | Iteration 5 | Approximately 8 iteration(s) remaining
[I]     Selecting first 1341 layer(s) to run in higher precision
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 725, 726, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 741, 742, 744, 745, 751, 752, 754, 755, 761, 762, 768, 772, 776, 780, 784, 788, 789, 790, 791, 792, 793, 794, 796, 797, 800, 802, 803, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 818, 819, 821, 822, 824, 825, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 838, 839, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 854, 855, 857, 858, 864, 865, 867, 868, 874, 875, 881, 885, 889, 893, 897, 901, 902, 903, 904, 905, 906, 907, 909, 910, 913, 915, 916, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 931, 932, 934, 935, 937, 938, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 951, 952, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 968, 970, 971, 977, 978, 980, 981, 987, 988, 994, 998, 1002, 1006, 1010, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1022, 1023, 1026, 1028, 1029, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1044, 1045, 1047, 1048, 1050, 1051, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1064, 1065, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1080, 1081, 1083, 1084, 1090, 1091, 1093, 1094, 1100, 1101, 1107, 1111, 1115, 1119, 1123, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1135, 1136, 1139, 1141, 1142, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1157, 1158, 1160, 1161, 1163, 1164, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1177, 1178, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1193, 1194, 1196, 1197, 1203, 1204, 1206, 1207, 1213, 1214, 1220, 1224, 1228, 1232, 1236, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1248, 1249, 1252, 1254, 1255, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1270, 1271, 1273, 1274, 1276, 1277, 1279, 1280, 1281, 1282, 
1283, 1284, 1285, 1286, 1287, 1288, 1290, 1291, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1306, 1307, 1309, 1310, 1316, 1317, 1319, 1320, 1326, 1327, 1333, 1337} to run in DataType.FLOAT precision
[W]     Int8 Calibration is using randomly generated input data.
        This could negatively impact accuracy if the inference-time input data is dissimilar to the randomly generated calibration data.
        You may want to consider providing real data via the --data-loader-script option.
[V]     Created calibrator [cache=None]
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [INT8, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
        Calibrator             | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[V]     Graph optimization time: 0.0486154 seconds.
[V]     Timing cache disabled. Turning it on will improve builder speed.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 342640
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 512
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 888 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 1629.78ms to assign 251 blocks to 888 nodes requiring 341193728 bytes.
[V]     Total Activation Memory: 341193728
[V]     [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +325, now: CPU 0, GPU 659 (MiB)
[V]     Starting Calibration.
[V]     Generating data using numpy seed: 1
[V]     Allocated: DeviceArray[(dtype=float32, shape=(1, 3, 224, 224)), ptr=0x7f1490604400]
[V]       Calibrated batch 0 in 0.166633 seconds.
[V]       Post Processing Calibration data in 163.08 seconds.
[V]     Calibration completed in 167.577 seconds.
[V]     Writing Calibration Cache for calibrator: TRT-8601-EntropyCalibration2
[V]     Graph optimization time: 1.498 seconds.
[V]     Global timing cache in use. Profiling results in this builder pass will be stored.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 369776
[V]     Total Device Persistent Memory: 313344
[V]     Total Scratch Memory: 154112
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 473 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 17.0699ms to assign 7 blocks to 473 nodes requiring 1690624 bytes.
[V]     Total Activation Memory: 1689600
[V]     [MemUsageChange] TensorRT-managed allocation in building engine: CPU +308, GPU +320, now: CPU 308, GPU 320 (MiB)
[I]     Finished engine building in 190.840 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[I]     ========== CAPTURED STDOUT ==========
        [W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
        [I] RUNNING | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
        [I] trt-runner-N0-10/23/23-17:11:55     | Activating and starting inference
        [I] Loading bytes from /workspace/img_clf/notebooks/polygraphy_debug.engine
        [I] trt-runner-N0-10/23/23-17:11:55    
            ---- Inference Input(s) ----
            {input [dtype=float32, shape=(1, 3, 224, 224)]}
        [I] trt-runner-N0-10/23/23-17:11:55    
            ---- Inference Output(s) ----
            {output [dtype=float32, shape=(1, 8)]}
        [I] trt-runner-N0-10/23/23-17:11:55     | Completed 1 iteration(s) in 659 ms | Average inference time: 659 ms.
        [I] Loading inference results from /workspace/img_clf/data/layerwise_golden.json
        [I] Accuracy Comparison | trt-runner-N0-10/23/23-17:11:55 vs. onnxrt-runner-N0-10/23/23-15:45:37
        [I]     Comparing Output: 'output' (dtype=float32, shape=(1, 8)) with 'output' (dtype=float32, shape=(1, 8))
        [I]         Tolerance: [abs=0, rel=1e-05] | Checking elemwise error
        [I]         trt-runner-N0-10/23/23-17:11:55: output | Stats: mean=-4.7106, std-dev=3.34, var=11.156, median=-5.5748, min=-9.1529 at (0, 1), max=1.928 at (0, 0), avg-magnitude=5.1926
        [I]             ---- Values ----
                            [[ 1.9280488 -9.152926  -1.7256244 -4.526225  -7.2750354 -6.8673315
                              -3.4420202 -6.623417 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-9.15 , -8.04 ) |          1 | ####################
                        (-8.04 , -6.94 ) |          1 | ####################
                        (-6.94 , -5.83 ) |          2 | ########################################
                        (-5.83 , -4.72 ) |          0 | 
                        (-4.72 , -3.61 ) |          1 | ####################
                        (-3.61 , -2.5  ) |          1 | ####################
                        (-2.5  , -1.4  ) |          1 | ####################
                        (-1.4  , -0.288) |          0 | 
                        (-0.288, 0.82  ) |          0 | 
                        (0.82  , 1.93  ) |          1 | ####################
        [I]         onnxrt-runner-N0-10/23/23-15:45:37: output | Stats: mean=-4.5612, std-dev=3.0662, var=9.4016, median=-5.3557, min=-8.6229 at (0, 1), max=1.2628 at (0, 0), avg-magnitude=4.8769
        [I]             ---- Values ----
                            [[ 1.2627729 -8.622935  -1.5289186 -4.499915  -7.0470834 -6.714871
                              -3.1271095 -6.211514 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-9.15 , -8.04 ) |          1 | ####################
                        (-8.04 , -6.94 ) |          1 | ####################
                        (-6.94 , -5.83 ) |          2 | ########################################
                        (-5.83 , -4.72 ) |          0 | 
                        (-4.72 , -3.61 ) |          1 | ####################
                        (-3.61 , -2.5  ) |          1 | ####################
                        (-2.5  , -1.4  ) |          1 | ####################
                        (-1.4  , -0.288) |          0 | 
                        (-0.288, 0.82  ) |          0 | 
                        (0.82  , 1.93  ) |          1 | ####################
        [I]         Error Metrics: output
        [I]             Minimum Required Tolerance: elemwise error | [abs=0.66528] OR [rel=0.52684] (requirements may be lower if both abs/rel tolerances are set)
        [I]             Absolute Difference | Stats: mean=0.31569, std-dev=0.19673, var=0.038704, median=0.27143, min=0.02631 at (0, 3), max=0.66528 at (0, 0), avg-magnitude=0.31569
        [I]                 ---- Values ----
                                [[0.66527593 0.52999115 0.19670582 0.02630997 0.227952   0.15246058
                                  0.31491065 0.4119029 ]]
        [I]                 ---- Histogram ----
                            Bin Range        |  Num Elems | Visualization
                            (0.0263, 0.0902) |          1 | ########################################
                            (0.0902, 0.154 ) |          1 | ########################################
                            (0.154 , 0.218 ) |          1 | ########################################
                            (0.218 , 0.282 ) |          1 | ########################################
                            (0.282 , 0.346 ) |          1 | ########################################
                            (0.346 , 0.41  ) |          0 | 
                            (0.41  , 0.474 ) |          1 | ########################################
                            (0.474 , 0.537 ) |          1 | ########################################
                            (0.537 , 0.601 ) |          0 | 
                            (0.601 , 0.665 ) |          1 | ########################################
        [I]             Relative Difference | Stats: mean=0.11811, std-dev=0.15907, var=0.025303, median=0.063888, min=0.0058468 at (0, 3), max=0.52684 at (0, 0), avg-magnitude=0.11811
        [I]                 ---- Values ----
                                [[0.52683735 0.06146296 0.12865682 0.00584677 0.032347   0.02270492
                                  0.10070343 0.0663128 ]]
        [I]                 ---- Histogram ----
                            Bin Range         |  Num Elems | Visualization
                            (0.00585, 0.0579) |          3 | ########################################
                            (0.0579 , 0.11  ) |          3 | ########################################
                            (0.11   , 0.162 ) |          1 | #############
                            (0.162  , 0.214 ) |          0 | 
                            (0.214  , 0.266 ) |          0 | 
                            (0.266  , 0.318 ) |          0 | 
                            (0.318  , 0.371 ) |          0 | 
                            (0.371  , 0.423 ) |          0 | 
                            (0.423  , 0.475 ) |          0 | 
                            (0.475  , 0.527 ) |          1 | #############
        [E]         FAILED | Output: 'output' | Difference exceeds tolerance (rel=1e-05, abs=0)
        [E]     FAILED | Mismatched outputs: ['output']
        [E] Accuracy Summary | trt-runner-N0-10/23/23-17:11:55 vs. onnxrt-runner-N0-10/23/23-15:45:37 | Passed: 0/1 iterations | Pass Rate: 0.0%
        [E] FAILED | Runtime: 3.615s | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[E]     ========== CAPTURED STDERR ==========
[I]     Saving debug replay to polygraphy_debug_replay.json
[E]     FAILED | Iteration 5 | Duration 195.09278988838196s
[I]     RUNNING | Iteration 6 | Approximately 7 iteration(s) remaining
[I]     Selecting first 1386 layer(s) to run in higher precision
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 725, 726, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 741, 742, 744, 745, 751, 752, 754, 755, 761, 762, 768, 772, 776, 780, 784, 788, 789, 790, 791, 792, 793, 794, 796, 797, 800, 802, 803, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 818, 819, 821, 822, 824, 825, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 838, 839, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 854, 855, 857, 858, 864, 865, 867, 868, 874, 875, 881, 885, 889, 893, 897, 901, 902, 903, 904, 905, 906, 907, 909, 910, 913, 915, 916, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 931, 932, 934, 935, 937, 938, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 951, 952, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 968, 970, 971, 977, 978, 980, 981, 987, 988, 994, 998, 1002, 1006, 1010, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1022, 1023, 1026, 1028, 1029, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1044, 1045, 1047, 1048, 1050, 1051, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1064, 1065, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1080, 1081, 1083, 1084, 1090, 1091, 1093, 1094, 1100, 1101, 1107, 1111, 1115, 1119, 1123, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1135, 1136, 1139, 1141, 1142, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1157, 1158, 1160, 1161, 1163, 1164, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1177, 1178, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1193, 1194, 1196, 1197, 1203, 1204, 1206, 1207, 1213, 1214, 1220, 1224, 1228, 1232, 1236, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1248, 1249, 1252, 1254, 1255, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1270, 1271, 1273, 1274, 1276, 1277, 1279, 1280, 1281, 1282, 
1283, 1284, 1285, 1286, 1287, 1288, 1290, 1291, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1306, 1307, 1309, 1310, 1316, 1317, 1319, 1320, 1326, 1327, 1333, 1337, 1341, 1345, 1349, 1353, 1354, 1355, 1356, 1357, 1358, 1359, 1361, 1362, 1365, 1367, 1368, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1383, 1384} to run in DataType.FLOAT precision
[W]     Int8 Calibration is using randomly generated input data.
        This could negatively impact accuracy if the inference-time input data is dissimilar to the randomly generated calibration data.
        You may want to consider providing real data via the --data-loader-script option.
[V]     Created calibrator [cache=None]
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [INT8, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
        Calibrator             | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[V]     Graph optimization time: 0.0533773 seconds.
[V]     Timing cache disabled. Turning it on will improve builder speed.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 342640
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 512
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 888 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 1657.49ms to assign 251 blocks to 888 nodes requiring 341193728 bytes.
[V]     Total Activation Memory: 341193728
[V]     [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +325, now: CPU 0, GPU 659 (MiB)
[V]     Starting Calibration.
[V]     Generating data using numpy seed: 1
[V]     Allocated: DeviceArray[(dtype=float32, shape=(1, 3, 224, 224)), ptr=0x7f1490604400]
[V]       Calibrated batch 0 in 0.167344 seconds.
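Regarding the warning above about randomly generated calibration data: a sketch of what a `--data-loader-script` could look like, assuming Polygraphy's default convention of a `load_data()` function that yields feed dicts mapping input names to NumPy arrays. The input name `input` and shape `(1, 3, 224, 224)` are taken from the profile in this log; the random data is a placeholder you would replace with real preprocessed samples.

```python
# Hypothetical data loader for: polygraphy run ... --data-loader-script this_file.py
# Polygraphy (by default) looks up a function named load_data() in the script.
import numpy as np

def load_data(num_batches=4, seed=0):
    """Yield one feed_dict per calibration batch."""
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        # Substitute real, preprocessed images here so INT8 calibration
        # sees a realistic activation distribution.
        yield {"input": rng.random((1, 3, 224, 224), dtype=np.float32)}
```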

@zerollzeng
Collaborator

@pranavm-nvidia I haven't used polygraphy debug yet, could you please kindly help here :-)

@jinhonglu

After the debug process, how should we use the reply.json file to build our mixed-precision engine?
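One option, regardless of the replay file's exact format, is to take the layer indices that the debug run flagged (the "Marking layer(s) ... to run in DataType.FLOAT precision" set in the log above) and pin those layers to FP32 via the TensorRT Python API while the rest of the network stays INT8. A hedged sketch: `parse_flagged_indices()` is a hypothetical helper whose key name `"layers"` is an assumption — adapt it to whatever structure your debug run actually produced.

```python
def parse_flagged_indices(replay):
    """Collect sorted, de-duplicated layer indices from a replay-style dict.
    The key name 'layers' is an assumption made for illustration only."""
    return sorted(set(int(i) for i in replay.get("layers", [])))

def mark_layers_fp32(network, config, indices):
    """Pin the given layer indices to FP32 and make the builder obey that.
    Expects a tensorrt.INetworkDefinition and tensorrt.IBuilderConfig; the
    import is local so this module loads on machines without TensorRT."""
    import tensorrt as trt
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in indices:
        layer = network.get_layer(i)
        layer.precision = trt.DataType.FLOAT
        for out in range(layer.num_outputs):
            layer.set_output_type(out, trt.DataType.FLOAT)
```

You would call `mark_layers_fp32(network, config, parse_flagged_indices(replay))` between parsing the ONNX model and building the engine.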
