
Polygraphy [HostToDeviceCopy] requires bool I/O but node cannot be handled by Myelin #1698

Closed
Slyne opened this issue Dec 30, 2021 · 12 comments
Labels
triaged Issue has been triaged by maintainers

Comments

@Slyne

Slyne commented Dec 30, 2021

Description

polygraphy run encoder.onnx --trt --onnxrt --trt-outputs mark all --onnx-outputs mark all --fail-fast

Environment

TensorRT Version: 8.2.1
NVIDIA GPU: V100
NVIDIA Driver Version:
CUDA Version: 11.4
CUDNN Version:
Operating System: ubuntu
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.10
Baremetal or Container (if so, version):

Relevant Files

Model: https://nvidia-my.sharepoint.com/:u:/p/slyned/ESC-DizzoztLhNsFrQAW_F4BRwGMj7YZAYwteGhvOxxi3A?e=A2Gsb1
Please use an NVIDIA account to access it.

Steps To Reproduce

polygraphy run encoder.onnx --trt --onnxrt --trt-outputs mark all --onnx-outputs mark all --fail-fast
@zerollzeng
Collaborator

Can you run the model with trtexec and attach the log? A log with --verbose is preferred.

@Slyne
Author

Slyne commented Jan 4, 2022

@zerollzeng
log.txt

@zerollzeng
Collaborator

Are you getting errors when running with Polygraphy? Your trtexec log seems to run successfully. Try running with the same config as you did with trtexec, and don't mark any outputs.

@Slyne
Author

Slyne commented Jan 5, 2022

Are you getting errors when running with Polygraphy? Your trtexec log seems to run successfully. Try running with the same config as you did with trtexec, and don't mark any outputs.

Yes, I only get errors when running with Polygraphy. Running it without marking outputs is fine, but I need to find the tensors whose values don't match the ONNX model.

@zerollzeng
Collaborator

zerollzeng commented Jan 5, 2022

Marking all tensors as outputs will break layer fusion and hence may affect the final output. I recommend marking outputs via bisection until you find the problematic layer.
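The bisection zerollzeng suggests can be sketched in plain Python. `first_bad_tensor` and `tensor_matches` are hypothetical names: `tensor_matches(name)` stands in for one Polygraphy comparison run with only that single tensor marked as an output, and the sketch assumes divergence is monotonic (once one tensor's values go wrong, every later tensor fails too):

```python
def first_bad_tensor(tensor_names, tensor_matches):
    """Binary-search for the first tensor (in topological order) whose
    comparison fails, assuming all tensors after a failing one also fail.
    Returns None if every probe passes."""
    lo, hi = 0, len(tensor_names) - 1
    first_bad = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if tensor_matches(tensor_names[mid]):
            lo = mid + 1            # mismatch starts later in the graph
        else:
            first_bad = tensor_names[mid]
            hi = mid - 1            # mismatch starts here or earlier
    return first_bad
```

In practice each probe would be one comparison run in the style of the command from this issue (`polygraphy run encoder.onnx --trt --onnxrt --trt-outputs <name> --onnx-outputs <name>`), so only O(log n) engines are built instead of one per tensor, and fusion is disturbed as little as possible.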

@ttyio ttyio added Topic: Myelin triaged Issue has been triaged by maintainers labels Jan 24, 2022
@Slyne
Author

Slyne commented Jan 25, 2022

Marking all tensors as outputs will break layer fusion and hence may affect the final output. I recommend marking outputs via bisection until you find the problematic layer.

Thank you! I finally found the problematic layers. I will file another bug for the accuracy problem.

@Slyne Slyne closed this as completed Jan 25, 2022
@hhhhhanxu

Marking all tensors as outputs will break layer fusion and hence may affect the final output. I recommend marking outputs via bisection until you find the problematic layer.

Hello, I want to know how to mark outputs with bisection. Can I use Polygraphy directly, or should I use onnx-graphsurgeon to modify the .onnx?

@safehumeng

Marking all tensors as outputs will break layer fusion and hence may affect the final output. I recommend marking outputs via bisection until you find the problematic layer.

Hello, I want to know how to mark outputs with bisection. Can I use Polygraphy directly, or should I use onnx-graphsurgeon to modify the .onnx?

I found this approach via Google:
polygraphy debug precision net_bs8.onnx --fp16 --tactic-sources cublas --check polygraphy run polygraphy_debug.engine --trt --load-outputs onnx_res.json --abs 1e-1
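For completeness: the reference file passed to `--load-outputs` has to be produced first from ONNX-Runtime. A sketch of the two-step workflow, reusing the file names from the command above (assuming `--save-outputs` as the counterpart flag for writing results):

```shell
# Step 1: run the ONNX model under ONNX-Runtime and save the golden
# outputs that every later iteration will be compared against.
polygraphy run net_bs8.onnx --onnxrt --save-outputs onnx_res.json

# Step 2: bisect layer precisions; each iteration rebuilds the engine
# as polygraphy_debug.engine and re-runs the check command against
# the saved reference.
polygraphy debug precision net_bs8.onnx --fp16 \
    --check polygraphy run polygraphy_debug.engine --trt \
    --load-outputs onnx_res.json --abs 1e-1
```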

@hhhhhanxu

hhhhhanxu commented Sep 27, 2022 via email

@proevgenii

proevgenii commented Oct 23, 2023

@zerollzeng, how can I get the difference for all layers?
When I try to compare layer by layer with the command

polygraphy debug precision \
    --mode bisect \
    /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx \
    --fp16 \
    --verbose \
    -p float16 \
    --check polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json \
    --abs 0

It detects a big difference in the output layer, which is logical and understandable, but why doesn't it go on to check the other layers?

Log
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] RUNNING | Command: /usr/local/bin/polygraphy debug precision --mode bisect /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx --fp16 --verbose -p float16 --check polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[V] Loaded Module: polygraphy | Version: 0.47.1 | Path: ['/usr/local/lib/python3.10/dist-packages/polygraphy']
[V] Loaded Module: tensorrt | Version: 8.6.1 | Path: ['/usr/local/lib/python3.10/dist-packages/tensorrt']
[V] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 24, GPU 1926 (MiB)
[V] [MemUsageChange] Init builder kernel library: CPU +888, GPU +174, now: CPU 989, GPU 2100 (MiB)
[V] ----------------------------------------------------------------
[V] Input filename:   /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx
[V] ONNX IR version:  0.0.8
[V] Opset version:    16
[V] Producer name:    pytorch
[V] Producer version: 2.1.0
[V] Domain:           
[V] Model version:    0
[V] Doc string:       
[V] ----------------------------------------------------------------
[W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] Using float16 as the higher precision, but float16 is also the lowest precision available. Did you mean to set --int8 as well?
[I] Using DataType.HALF as higher precision
[I]     RUNNING | Iteration 1 | Approximately 11 iteration(s) remaining
[I]     Selecting first 1430 layer(s) to run in higher precision
[V]     Loaded Module: polygraphy.backend.trt.util
[V]     Loaded Module: numpy | Version: 1.22.2 | Path: ['/usr/local/lib/python3.10/dist-packages/numpy']
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 725, 726, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 741, 742, 744, 745, 751, 752, 754, 755, 761, 762, 768, 772, 776, 780, 784, 788, 789, 790, 791, 792, 793, 794, 796, 797, 800, 802, 803, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 818, 819, 821, 822, 824, 825, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 838, 839, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 854, 855, 857, 858, 864, 865, 867, 868, 874, 875, 881, 885, 889, 893, 897, 901, 902, 903, 904, 905, 906, 907, 909, 910, 913, 915, 916, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 931, 932, 934, 935, 937, 938, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 951, 952, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 968, 970, 971, 977, 978, 980, 981, 987, 988, 994, 998, 1002, 1006, 1010, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1022, 1023, 1026, 1028, 1029, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1044, 1045, 1047, 1048, 1050, 1051, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1064, 1065, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1080, 1081, 1083, 1084, 1090, 1091, 1093, 1094, 1100, 1101, 1107, 1111, 1115, 1119, 1123, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1135, 1136, 1139, 1141, 1142, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1157, 1158, 1160, 1161, 1163, 1164, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1177, 1178, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1193, 1194, 1196, 1197, 1203, 1204, 1206, 1207, 1213, 1214, 1220, 1224, 1228, 1232, 1236, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1248, 1249, 1252, 1254, 1255, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1270, 1271, 1273, 1274, 1276, 1277, 1279, 1280, 1281, 1282, 
1283, 1284, 1285, 1286, 1287, 1288, 1290, 1291, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1306, 1307, 1309, 1310, 1316, 1317, 1319, 1320, 1326, 1327, 1333, 1337, 1341, 1345, 1349, 1353, 1354, 1355, 1356, 1357, 1358, 1359, 1361, 1362, 1365, 1367, 1368, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1383, 1384, 1386, 1387, 1389, 1390, 1392, 1393, 1394, 1395, 1396, 1397, 1398, 1399, 1400, 1401, 1403, 1404, 1406, 1407, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417, 1419, 1420, 1422, 1423, 1424, 1426, 1428, 1429} to run in DataType.HALF precision
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[W]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | No shapes provided; Will use shape: [1, 3, 224, 224] for min/opt/max in profile.
[W]         This will cause the tensor to have a static shape. If this is incorrect, please set the range of shapes for this input tensor.
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [FP16, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
[V]     Graph optimization time: 0.118371 seconds.
[V]     Global timing cache in use. Profiling results in this builder pass will be stored.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 5328
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 1478656
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 166 MiB, GPU 166 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 3 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 0.036472ms to assign 2 blocks to 3 nodes requiring 1553920 bytes.
[V]     Total Activation Memory: 1553920
[W]     TensorRT encountered issues when converting weights between types and that could affect accuracy.
[W]     If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[W]     Check verbose logs for the list of affected weights.
[W]     - 85 weights are affected by this issue: Detected subnormal FP16 values.
[W]     - 49 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[V]     [MemUsageChange] TensorRT-managed allocation in building engine: CPU +4, GPU +175, now: CPU 4, GPU 175 (MiB)
[I]     Finished engine building in 51.889 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[I]     ========== CAPTURED STDOUT ==========
        [W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
        [I] RUNNING | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
        [I] trt-runner-N0-10/23/23-16:32:19     | Activating and starting inference
        [I] Loading bytes from /workspace/img_clf/notebooks/polygraphy_debug.engine
        [I] trt-runner-N0-10/23/23-16:32:19    
            ---- Inference Input(s) ----
            {input [dtype=float32, shape=(1, 3, 224, 224)]}
        [I] trt-runner-N0-10/23/23-16:32:19    
            ---- Inference Output(s) ----
            {output [dtype=float32, shape=(1, 8)]}
        [I] trt-runner-N0-10/23/23-16:32:19     | Completed 1 iteration(s) in 5.605 ms | Average inference time: 5.605 ms.
        [I] Loading inference results from /workspace/img_clf/data/layerwise_golden.json
        [I] Accuracy Comparison | trt-runner-N0-10/23/23-16:32:19 vs. onnxrt-runner-N0-10/23/23-15:45:37
        [I]     Comparing Output: 'output' (dtype=float32, shape=(1, 8)) with 'output' (dtype=float32, shape=(1, 8))
        [I]         Tolerance: [abs=0, rel=1e-05] | Checking elemwise error
        [I]         trt-runner-N0-10/23/23-16:32:19: output | Stats: mean=-0.98755, std-dev=0.94736, var=0.89749, median=-0.8186, min=-2.4531 at (0, 4), max=0.92334 at (0, 2), avg-magnitude=1.2184
        [I]             ---- Values ----
                            [[-0.6977539  -0.72753906  0.92333984 -1.3388672  -2.453125   -1.9960938
                              -0.7006836  -0.90966797]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          0 | 
                        (-7.63 , -6.65 ) |          0 | 
                        (-6.65 , -5.66 ) |          0 | 
                        (-5.66 , -4.67 ) |          0 | 
                        (-4.67 , -3.68 ) |          0 | 
                        (-3.68 , -2.69 ) |          0 | 
                        (-2.69 , -1.7  ) |          2 | ##########################
                        (-1.7  , -0.714) |          3 | ########################################
                        (-0.714, 0.274 ) |          2 | ##########################
                        (0.274 , 1.26  ) |          1 | #############
        [I]         onnxrt-runner-N0-10/23/23-15:45:37: output | Stats: mean=-4.5612, std-dev=3.0662, var=9.4016, median=-5.3557, min=-8.6229 at (0, 1), max=1.2628 at (0, 0), avg-magnitude=4.8769
        [I]             ---- Values ----
                            [[ 1.2627729 -8.622935  -1.5289186 -4.499915  -7.0470834 -6.714871
                              -3.1271095 -6.211514 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          1 | ####################
                        (-7.63 , -6.65 ) |          2 | ########################################
                        (-6.65 , -5.66 ) |          1 | ####################
                        (-5.66 , -4.67 ) |          0 | 
                        (-4.67 , -3.68 ) |          1 | ####################
                        (-3.68 , -2.69 ) |          1 | ####################
                        (-2.69 , -1.7  ) |          0 | 
                        (-1.7  , -0.714) |          1 | ####################
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | ####################
        [I]         Error Metrics: output
        [I]             Minimum Required Tolerance: elemwise error | [abs=7.8954] OR [rel=1.6039] (requirements may be lower if both abs/rel tolerances are set)
        [I]             Absolute Difference | Stats: mean=4.0638, std-dev=1.8521, var=3.4301, median=3.8775, min=1.9605 at (0, 0), max=7.8954 at (0, 1), avg-magnitude=4.0638
        [I]                 ---- Values ----
                                [[1.9605268 7.895396  2.4522586 3.161048  4.5939584 4.718777  2.426426
                                  5.301846 ]]
        [I]                 ---- Histogram ----
                            Bin Range    |  Num Elems | Visualization
                            (1.96, 2.55) |          3 | ########################################
                            (2.55, 3.15) |          0 | 
                            (3.15, 3.74) |          1 | #############
                            (3.74, 4.33) |          0 | 
                            (4.33, 4.93) |          2 | ##########################
                            (4.93, 5.52) |          1 | #############
                            (5.52, 6.11) |          0 | 
                            (6.11, 6.71) |          0 | 
                            (6.71, 7.3 ) |          0 | 
                            (7.3 , 7.9 ) |          1 | #############
        [I]             Relative Difference | Stats: mean=0.96984, std-dev=0.36049, var=0.12995, median=0.81474, min=0.65189 at (0, 4), max=1.6039 at (0, 2), avg-magnitude=0.96984
        [I]                 ---- Values ----
                                [[1.5525569  0.9156274  1.603917   0.70246834 0.651895   0.70273536
                                  0.7759325  0.8535513 ]]
        [I]                 ---- Histogram ----
                            Bin Range      |  Num Elems | Visualization
                            (0.652, 0.747) |          3 | ########################################
                            (0.747, 0.842) |          1 | #############
                            (0.842, 0.938) |          2 | ##########################
                            (0.938, 1.03 ) |          0 | 
                            (1.03 , 1.13 ) |          0 | 
                            (1.13 , 1.22 ) |          0 | 
                            (1.22 , 1.32 ) |          0 | 
                            (1.32 , 1.41 ) |          0 | 
                            (1.41 , 1.51 ) |          0 | 
                            (1.51 , 1.6  ) |          2 | ##########################
        [E]         FAILED | Output: 'output' | Difference exceeds tolerance (rel=1e-05, abs=0)
        [E]     FAILED | Mismatched outputs: ['output']
        [E] Accuracy Summary | trt-runner-N0-10/23/23-16:32:19 vs. onnxrt-runner-N0-10/23/23-15:45:37 | Passed: 0/1 iterations | Pass Rate: 0.0%
        [E] FAILED | Runtime: 3.012s | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[E]     ========== CAPTURED STDERR ==========
[I]     Saving debug replay to polygraphy_debug_replay.json
[E]     FAILED | Iteration 1 | Duration 55.53689908981323s
[E]     Could not find a configuration that satisfied accuracy requirements.
[I] Finished 1 iteration(s) | Passed: 0/1 | Pass Rate: 0.0%
[I] PASSED | Runtime: 66.191s | Command: /usr/local/bin/polygraphy debug precision --mode bisect /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx --fp16 --verbose -p float16 --check polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0

When I set the precision to int8, I also get a diff only for the 'output' layer, and I don't understand why it reports the precision of both as float32:
[I] Comparing Output: 'output' (dtype=float32, shape=(1, 8)) with 'output' (dtype=float32, shape=(1, 8))

Command:

!polygraphy debug precision \
    --mode bisect \
    /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx \
    --int8 \
    --verbose \
    -p float32 \
    --check polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json \
    --abs 0
Log
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] RUNNING | Command: /usr/local/bin/polygraphy debug precision --mode bisect /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx --int8 --verbose -p float32 --check polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[V] Loaded Module: polygraphy | Version: 0.47.1 | Path: ['/usr/local/lib/python3.10/dist-packages/polygraphy']
[V] Loaded Module: tensorrt | Version: 8.6.1 | Path: ['/usr/local/lib/python3.10/dist-packages/tensorrt']
[V] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 24, GPU 1926 (MiB)
[V] [MemUsageChange] Init builder kernel library: CPU +888, GPU +174, now: CPU 989, GPU 2100 (MiB)
[V] ----------------------------------------------------------------
[V] Input filename:   /workspace/tensorrt/models/all_data_bags_v1/vit_base_patch32_224_clip_laion2b_ft_dyn.onnx
[V] ONNX IR version:  0.0.8
[V] Opset version:    16
[V] Producer name:    pytorch
[V] Producer version: 2.1.0
[V] Domain:           
[V] Model version:    0
[V] Doc string:       
[V] ----------------------------------------------------------------
[W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I] Using DataType.FLOAT as higher precision
[I]     RUNNING | Iteration 1 | Approximately 11 iteration(s) remaining
[I]     Selecting first 1430 layer(s) to run in higher precision
[V]     Loaded Module: polygraphy.backend.trt.util
[V]     Loaded Module: numpy | Version: 1.22.2 | Path: ['/usr/local/lib/python3.10/dist-packages/numpy']
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 725, 726, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 741, 742, 744, 745, 751, 752, 754, 755, 761, 762, 768, 772, 776, 780, 784, 788, 789, 790, 791, 792, 793, 794, 796, 797, 800, 802, 803, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 818, 819, 821, 822, 824, 825, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 838, 839, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 854, 855, 857, 858, 864, 865, 867, 868, 874, 875, 881, 885, 889, 893, 897, 901, 902, 903, 904, 905, 906, 907, 909, 910, 913, 915, 916, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 931, 932, 934, 935, 937, 938, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 951, 952, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 968, 970, 971, 977, 978, 980, 981, 987, 988, 994, 998, 1002, 1006, 1010, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1022, 1023, 1026, 1028, 1029, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1044, 1045, 1047, 1048, 1050, 1051, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1064, 1065, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1080, 1081, 1083, 1084, 1090, 1091, 1093, 1094, 1100, 1101, 1107, 1111, 1115, 1119, 1123, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1135, 1136, 1139, 1141, 1142, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1157, 1158, 1160, 1161, 1163, 1164, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1177, 1178, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1193, 1194, 1196, 1197, 1203, 1204, 1206, 1207, 1213, 1214, 1220, 1224, 1228, 1232, 1236, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1248, 1249, 1252, 1254, 1255, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1270, 1271, 1273, 1274, 1276, 1277, 1279, 1280, 1281, 1282, 
1283, 1284, 1285, 1286, 1287, 1288, 1290, 1291, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1306, 1307, 1309, 1310, 1316, 1317, 1319, 1320, 1326, 1327, 1333, 1337, 1341, 1345, 1349, 1353, 1354, 1355, 1356, 1357, 1358, 1359, 1361, 1362, 1365, 1367, 1368, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1383, 1384, 1386, 1387, 1389, 1390, 1392, 1393, 1394, 1395, 1396, 1397, 1398, 1399, 1400, 1401, 1403, 1404, 1406, 1407, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417, 1419, 1420, 1422, 1423, 1424, 1426, 1428, 1429} to run in DataType.FLOAT precision
[W]     Int8 Calibration is using randomly generated input data.
        This could negatively impact accuracy if the inference-time input data is dissimilar to the randomly generated calibration data.
        You may want to consider providing real data via the --data-loader-script option.
[V]     Created calibrator [cache=None]
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[W]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | No shapes provided; Will use shape: [1, 3, 224, 224] for min/opt/max in profile.
[W]         This will cause the tensor to have a static shape. If this is incorrect, please set the range of shapes for this input tensor.
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [INT8, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
        Calibrator             | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[V]     Graph optimization time: 0.0457923 seconds.
[W]     BuilderFlag::kENABLE_TACTIC_HEURISTIC has been ignored in this builder run. This feature is only supported on Ampere and beyond.
[V]     Timing cache disabled. Turning it on will improve builder speed.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 342640
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 512
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 343 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 888 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 1574.7ms to assign 251 blocks to 888 nodes requiring 341193728 bytes.
[V]     Total Activation Memory: 341193728
[V]     [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +325, now: CPU 0, GPU 659 (MiB)
[V]     Starting Calibration.
[V]     Generating data using numpy seed: 1
[V]     Input tensor: input | Generating input data in range: [0.0, 1.0]
[V]     Found candidate CUDA libraries: ['/usr/local/cuda/lib64/libcudart.so', '/usr/local/cuda/lib64/libcudart.so.12', '/usr/local/cuda/lib64/libcudart.so.12.2.128']
[V]     Allocated: DeviceArray[(dtype=float32, shape=(1, 3, 224, 224)), ptr=0x7f1496604400]
[V]       Calibrated batch 0 in 0.162968 seconds.
[V]       Post Processing Calibration data in 150.214 seconds.
[V]     Calibration completed in 155.589 seconds.
[V]     Writing Calibration Cache for calibrator: TRT-8601-EntropyCalibration2
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 8) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 29) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 33) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 109) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.0/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 148) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 152) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor (Unnamed Layer* 156) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.1/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.2/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.3/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.4/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.5/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.6/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.7/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.8/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.9/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.10/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W]     Missing scale and zero-point for tensor /blocks/blocks.11/attn/Softmax_output_0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[V]     Graph optimization time: 1.36354 seconds.
[V]     Global timing cache in use. Profiling results in this builder pass will be stored.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 368320
[V]     Total Device Persistent Memory: 328704
[V]     Total Scratch Memory: 154112
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 333 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 480 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 15.7391ms to assign 7 blocks to 480 nodes requiring 1690624 bytes.
[V]     Total Activation Memory: 1689600
[V]     [MemUsageChange] TensorRT-managed allocation in building engine: CPU +324, GPU +335, now: CPU 324, GPU 335 (MiB)
[I]     Finished engine building in 196.464 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[I]     Saving debug replay to polygraphy_debug_replay.json
[I]     PASSED | Iteration 1 | Duration 200.7176752090454s
[I]     RUNNING | Iteration 2 | Approximately 11 iteration(s) remaining
[I]     Selecting first 715 layer(s) to run in higher precision
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714} to run in DataType.FLOAT precision
[W]     Int8 Calibration is using randomly generated input data.
        This could negatively impact accuracy if the inference-time input data is dissimilar to the randomly generated calibration data.
        You may want to consider providing real data via the --data-loader-script option.
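(Editor's note: the warning above suggests supplying real calibration data. A minimal sketch of such a script follows — Polygraphy's `--data-loader-script` expects a Python file defining a generator function named `load_data` that yields feed_dicts mapping input names to numpy arrays. The file name `data_loader.py` is hypothetical; the input name `input` and shape `(1, 3, 224, 224)` are taken from the log above. Replace the random arrays with real preprocessed images for meaningful INT8 calibration.)

```python
# data_loader.py (hypothetical file name) -- pass via:
#   polygraphy ... --data-loader-script data_loader.py
import numpy as np

def load_data():
    """Yield calibration batches as {input_name: array} feed_dicts."""
    for _ in range(4):  # number of calibration batches; placeholder random data
        yield {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
```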
[V]     Created calibrator [cache=None]
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [INT8, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
        Calibrator             | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[V]     Graph optimization time: 0.0477532 seconds.
[V]     Timing cache disabled. Turning it on will improve builder speed.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 342640
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 512
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 333 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 888 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 1629.1ms to assign 251 blocks to 888 nodes requiring 341193728 bytes.
[V]     Total Activation Memory: 341193728
[V]     [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +325, now: CPU 0, GPU 659 (MiB)
[V]     Starting Calibration.
[V]     Generating data using numpy seed: 1
[V]     Allocated: DeviceArray[(dtype=float32, shape=(1, 3, 224, 224)), ptr=0x7f1490604400]
[V]       Calibrated batch 0 in 0.164016 seconds.
[V]       Post Processing Calibration data in 156.685 seconds.
[V]     Calibration completed in 161.148 seconds.
[V]     Writing Calibration Cache for calibrator: TRT-8601-EntropyCalibration2
[V]     Graph optimization time: 1.53557 seconds.
[V]     Global timing cache in use. Profiling results in this builder pass will be stored.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 378800
[V]     Total Device Persistent Memory: 184320
[V]     Total Scratch Memory: 154112
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 511 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 24.498ms to assign 7 blocks to 511 nodes requiring 1690624 bytes.
[V]     Total Activation Memory: 1689600
[V]     [MemUsageChange] TensorRT-managed allocation in building engine: CPU +213, GPU +224, now: CPU 213, GPU 224 (MiB)
[I]     Finished engine building in 288.108 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[I]     ========== CAPTURED STDOUT ==========
        [W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
        [I] RUNNING | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
        [I] trt-runner-N0-10/23/23-17:01:58     | Activating and starting inference
        [I] Loading bytes from /workspace/img_clf/notebooks/polygraphy_debug.engine
        [I] trt-runner-N0-10/23/23-17:01:58    
            ---- Inference Input(s) ----
            {input [dtype=float32, shape=(1, 3, 224, 224)]}
        [I] trt-runner-N0-10/23/23-17:01:58    
            ---- Inference Output(s) ----
            {output [dtype=float32, shape=(1, 8)]}
        [I] trt-runner-N0-10/23/23-17:01:58     | Completed 1 iteration(s) in 665.5 ms | Average inference time: 665.5 ms.
        [I] Loading inference results from /workspace/img_clf/data/layerwise_golden.json
        [I] Accuracy Comparison | trt-runner-N0-10/23/23-17:01:58 vs. onnxrt-runner-N0-10/23/23-15:45:37
        [I]     Comparing Output: 'output' (dtype=float32, shape=(1, 8)) with 'output' (dtype=float32, shape=(1, 8))
        [I]         Tolerance: [abs=0, rel=1e-05] | Checking elemwise error
        [I]         trt-runner-N0-10/23/23-17:01:58: output | Stats: mean=-3.7136, std-dev=2.2259, var=4.9544, median=-4.0574, min=-6.6405 at (0, 1), max=1.0144 at (0, 0), avg-magnitude=3.9673
        [I]             ---- Values ----
                            [[ 1.014436  -6.640457  -3.1188476 -4.164866  -5.4636207 -3.9500134
                              -2.105751  -5.2800465]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          0 | 
                        (-7.63 , -6.65 ) |          0 | 
                        (-6.65 , -5.66 ) |          1 | ####################
                        (-5.66 , -4.67 ) |          2 | ########################################
                        (-4.67 , -3.68 ) |          2 | ########################################
                        (-3.68 , -2.69 ) |          1 | ####################
                        (-2.69 , -1.7  ) |          1 | ####################
                        (-1.7  , -0.714) |          0 | 
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | ####################
        [I]         onnxrt-runner-N0-10/23/23-15:45:37: output | Stats: mean=-4.5612, std-dev=3.0662, var=9.4016, median=-5.3557, min=-8.6229 at (0, 1), max=1.2628 at (0, 0), avg-magnitude=4.8769
        [I]             ---- Values ----
                            [[ 1.2627729 -8.622935  -1.5289186 -4.499915  -7.0470834 -6.714871
                              -3.1271095 -6.211514 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          1 | ####################
                        (-7.63 , -6.65 ) |          2 | ########################################
                        (-6.65 , -5.66 ) |          1 | ####################
                        (-5.66 , -4.67 ) |          0 | 
                        (-4.67 , -3.68 ) |          1 | ####################
                        (-3.68 , -2.69 ) |          1 | ####################
                        (-2.69 , -1.7  ) |          0 | 
                        (-1.7  , -0.714) |          1 | ####################
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | ####################
        [I]         Error Metrics: output
        [I]             Minimum Required Tolerance: elemwise error | [abs=2.7649] OR [rel=1.0399] (requirements may be lower if both abs/rel tolerances are set)
        [I]             Absolute Difference | Stats: mean=1.3071, std-dev=0.79264, var=0.62827, median=1.3024, min=0.24834 at (0, 0), max=2.7649 at (0, 5), avg-magnitude=1.3071
        [I]                 ---- Values ----
                                [[0.24833691 1.9824781  1.589929   0.33504915 1.5834627  2.7648575
                                  1.0213585  0.93146753]]
        [I]                 ---- Histogram ----
                            Bin Range      |  Num Elems | Visualization
                            (0.248, 0.5  ) |          2 | ########################################
                            (0.5  , 0.752) |          0 | 
                            (0.752, 1    ) |          1 | ####################
                            (1    , 1.25 ) |          1 | ####################
                            (1.25 , 1.51 ) |          0 | 
                            (1.51 , 1.76 ) |          2 | ########################################
                            (1.76 , 2.01 ) |          1 | ####################
                            (2.01 , 2.26 ) |          0 | 
                            (2.26 , 2.51 ) |          0 | 
                            (2.51 , 2.76 ) |          1 | ####################
        [I]             Relative Difference | Stats: mean=0.33174, std-dev=0.28444, var=0.080905, median=0.2273, min=0.074457 at (0, 3), max=1.0399 at (0, 2), avg-magnitude=0.33174
        [I]                 ---- Values ----
                                [[0.19666    0.22990757 1.0399042  0.07445677 0.2246976  0.4117514
                                  0.32661423 0.14995821]]
        [I]                 ---- Histogram ----
                            Bin Range       |  Num Elems | Visualization
                            (0.0745, 0.171) |          2 | ##########################
                            (0.171 , 0.268) |          3 | ########################################
                            (0.268 , 0.364) |          1 | #############
                            (0.364 , 0.461) |          1 | #############
                            (0.461 , 0.557) |          0 | 
                            (0.557 , 0.654) |          0 | 
                            (0.654 , 0.75 ) |          0 | 
                            (0.75  , 0.847) |          0 | 
                            (0.847 , 0.943) |          0 | 
                            (0.943 , 1.04 ) |          1 | #############
        [E]         FAILED | Output: 'output' | Difference exceeds tolerance (rel=1e-05, abs=0)
        [E]     FAILED | Mismatched outputs: ['output']
        [E] Accuracy Summary | trt-runner-N0-10/23/23-17:01:58 vs. onnxrt-runner-N0-10/23/23-15:45:37 | Passed: 0/1 iterations | Pass Rate: 0.0%
        [E] FAILED | Runtime: 3.511s | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[E]     ========== CAPTURED STDERR ==========
[I]     Saving debug replay to polygraphy_debug_replay.json
[E]     FAILED | Iteration 2 | Duration 292.11703515052795s
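(Editor's note: the error metrics reported for Iteration 2 can be reproduced from the printed outputs. A minimal numpy sketch, using the values copied from the log above — Polygraphy's elementwise check computes the absolute difference and divides by the magnitude of the golden (here, onnxrt) output to get the relative difference:)

```python
import numpy as np

# Output values copied from the Iteration 2 comparison above.
trt = np.array([1.014436, -6.640457, -3.1188476, -4.164866,
                -5.4636207, -3.9500134, -2.105751, -5.2800465])
onnxrt = np.array([1.2627729, -8.622935, -1.5289186, -4.499915,
                   -7.0470834, -6.714871, -3.1271095, -6.211514])

abs_diff = np.abs(trt - onnxrt)
rel_diff = abs_diff / np.abs(onnxrt)  # relative error vs. the golden output

print(abs_diff.max())  # ~2.7649, at index 5, matching the log
print(rel_diff.max())  # ~1.0399, at index 2, matching the log
```

With `--abs 0` and the default `rel=1e-05`, every element must satisfy the tolerance, so a maximum relative error near 1.0 fails decisively.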
[I]     RUNNING | Iteration 3 | Approximately 10 iteration(s) remaining
[I]     Selecting first 1073 layer(s) to run in higher precision
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 725, 726, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 741, 742, 744, 745, 751, 752, 754, 755, 761, 762, 768, 772, 776, 780, 784, 788, 789, 790, 791, 792, 793, 794, 796, 797, 800, 802, 803, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 818, 819, 821, 822, 824, 825, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 838, 839, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 854, 855, 857, 858, 864, 865, 867, 868, 874, 875, 881, 885, 889, 893, 897, 901, 902, 903, 904, 905, 906, 907, 909, 910, 913, 915, 916, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 931, 932, 934, 935, 937, 938, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 951, 952, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 968, 970, 971, 977, 978, 980, 981, 987, 988, 994, 998, 1002, 1006, 1010, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1022, 1023, 1026, 1028, 1029, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1044, 1045, 1047, 1048, 1050, 1051, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1064, 1065, 1067, 1068, 1069, 1070, 1071, 1072} to run in DataType.FLOAT precision
[W]     Int8 Calibration is using randomly generated input data.
        This could negatively impact accuracy if the inference-time input data is dissimilar to the randomly generated calibration data.
        You may want to consider providing real data via the --data-loader-script option.
[V]     Created calibrator [cache=None]
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [INT8, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
        Calibrator             | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[V]     Graph optimization time: 0.0484175 seconds.
[V]     Timing cache disabled. Turning it on will improve builder speed.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 342640
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 512
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 888 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 1640.19ms to assign 251 blocks to 888 nodes requiring 341193728 bytes.
[V]     Total Activation Memory: 341193728
[V]     [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +325, now: CPU 0, GPU 659 (MiB)
[V]     Starting Calibration.
[V]     Generating data using numpy seed: 1
[V]     Allocated: DeviceArray[(dtype=float32, shape=(1, 3, 224, 224)), ptr=0x7f1490604400]
[V]       Calibrated batch 0 in 0.166882 seconds.
[V]       Post Processing Calibration data in 163.59 seconds.
[V]     Calibration completed in 168.106 seconds.
[V]     Writing Calibration Cache for calibrator: TRT-8601-EntropyCalibration2
[V]     Graph optimization time: 1.48535 seconds.
[V]     Global timing cache in use. Profiling results in this builder pass will be stored.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 371264
[V]     Total Device Persistent Memory: 176640
[V]     Total Scratch Memory: 154112
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 511 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 19.0428ms to assign 7 blocks to 511 nodes requiring 1690624 bytes.
[V]     Total Activation Memory: 1689600
[V]     [MemUsageChange] TensorRT-managed allocation in building engine: CPU +268, GPU +279, now: CPU 268, GPU 279 (MiB)
[I]     Finished engine building in 195.951 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[I]     ========== CAPTURED STDOUT ==========
        [W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
        [I] RUNNING | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
        [I] trt-runner-N0-10/23/23-17:05:18     | Activating and starting inference
        [I] Loading bytes from /workspace/img_clf/notebooks/polygraphy_debug.engine
        [I] trt-runner-N0-10/23/23-17:05:18    
            ---- Inference Input(s) ----
            {input [dtype=float32, shape=(1, 3, 224, 224)]}
        [I] trt-runner-N0-10/23/23-17:05:18    
            ---- Inference Output(s) ----
            {output [dtype=float32, shape=(1, 8)]}
        [I] trt-runner-N0-10/23/23-17:05:18     | Completed 1 iteration(s) in 618.5 ms | Average inference time: 618.5 ms.
        [I] Loading inference results from /workspace/img_clf/data/layerwise_golden.json
        [I] Accuracy Comparison | trt-runner-N0-10/23/23-17:05:18 vs. onnxrt-runner-N0-10/23/23-15:45:37
        [I]     Comparing Output: 'output' (dtype=float32, shape=(1, 8)) with 'output' (dtype=float32, shape=(1, 8))
        [I]         Tolerance: [abs=0, rel=1e-05] | Checking elemwise error
        [I]         trt-runner-N0-10/23/23-17:05:18: output | Stats: mean=-4.0761, std-dev=2.6041, var=6.7812, median=-5.0699, min=-7.2015 at (0, 1), max=0.57395 at (0, 0), avg-magnitude=4.2196
        [I]             ---- Values ----
                            [[ 0.5739483 -7.2014675 -1.2806717 -4.6683197 -6.3941054 -6.0034494
                              -2.1633415 -5.4715405]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          0 | 
                        (-7.63 , -6.65 ) |          1 | ####################
                        (-6.65 , -5.66 ) |          2 | ########################################
                        (-5.66 , -4.67 ) |          1 | ####################
                        (-4.67 , -3.68 ) |          1 | ####################
                        (-3.68 , -2.69 ) |          0 | 
                        (-2.69 , -1.7  ) |          1 | ####################
                        (-1.7  , -0.714) |          1 | ####################
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | ####################
        [I]         onnxrt-runner-N0-10/23/23-15:45:37: output | Stats: mean=-4.5612, std-dev=3.0662, var=9.4016, median=-5.3557, min=-8.6229 at (0, 1), max=1.2628 at (0, 0), avg-magnitude=4.8769
        [I]             ---- Values ----
                            [[ 1.2627729 -8.622935  -1.5289186 -4.499915  -7.0470834 -6.714871
                              -3.1271095 -6.211514 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          1 | ####################
                        (-7.63 , -6.65 ) |          2 | ########################################
                        (-6.65 , -5.66 ) |          1 | ####################
                        (-5.66 , -4.67 ) |          0 | 
                        (-4.67 , -3.68 ) |          1 | ####################
                        (-3.68 , -2.69 ) |          1 | ####################
                        (-2.69 , -1.7  ) |          0 | 
                        (-1.7  , -0.714) |          1 | ####################
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | ####################
        [I]         Error Metrics: output
        [I]             Minimum Required Tolerance: elemwise error | [abs=1.4215] OR [rel=0.54549] (requirements may be lower if both abs/rel tolerances are set)
        [I]             Absolute Difference | Stats: mean=0.69939, std-dev=0.36756, var=0.1351, median=0.70012, min=0.1684 at (0, 3), max=1.4215 at (0, 1), avg-magnitude=0.69939
        [I]                 ---- Values ----
                                [[0.6888246  1.4214678  0.24824691 0.16840458 0.65297794 0.7114215
                                  0.963768   0.73997355]]
        [I]                 ---- Histogram ----
                            Bin Range      |  Num Elems | Visualization
                            (0.168, 0.294) |          2 | ##########################
                            (0.294, 0.419) |          0 | 
                            (0.419, 0.544) |          0 | 
                            (0.544, 0.67 ) |          1 | #############
                            (0.67 , 0.795) |          3 | ########################################
                            (0.795, 0.92 ) |          0 | 
                            (0.92 , 1.05 ) |          1 | #############
                            (1.05 , 1.17 ) |          0 | 
                            (1.17 , 1.3  ) |          0 | 
                            (1.3  , 1.42 ) |          1 | #############
        [I]             Relative Difference | Stats: mean=0.19201, std-dev=0.1527, var=0.023318, median=0.14075, min=0.037424 at (0, 3), max=0.54549 at (0, 0), avg-magnitude=0.19201
        [I]                 ---- Values ----
                                [[0.54548573 0.16484731 0.16236764 0.03742395 0.09265932 0.10594716
                                  0.3081977  0.11912934]]
        [I]                 ---- Histogram ----
                            Bin Range        |  Num Elems | Visualization
                            (0.0374, 0.0882) |          1 | #############
                            (0.0882, 0.139 ) |          3 | ########################################
                            (0.139 , 0.19  ) |          2 | ##########################
                            (0.19  , 0.241 ) |          0 | 
                            (0.241 , 0.291 ) |          0 | 
                            (0.291 , 0.342 ) |          1 | #############
                            (0.342 , 0.393 ) |          0 | 
                            (0.393 , 0.444 ) |          0 | 
                            (0.444 , 0.495 ) |          0 | 
                            (0.495 , 0.545 ) |          1 | #############
        [E]         FAILED | Output: 'output' | Difference exceeds tolerance (rel=1e-05, abs=0)
        [E]     FAILED | Mismatched outputs: ['output']
        [E] Accuracy Summary | trt-runner-N0-10/23/23-17:05:18 vs. onnxrt-runner-N0-10/23/23-15:45:37 | Passed: 0/1 iterations | Pass Rate: 0.0%
        [E] FAILED | Runtime: 3.556s | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[E]     ========== CAPTURED STDERR ==========
[I]     Saving debug replay to polygraphy_debug_replay.json
[E]     FAILED | Iteration 3 | Duration 200.0889549255371s
[I]     RUNNING | Iteration 4 | Approximately 9 iteration(s) remaining
[I]     Selecting first 1252 layer(s) to run in higher precision
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 725, 726, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 741, 742, 744, 745, 751, 752, 754, 755, 761, 762, 768, 772, 776, 780, 784, 788, 789, 790, 791, 792, 793, 794, 796, 797, 800, 802, 803, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 818, 819, 821, 822, 824, 825, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 838, 839, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 854, 855, 857, 858, 864, 865, 867, 868, 874, 875, 881, 885, 889, 893, 897, 901, 902, 903, 904, 905, 906, 907, 909, 910, 913, 915, 916, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 931, 932, 934, 935, 937, 938, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 951, 952, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 968, 970, 971, 977, 978, 980, 981, 987, 988, 994, 998, 1002, 1006, 1010, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1022, 1023, 1026, 1028, 1029, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1044, 1045, 1047, 1048, 1050, 1051, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1064, 1065, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1080, 1081, 1083, 1084, 1090, 1091, 1093, 1094, 1100, 1101, 1107, 1111, 1115, 1119, 1123, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1135, 1136, 1139, 1141, 1142, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1157, 1158, 1160, 1161, 1163, 1164, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1177, 1178, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1193, 1194, 1196, 1197, 1203, 1204, 1206, 1207, 1213, 1214, 1220, 1224, 1228, 1232, 1236, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1248, 1249} to run in DataType.FLOAT precision
[W]     Int8 Calibration is using randomly generated input data.
        This could negatively impact accuracy if the inference-time input data is dissimilar to the randomly generated calibration data.
        You may want to consider providing real data via the --data-loader-script option.
[V]     Created calibrator [cache=None]
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [INT8, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
        Calibrator             | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[V]     Graph optimization time: 0.0523377 seconds.
[V]     Timing cache disabled. Turning it on will improve builder speed.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 342640
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 512
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 888 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 1642.67ms to assign 251 blocks to 888 nodes requiring 341193728 bytes.
[V]     Total Activation Memory: 341193728
[V]     [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +325, now: CPU 0, GPU 659 (MiB)
[V]     Starting Calibration.
[V]     Generating data using numpy seed: 1
[V]     Allocated: DeviceArray[(dtype=float32, shape=(1, 3, 224, 224)), ptr=0x7f1490604400]
[V]       Calibrated batch 0 in 0.166838 seconds.
[V]       Post Processing Calibration data in 165.977 seconds.
[V]     Calibration completed in 170.505 seconds.
[V]     Writing Calibration Cache for calibrator: TRT-8601-EntropyCalibration2
[V]     Graph optimization time: 1.50964 seconds.
[V]     Global timing cache in use. Profiling results in this builder pass will be stored.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 372656
[V]     Total Device Persistent Memory: 285696
[V]     Total Scratch Memory: 154112
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 485 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 18.2183ms to assign 7 blocks to 485 nodes requiring 1690624 bytes.
[V]     Total Activation Memory: 1689600
[V]     [MemUsageChange] TensorRT-managed allocation in building engine: CPU +288, GPU +300, now: CPU 288, GPU 300 (MiB)
[I]     Finished engine building in 197.210 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[I]     ========== CAPTURED STDOUT ==========
        [W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
        [I] RUNNING | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
        [I] trt-runner-N0-10/23/23-17:08:40     | Activating and starting inference
        [I] Loading bytes from /workspace/img_clf/notebooks/polygraphy_debug.engine
        [I] trt-runner-N0-10/23/23-17:08:40    
            ---- Inference Input(s) ----
            {input [dtype=float32, shape=(1, 3, 224, 224)]}
        [I] trt-runner-N0-10/23/23-17:08:40    
            ---- Inference Output(s) ----
            {output [dtype=float32, shape=(1, 8)]}
        [I] trt-runner-N0-10/23/23-17:08:40     | Completed 1 iteration(s) in 653.9 ms | Average inference time: 653.9 ms.
        [I] Loading inference results from /workspace/img_clf/data/layerwise_golden.json
        [I] Accuracy Comparison | trt-runner-N0-10/23/23-17:08:40 vs. onnxrt-runner-N0-10/23/23-15:45:37
        [I]     Comparing Output: 'output' (dtype=float32, shape=(1, 8)) with 'output' (dtype=float32, shape=(1, 8))
        [I]         Tolerance: [abs=0, rel=1e-05] | Checking elemwise error
        [I]         trt-runner-N0-10/23/23-17:08:40: output | Stats: mean=-4.832, std-dev=3.0892, var=9.543, median=-5.8418, min=-8.3838 at (0, 1), max=1.2505 at (0, 0), avg-magnitude=5.1446
        [I]             ---- Values ----
                            [[ 1.250505  -8.383767  -1.8844577 -5.0319767 -7.564127  -6.651675
                              -3.329008  -7.061149 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          1 | #############
                        (-7.63 , -6.65 ) |          3 | ########################################
                        (-6.65 , -5.66 ) |          0 | 
                        (-5.66 , -4.67 ) |          1 | #############
                        (-4.67 , -3.68 ) |          0 | 
                        (-3.68 , -2.69 ) |          1 | #############
                        (-2.69 , -1.7  ) |          1 | #############
                        (-1.7  , -0.714) |          0 | 
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | #############
        [I]         onnxrt-runner-N0-10/23/23-15:45:37: output | Stats: mean=-4.5612, std-dev=3.0662, var=9.4016, median=-5.3557, min=-8.6229 at (0, 1), max=1.2628 at (0, 0), avg-magnitude=4.8769
        [I]             ---- Values ----
                            [[ 1.2627729 -8.622935  -1.5289186 -4.499915  -7.0470834 -6.714871
                              -3.1271095 -6.211514 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-8.62 , -7.63 ) |          1 | ####################
                        (-7.63 , -6.65 ) |          2 | ########################################
                        (-6.65 , -5.66 ) |          1 | ####################
                        (-5.66 , -4.67 ) |          0 | 
                        (-4.67 , -3.68 ) |          1 | ####################
                        (-3.68 , -2.69 ) |          1 | ####################
                        (-2.69 , -1.7  ) |          0 | 
                        (-1.7  , -0.714) |          1 | ####################
                        (-0.714, 0.274 ) |          0 | 
                        (0.274 , 1.26  ) |          1 | ####################
        [I]         Error Metrics: output
        [I]             Minimum Required Tolerance: elemwise error | [abs=0.84964] OR [rel=0.23254] (requirements may be lower if both abs/rel tolerances are set)
        [I]             Absolute Difference | Stats: mean=0.34635, std-dev=0.26008, var=0.067643, median=0.29735, min=0.012268 at (0, 0), max=0.84964 at (0, 7), avg-magnitude=0.34635
        [I]                 ---- Values ----
                                [[0.01226795 0.23916817 0.35553908 0.5320616  0.5170436  0.06319571
                                  0.20189857 0.8496351 ]]
        [I]                 ---- Histogram ----
                            Bin Range       |  Num Elems | Visualization
                            (0.0123, 0.096) |          2 | ########################################
                            (0.096 , 0.18 ) |          0 | 
                            (0.18  , 0.263) |          2 | ########################################
                            (0.263 , 0.347) |          0 | 
                            (0.347 , 0.431) |          1 | ####################
                            (0.431 , 0.515) |          0 | 
                            (0.515 , 0.598) |          2 | ########################################
                            (0.598 , 0.682) |          0 | 
                            (0.682 , 0.766) |          0 | 
                            (0.766 , 0.85 ) |          1 | ####################
        [I]             Relative Difference | Stats: mean=0.084045, std-dev=0.071381, var=0.0050952, median=0.068967, min=0.0094113 at (0, 5), max=0.23254 at (0, 2), avg-magnitude=0.084045
        [I]                 ---- Values ----
                                [[0.00971509 0.02773628 0.23254284 0.11823814 0.07336987 0.00941131
                                  0.06456396 0.1367839 ]]
        [I]                 ---- Histogram ----
                            Bin Range         |  Num Elems | Visualization
                            (0.00941, 0.0317) |          3 | ########################################
                            (0.0317 , 0.054 ) |          0 | 
                            (0.054  , 0.0764) |          2 | ##########################
                            (0.0764 , 0.0987) |          0 | 
                            (0.0987 , 0.121 ) |          1 | #############
                            (0.121  , 0.143 ) |          1 | #############
                            (0.143  , 0.166 ) |          0 | 
                            (0.166  , 0.188 ) |          0 | 
                            (0.188  , 0.21  ) |          0 | 
                            (0.21   , 0.233 ) |          1 | #############
        [E]         FAILED | Output: 'output' | Difference exceeds tolerance (rel=1e-05, abs=0)
        [E]     FAILED | Mismatched outputs: ['output']
        [E] Accuracy Summary | trt-runner-N0-10/23/23-17:08:40 vs. onnxrt-runner-N0-10/23/23-15:45:37 | Passed: 0/1 iterations | Pass Rate: 0.0%
        [E] FAILED | Runtime: 3.592s | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[E]     ========== CAPTURED STDERR ==========
[I]     Saving debug replay to polygraphy_debug_replay.json
[E]     FAILED | Iteration 4 | Duration 201.41086626052856s
[I]     RUNNING | Iteration 5 | Approximately 8 iteration(s) remaining
[I]     Selecting first 1341 layer(s) to run in higher precision
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 725, 726, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 741, 742, 744, 745, 751, 752, 754, 755, 761, 762, 768, 772, 776, 780, 784, 788, 789, 790, 791, 792, 793, 794, 796, 797, 800, 802, 803, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 818, 819, 821, 822, 824, 825, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 838, 839, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 854, 855, 857, 858, 864, 865, 867, 868, 874, 875, 881, 885, 889, 893, 897, 901, 902, 903, 904, 905, 906, 907, 909, 910, 913, 915, 916, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 931, 932, 934, 935, 937, 938, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 951, 952, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 968, 970, 971, 977, 978, 980, 981, 987, 988, 994, 998, 1002, 1006, 1010, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1022, 1023, 1026, 1028, 1029, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1044, 1045, 1047, 1048, 1050, 1051, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1064, 1065, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1080, 1081, 1083, 1084, 1090, 1091, 1093, 1094, 1100, 1101, 1107, 1111, 1115, 1119, 1123, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1135, 1136, 1139, 1141, 1142, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1157, 1158, 1160, 1161, 1163, 1164, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1177, 1178, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1193, 1194, 1196, 1197, 1203, 1204, 1206, 1207, 1213, 1214, 1220, 1224, 1228, 1232, 1236, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1248, 1249, 1252, 1254, 1255, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1270, 1271, 1273, 1274, 1276, 1277, 1279, 1280, 1281, 1282, 
1283, 1284, 1285, 1286, 1287, 1288, 1290, 1291, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1306, 1307, 1309, 1310, 1316, 1317, 1319, 1320, 1326, 1327, 1333, 1337} to run in DataType.FLOAT precision
[W]     Int8 Calibration is using randomly generated input data.
        This could negatively impact accuracy if the inference-time input data is dissimilar to the randomly generated calibration data.
        You may want to consider providing real data via the --data-loader-script option.
[V]     Created calibrator [cache=None]
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [INT8, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
        Calibrator             | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[V]     Graph optimization time: 0.0486154 seconds.
[V]     Timing cache disabled. Turning it on will improve builder speed.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 342640
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 512
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 888 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 1629.78ms to assign 251 blocks to 888 nodes requiring 341193728 bytes.
[V]     Total Activation Memory: 341193728
[V]     [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +325, now: CPU 0, GPU 659 (MiB)
[V]     Starting Calibration.
[V]     Generating data using numpy seed: 1
[V]     Allocated: DeviceArray[(dtype=float32, shape=(1, 3, 224, 224)), ptr=0x7f1490604400]
[V]       Calibrated batch 0 in 0.166633 seconds.
[V]       Post Processing Calibration data in 163.08 seconds.
[V]     Calibration completed in 167.577 seconds.
[V]     Writing Calibration Cache for calibrator: TRT-8601-EntropyCalibration2
[V]     Graph optimization time: 1.498 seconds.
[V]     Global timing cache in use. Profiling results in this builder pass will be stored.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 369776
[V]     Total Device Persistent Memory: 313344
[V]     Total Scratch Memory: 154112
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 473 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 17.0699ms to assign 7 blocks to 473 nodes requiring 1690624 bytes.
[V]     Total Activation Memory: 1689600
[V]     [MemUsageChange] TensorRT-managed allocation in building engine: CPU +308, GPU +320, now: CPU 308, GPU 320 (MiB)
[I]     Finished engine building in 190.840 seconds
[I]     Running check command: polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[I]     ========== CAPTURED STDOUT ==========
        [W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
        [I] RUNNING | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
        [I] trt-runner-N0-10/23/23-17:11:55     | Activating and starting inference
        [I] Loading bytes from /workspace/img_clf/notebooks/polygraphy_debug.engine
        [I] trt-runner-N0-10/23/23-17:11:55    
            ---- Inference Input(s) ----
            {input [dtype=float32, shape=(1, 3, 224, 224)]}
        [I] trt-runner-N0-10/23/23-17:11:55    
            ---- Inference Output(s) ----
            {output [dtype=float32, shape=(1, 8)]}
        [I] trt-runner-N0-10/23/23-17:11:55     | Completed 1 iteration(s) in 659 ms | Average inference time: 659 ms.
        [I] Loading inference results from /workspace/img_clf/data/layerwise_golden.json
        [I] Accuracy Comparison | trt-runner-N0-10/23/23-17:11:55 vs. onnxrt-runner-N0-10/23/23-15:45:37
        [I]     Comparing Output: 'output' (dtype=float32, shape=(1, 8)) with 'output' (dtype=float32, shape=(1, 8))
        [I]         Tolerance: [abs=0, rel=1e-05] | Checking elemwise error
        [I]         trt-runner-N0-10/23/23-17:11:55: output | Stats: mean=-4.7106, std-dev=3.34, var=11.156, median=-5.5748, min=-9.1529 at (0, 1), max=1.928 at (0, 0), avg-magnitude=5.1926
        [I]             ---- Values ----
                            [[ 1.9280488 -9.152926  -1.7256244 -4.526225  -7.2750354 -6.8673315
                              -3.4420202 -6.623417 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-9.15 , -8.04 ) |          1 | ####################
                        (-8.04 , -6.94 ) |          1 | ####################
                        (-6.94 , -5.83 ) |          2 | ########################################
                        (-5.83 , -4.72 ) |          0 | 
                        (-4.72 , -3.61 ) |          1 | ####################
                        (-3.61 , -2.5  ) |          1 | ####################
                        (-2.5  , -1.4  ) |          1 | ####################
                        (-1.4  , -0.288) |          0 | 
                        (-0.288, 0.82  ) |          0 | 
                        (0.82  , 1.93  ) |          1 | ####################
        [I]         onnxrt-runner-N0-10/23/23-15:45:37: output | Stats: mean=-4.5612, std-dev=3.0662, var=9.4016, median=-5.3557, min=-8.6229 at (0, 1), max=1.2628 at (0, 0), avg-magnitude=4.8769
        [I]             ---- Values ----
                            [[ 1.2627729 -8.622935  -1.5289186 -4.499915  -7.0470834 -6.714871
                              -3.1271095 -6.211514 ]]
        [I]             ---- Histogram ----
                        Bin Range        |  Num Elems | Visualization
                        (-9.15 , -8.04 ) |          1 | ####################
                        (-8.04 , -6.94 ) |          1 | ####################
                        (-6.94 , -5.83 ) |          2 | ########################################
                        (-5.83 , -4.72 ) |          0 | 
                        (-4.72 , -3.61 ) |          1 | ####################
                        (-3.61 , -2.5  ) |          1 | ####################
                        (-2.5  , -1.4  ) |          1 | ####################
                        (-1.4  , -0.288) |          0 | 
                        (-0.288, 0.82  ) |          0 | 
                        (0.82  , 1.93  ) |          1 | ####################
        [I]         Error Metrics: output
        [I]             Minimum Required Tolerance: elemwise error | [abs=0.66528] OR [rel=0.52684] (requirements may be lower if both abs/rel tolerances are set)
        [I]             Absolute Difference | Stats: mean=0.31569, std-dev=0.19673, var=0.038704, median=0.27143, min=0.02631 at (0, 3), max=0.66528 at (0, 0), avg-magnitude=0.31569
        [I]                 ---- Values ----
                                [[0.66527593 0.52999115 0.19670582 0.02630997 0.227952   0.15246058
                                  0.31491065 0.4119029 ]]
        [I]                 ---- Histogram ----
                            Bin Range        |  Num Elems | Visualization
                            (0.0263, 0.0902) |          1 | ########################################
                            (0.0902, 0.154 ) |          1 | ########################################
                            (0.154 , 0.218 ) |          1 | ########################################
                            (0.218 , 0.282 ) |          1 | ########################################
                            (0.282 , 0.346 ) |          1 | ########################################
                            (0.346 , 0.41  ) |          0 | 
                            (0.41  , 0.474 ) |          1 | ########################################
                            (0.474 , 0.537 ) |          1 | ########################################
                            (0.537 , 0.601 ) |          0 | 
                            (0.601 , 0.665 ) |          1 | ########################################
        [I]             Relative Difference | Stats: mean=0.11811, std-dev=0.15907, var=0.025303, median=0.063888, min=0.0058468 at (0, 3), max=0.52684 at (0, 0), avg-magnitude=0.11811
        [I]                 ---- Values ----
                                [[0.52683735 0.06146296 0.12865682 0.00584677 0.032347   0.02270492
                                  0.10070343 0.0663128 ]]
        [I]                 ---- Histogram ----
                            Bin Range         |  Num Elems | Visualization
                            (0.00585, 0.0579) |          3 | ########################################
                            (0.0579 , 0.11  ) |          3 | ########################################
                            (0.11   , 0.162 ) |          1 | #############
                            (0.162  , 0.214 ) |          0 | 
                            (0.214  , 0.266 ) |          0 | 
                            (0.266  , 0.318 ) |          0 | 
                            (0.318  , 0.371 ) |          0 | 
                            (0.371  , 0.423 ) |          0 | 
                            (0.423  , 0.475 ) |          0 | 
                            (0.475  , 0.527 ) |          1 | #############
        [E]         FAILED | Output: 'output' | Difference exceeds tolerance (rel=1e-05, abs=0)
        [E]     FAILED | Mismatched outputs: ['output']
        [E] Accuracy Summary | trt-runner-N0-10/23/23-17:11:55 vs. onnxrt-runner-N0-10/23/23-15:45:37 | Passed: 0/1 iterations | Pass Rate: 0.0%
        [E] FAILED | Runtime: 3.615s | Command: /usr/local/bin/polygraphy run polygraphy_debug.engine --trt --load-outputs /workspace/img_clf/data/layerwise_golden.json --abs 0
[E]     ========== CAPTURED STDERR ==========
[I]     Saving debug replay to polygraphy_debug_replay.json
[E]     FAILED | Iteration 5 | Duration 195.09278988838196s
[I]     RUNNING | Iteration 6 | Approximately 7 iteration(s) remaining
[I]     Selecting first 1386 layer(s) to run in higher precision
[V]     Marking layer(s): {0, 5, 6, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58, 66, 67, 69, 70, 79, 80, 86, 90, 94, 98, 102, 106, 107, 108, 110, 111, 112, 113, 115, 116, 119, 121, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150, 151, 153, 154, 155, 157, 158, 160, 161, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 186, 187, 189, 190, 196, 197, 203, 207, 211, 215, 219, 223, 224, 225, 226, 227, 228, 229, 231, 232, 235, 237, 238, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 253, 254, 256, 257, 259, 260, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 290, 292, 293, 299, 300, 302, 303, 309, 310, 316, 320, 324, 328, 332, 336, 337, 338, 339, 340, 341, 342, 344, 345, 348, 350, 351, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 366, 367, 369, 370, 372, 373, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 386, 387, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 402, 403, 405, 406, 412, 413, 415, 416, 422, 423, 429, 433, 437, 441, 445, 449, 450, 451, 452, 453, 454, 455, 457, 458, 461, 463, 464, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 479, 480, 482, 483, 485, 486, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 515, 516, 518, 519, 525, 526, 528, 529, 535, 536, 542, 546, 550, 554, 558, 562, 563, 564, 565, 566, 567, 568, 570, 571, 574, 576, 577, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 592, 593, 595, 596, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 612, 613, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 628, 629, 631, 632, 638, 639, 641, 642, 648, 649, 655, 659, 663, 667, 671, 675, 676, 677, 678, 679, 680, 681, 683, 684, 687, 689, 690, 692, 693, 694, 
695, 696, 697, 698, 699, 700, 701, 702, 703, 705, 706, 708, 709, 711, 712, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 725, 726, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 741, 742, 744, 745, 751, 752, 754, 755, 761, 762, 768, 772, 776, 780, 784, 788, 789, 790, 791, 792, 793, 794, 796, 797, 800, 802, 803, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 818, 819, 821, 822, 824, 825, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 838, 839, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 854, 855, 857, 858, 864, 865, 867, 868, 874, 875, 881, 885, 889, 893, 897, 901, 902, 903, 904, 905, 906, 907, 909, 910, 913, 915, 916, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 931, 932, 934, 935, 937, 938, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 951, 952, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 968, 970, 971, 977, 978, 980, 981, 987, 988, 994, 998, 1002, 1006, 1010, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1022, 1023, 1026, 1028, 1029, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1044, 1045, 1047, 1048, 1050, 1051, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1064, 1065, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1080, 1081, 1083, 1084, 1090, 1091, 1093, 1094, 1100, 1101, 1107, 1111, 1115, 1119, 1123, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1135, 1136, 1139, 1141, 1142, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1157, 1158, 1160, 1161, 1163, 1164, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1177, 1178, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1193, 1194, 1196, 1197, 1203, 1204, 1206, 1207, 1213, 1214, 1220, 1224, 1228, 1232, 1236, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1248, 1249, 1252, 1254, 1255, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1270, 1271, 1273, 1274, 1276, 1277, 1279, 1280, 1281, 1282, 
1283, 1284, 1285, 1286, 1287, 1288, 1290, 1291, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1306, 1307, 1309, 1310, 1316, 1317, 1319, 1320, 1326, 1327, 1333, 1337, 1341, 1345, 1349, 1353, 1354, 1355, 1356, 1357, 1358, 1359, 1361, 1362, 1365, 1367, 1368, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1383, 1384} to run in DataType.FLOAT precision
[W]     Int8 Calibration is using randomly generated input data.
        This could negatively impact accuracy if the inference-time input data is dissimilar to the randomly generated calibration data.
        You may want to consider providing real data via the --data-loader-script option.
[V]     Created calibrator [cache=None]
[V]     Builder and Network were provided directly instead of via a Callable. This loader will not assume ownership. Please ensure that they are freed.
[V]         Setting TensorRT Optimization Profiles
[V]         Input tensor: input (dtype=DataType.FLOAT, shape=(-1, 3, 224, 224)) | Setting input tensor shapes to: (min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])
[I]         Configuring with profiles: [Profile().add('input', min=[1, 3, 224, 224], opt=[1, 3, 224, 224], max=[1, 3, 224, 224])]
[I]     Building engine with configuration:
        Flags                  | [INT8, OBEY_PRECISION_CONSTRAINTS]
        Engine Capability      | EngineCapability.DEFAULT
        Memory Pools           | [WORKSPACE: 15109.75 MiB, TACTIC_DRAM: 15109.75 MiB]
        Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
        Profiling Verbosity    | ProfilingVerbosity.DETAILED
        Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
        Calibrator             | Calibrator(DataLoader(seed=1, iterations=1, int_range=(1, 25), float_range=(-1.0, 1.0), val_range=(0.0, 1.0)), BaseClass=<class 'tensorrt.tensorrt.IInt8EntropyCalibrator2'>)
[V]     Graph optimization time: 0.0533773 seconds.
[V]     Timing cache disabled. Turning it on will improve builder speed.
[V]     [GraphReduction] The approximate region cut reduction algorithm is called.
[V]     Detected 1 inputs and 1 output network tensors.
[V]     Total Host Persistent Memory: 342640
[V]     Total Device Persistent Memory: 0
[V]     Total Scratch Memory: 512
[V]     [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 365 MiB, GPU 659 MiB
[V]     [BlockAssignment] Started assigning block shifts. This will take 888 steps to complete.
[V]     [BlockAssignment] Algorithm ShiftNTopDown took 1657.49ms to assign 251 blocks to 888 nodes requiring 341193728 bytes.
[V]     Total Activation Memory: 341193728
[V]     [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +325, now: CPU 0, GPU 659 (MiB)
[V]     Starting Calibration.
[V]     Generating data using numpy seed: 1
[V]     Allocated: DeviceArray[(dtype=float32, shape=(1, 3, 224, 224)), ptr=0x7f1490604400]
[V]       Calibrated batch 0 in 0.167344 seconds.
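Regarding the warning above about randomly generated calibration data: a sketch of what a `--data-loader-script` could look like, assuming Polygraphy's default convention of a `load_data()` function that yields feed dicts mapping input names to NumPy arrays. The input name `input` and shape `(1, 3, 224, 224)` are taken from the profile in this log; the random data is a placeholder you would replace with real preprocessed samples.

```python
# Hypothetical data loader for: polygraphy run ... --data-loader-script this_file.py
# Polygraphy (by default) looks up a function named load_data() in the script.
import numpy as np

def load_data(num_batches=4, seed=0):
    """Yield one feed_dict per calibration batch."""
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        # Substitute real, preprocessed images here so INT8 calibration
        # sees a realistic activation distribution.
        yield {"input": rng.random((1, 3, 224, 224), dtype=np.float32)}
```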

@zerollzeng
Collaborator

@pranavm-nvidia I haven't used polygraphy debug yet, could you please kindly help here :-)

@jinhonglu

After the debug process, how should we use the reply.json file to build our mixed-precision engine?
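One option, regardless of the replay file's exact format, is to take the layer indices that the debug run flagged (the "Marking layer(s) ... to run in DataType.FLOAT precision" set in the log above) and pin those layers to FP32 via the TensorRT Python API while the rest of the network stays INT8. A hedged sketch: `parse_flagged_indices()` is a hypothetical helper whose key name `"layers"` is an assumption — adapt it to whatever structure your debug run actually produced.

```python
def parse_flagged_indices(replay):
    """Collect sorted, de-duplicated layer indices from a replay-style dict.
    The key name 'layers' is an assumption made for illustration only."""
    return sorted(set(int(i) for i in replay.get("layers", [])))

def mark_layers_fp32(network, config, indices):
    """Pin the given layer indices to FP32 and make the builder obey that.
    Expects a tensorrt.INetworkDefinition and tensorrt.IBuilderConfig; the
    import is local so this module loads on machines without TensorRT."""
    import tensorrt as trt
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in indices:
        layer = network.get_layer(i)
        layer.precision = trt.DataType.FLOAT
        for out in range(layer.num_outputs):
            layer.set_output_type(out, trt.DataType.FLOAT)
```

You would call `mark_layers_fp32(network, config, parse_flagged_indices(replay))` between parsing the ONNX model and building the engine.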
