Deserialized model failure of TensorRT 8.6.1.6 when running in C++ code on GPU V100 #3307
Comments
Can you try trtexec first? I want to know whether this is a bug or not. You can use our official container and run the command: link
First of all, thank you for your reply!

(pytorch1.12) $ trtexec --loadEngine=./ResNet34_trackerOCR_36_450_20230627_half.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # trtexec --loadEngine=./ResNet34_trackerOCR_36_450_20230627_half.engine
[09/12/2023-14:52:20] [I] === Model Options ===
[09/12/2023-14:52:20] [I] Format: *
[09/12/2023-14:52:20] [I] Model:
[09/12/2023-14:52:20] [I] Output:
[09/12/2023-14:52:20] [I] === Build Options ===
[09/12/2023-14:52:20] [I] Max batch: 1
[09/12/2023-14:52:20] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[09/12/2023-14:52:20] [I] minTiming: 1
[09/12/2023-14:52:20] [I] avgTiming: 8
[09/12/2023-14:52:20] [I] Precision: FP32
[09/12/2023-14:52:20] [I] LayerPrecisions:
[09/12/2023-14:52:20] [I] Layer Device Types:
[09/12/2023-14:52:20] [I] Calibration:
[09/12/2023-14:52:20] [I] Refit: Disabled
[09/12/2023-14:52:20] [I] Version Compatible: Disabled
[09/12/2023-14:52:20] [I] TensorRT runtime: full
[09/12/2023-14:52:20] [I] Lean DLL Path:
[09/12/2023-14:52:20] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[09/12/2023-14:52:20] [I] Exclude Lean Runtime: Disabled
[09/12/2023-14:52:20] [I] Sparsity: Disabled
[09/12/2023-14:52:20] [I] Safe mode: Disabled
[09/12/2023-14:52:20] [I] Build DLA standalone loadable: Disabled
[09/12/2023-14:52:20] [I] Allow GPU fallback for DLA: Disabled
[09/12/2023-14:52:20] [I] DirectIO mode: Disabled
[09/12/2023-14:52:20] [I] Restricted mode: Disabled
[09/12/2023-14:52:20] [I] Skip inference: Disabled
[09/12/2023-14:52:20] [I] Save engine:
[09/12/2023-14:52:20] [I] Load engine: ./ResNet34_trackerOCR_36_450_20230627_half.engine
[09/12/2023-14:52:20] [I] Profiling verbosity: 0
[09/12/2023-14:52:20] [I] Tactic sources: Using default tactic sources
[09/12/2023-14:52:20] [I] timingCacheMode: local
[09/12/2023-14:52:20] [I] timingCacheFile:
[09/12/2023-14:52:20] [I] Heuristic: Disabled
[09/12/2023-14:52:20] [I] Preview Features: Use default preview flags.
[09/12/2023-14:52:20] [I] MaxAuxStreams: -1
[09/12/2023-14:52:20] [I] BuilderOptimizationLevel: -1
[09/12/2023-14:52:20] [I] Input(s)s format: fp32:CHW
[09/12/2023-14:52:20] [I] Output(s)s format: fp32:CHW
[09/12/2023-14:52:20] [I] Input build shapes: model
[09/12/2023-14:52:20] [I] Input calibration shapes: model
[09/12/2023-14:52:20] [I] === System Options ===
[09/12/2023-14:52:20] [I] Device: 0
[09/12/2023-14:52:20] [I] DLACore:
[09/12/2023-14:52:20] [I] Plugins:
[09/12/2023-14:52:20] [I] setPluginsToSerialize:
[09/12/2023-14:52:20] [I] dynamicPlugins:
[09/12/2023-14:52:20] [I] ignoreParsedPluginLibs: 0
[09/12/2023-14:52:20] [I]
[09/12/2023-14:52:20] [I] === Inference Options ===
[09/12/2023-14:52:20] [I] Batch: 1
[09/12/2023-14:52:20] [I] Input inference shapes: model
[09/12/2023-14:52:20] [I] Iterations: 10
[09/12/2023-14:52:20] [I] Duration: 3s (+ 200ms warm up)
[09/12/2023-14:52:20] [I] Sleep time: 0ms
[09/12/2023-14:52:20] [I] Idle time: 0ms
[09/12/2023-14:52:20] [I] Inference Streams: 1
[09/12/2023-14:52:20] [I] ExposeDMA: Disabled
[09/12/2023-14:52:20] [I] Data transfers: Enabled
[09/12/2023-14:52:20] [I] Spin-wait: Disabled
[09/12/2023-14:52:20] [I] Multithreading: Disabled
[09/12/2023-14:52:20] [I] CUDA Graph: Disabled
[09/12/2023-14:52:20] [I] Separate profiling: Disabled
[09/12/2023-14:52:20] [I] Time Deserialize: Disabled
[09/12/2023-14:52:20] [I] Time Refit: Disabled
[09/12/2023-14:52:20] [I] NVTX verbosity: 0
[09/12/2023-14:52:20] [I] Persistent Cache Ratio: 0
[09/12/2023-14:52:20] [I] Inputs:
[09/12/2023-14:52:20] [I] === Reporting Options ===
[09/12/2023-14:52:20] [I] Verbose: Disabled
[09/12/2023-14:52:20] [I] Averages: 10 inferences
[09/12/2023-14:52:20] [I] Percentiles: 90,95,99
[09/12/2023-14:52:20] [I] Dump refittable layers:Disabled
[09/12/2023-14:52:20] [I] Dump output: Disabled
[09/12/2023-14:52:20] [I] Profile: Disabled
[09/12/2023-14:52:20] [I] Export timing to JSON file:
[09/12/2023-14:52:20] [I] Export output to JSON file:
[09/12/2023-14:52:20] [I] Export profile to JSON file:
[09/12/2023-14:52:20] [I]
[09/12/2023-14:52:21] [I] === Device Information ===
[09/12/2023-14:52:21] [I] Selected Device: Tesla V100-SXM2-32GB
[09/12/2023-14:52:21] [I] Compute Capability: 7.0
[09/12/2023-14:52:21] [I] SMs: 80
[09/12/2023-14:52:21] [I] Device Global Memory: 32510 MiB
[09/12/2023-14:52:21] [I] Shared Memory per SM: 96 KiB
[09/12/2023-14:52:21] [I] Memory Bus Width: 4096 bits (ECC enabled)
[09/12/2023-14:52:21] [I] Application Compute Clock Rate: 1.53 GHz
[09/12/2023-14:52:21] [I] Application Memory Clock Rate: 0.877 GHz
[09/12/2023-14:52:21] [I]
[09/12/2023-14:52:21] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[09/12/2023-14:52:21] [I]
[09/12/2023-14:52:21] [I] TensorRT version: 8.6.1
[09/12/2023-14:52:21] [I] Loading standard plugins
[09/12/2023-14:52:21] [I] Engine loaded in 0.099107 sec.
[09/12/2023-14:52:21] [I] [TRT] Loaded engine size: 47 MiB
[09/12/2023-14:52:21] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +46, now: CPU 0, GPU 46 (MiB)
[09/12/2023-14:52:21] [I] Engine deserialized in 0.482258 sec.
[09/12/2023-14:52:21] [I] [TRT] [MS] Running engine with multi stream info
[09/12/2023-14:52:21] [I] [TRT] [MS] Number of aux streams is 1
[09/12/2023-14:52:21] [I] [TRT] [MS] Number of total worker streams is 2
[09/12/2023-14:52:21] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[09/12/2023-14:52:21] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +38, now: CPU 0, GPU 84 (MiB)
[09/12/2023-14:52:21] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[09/12/2023-14:52:21] [I] Setting persistentCacheLimit to 0 bytes.
[09/12/2023-14:52:21] [W] Shape missing for input with dynamic shape: imagesAutomatically setting shape to: 1x1x36x450
[09/12/2023-14:52:21] [I] Using random values for input images
[09/12/2023-14:52:21] [I] Input binding for images with dimensions 1x1x36x450 is created.
[09/12/2023-14:52:21] [I] Output binding for output0 with dimensions 1x57x14665 is created.
[09/12/2023-14:52:21] [I] Starting inference
[09/12/2023-14:52:24] [I] Warmup completed 32 queries over 200 ms
[09/12/2023-14:52:24] [I] Timing trace has 511 queries over 3.01528 s
[09/12/2023-14:52:24] [I]
[09/12/2023-14:52:24] [I] === Trace details ===
[09/12/2023-14:52:24] [I] Trace averages of 10 runs:
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.95722 ms - Host latency: 6.10713 ms (enqueue 2.12868 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.95046 ms - Host latency: 6.10154 ms (enqueue 2.16006 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.83135 ms - Host latency: 5.98027 ms (enqueue 2.0971 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.6658 ms - Host latency: 5.81426 ms (enqueue 2.02772 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67111 ms - Host latency: 5.81943 ms (enqueue 2.02517 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66824 ms - Host latency: 5.81615 ms (enqueue 2.02475 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.65996 ms - Host latency: 5.80783 ms (enqueue 2.02708 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66303 ms - Host latency: 5.81025 ms (enqueue 2.0019 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66323 ms - Host latency: 5.81227 ms (enqueue 2.02696 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66212 ms - Host latency: 5.81015 ms (enqueue 2.03702 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.6668 ms - Host latency: 5.81578 ms (enqueue 2.03282 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66476 ms - Host latency: 5.81346 ms (enqueue 2.02709 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.65964 ms - Host latency: 5.8077 ms (enqueue 2.02647 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67133 ms - Host latency: 5.8189 ms (enqueue 2.02748 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66866 ms - Host latency: 5.81728 ms (enqueue 2.02609 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67296 ms - Host latency: 5.82157 ms (enqueue 2.02437 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66282 ms - Host latency: 5.81222 ms (enqueue 2.00498 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.6653 ms - Host latency: 5.8134 ms (enqueue 2.02383 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.6611 ms - Host latency: 5.80996 ms (enqueue 2.02515 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66427 ms - Host latency: 5.81266 ms (enqueue 2.02683 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66947 ms - Host latency: 5.81747 ms (enqueue 2.03262 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.68475 ms - Host latency: 5.83412 ms (enqueue 1.97859 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 6.09576 ms - Host latency: 6.25585 ms (enqueue 2.13538 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 6.02061 ms - Host latency: 6.18273 ms (enqueue 2.13502 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66566 ms - Host latency: 5.81377 ms (enqueue 2.0287 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.88198 ms - Host latency: 6.04028 ms (enqueue 2.07136 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 7.34084 ms - Host latency: 9.14175 ms (enqueue 2.12656 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 7.74329 ms - Host latency: 9.83772 ms (enqueue 2.04458 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 7.92822 ms - Host latency: 10.1846 ms (enqueue 2.10048 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 8.15381 ms - Host latency: 10.7014 ms (enqueue 2.00776 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 6.34496 ms - Host latency: 6.98275 ms (enqueue 2.00746 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66548 ms - Host latency: 5.81328 ms (enqueue 2.02808 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67654 ms - Host latency: 5.82566 ms (enqueue 2.02197 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.6687 ms - Host latency: 5.81763 ms (enqueue 2.01785 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67539 ms - Host latency: 5.82456 ms (enqueue 2.0335 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67039 ms - Host latency: 5.81973 ms (enqueue 2.02637 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66816 ms - Host latency: 5.81726 ms (enqueue 2.02468 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67349 ms - Host latency: 5.82261 ms (enqueue 2.02991 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.65906 ms - Host latency: 5.80603 ms (enqueue 2.02947 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66165 ms - Host latency: 5.80857 ms (enqueue 2.02393 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.68105 ms - Host latency: 5.83049 ms (enqueue 2.02991 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67217 ms - Host latency: 5.82126 ms (enqueue 2.02915 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.65769 ms - Host latency: 5.80642 ms (enqueue 2.04092 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66516 ms - Host latency: 5.81328 ms (enqueue 2.02783 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67158 ms - Host latency: 5.82029 ms (enqueue 2.02783 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.6656 ms - Host latency: 5.81379 ms (enqueue 2.02937 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66917 ms - Host latency: 5.81758 ms (enqueue 2.01672 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67178 ms - Host latency: 5.82083 ms (enqueue 2.01523 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66956 ms - Host latency: 5.81763 ms (enqueue 2.02197 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67178 ms - Host latency: 5.82065 ms (enqueue 2.02053 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66321 ms - Host latency: 5.81099 ms (enqueue 2.02314 ms)
[09/12/2023-14:52:24] [I]
[09/12/2023-14:52:24] [I] === Performance summary ===
[09/12/2023-14:52:24] [I] Throughput: 169.47 qps
[09/12/2023-14:52:24] [I] Latency: min = 5.39734 ms, max = 10.8278 ms, mean = 6.19844 ms, median = 5.82007 ms, percentile(90%) = 6.28113 ms, percentile(95%) = 10.5559 ms, percentile(99%) = 10.799 ms
[09/12/2023-14:52:24] [I] Enqueue Time: min = 1.76904 ms, max = 2.39758 ms, mean = 2.03898 ms, median = 2.02817 ms, percentile(90%) = 2.12695 ms, percentile(95%) = 2.17102 ms, percentile(99%) = 2.23059 ms
[09/12/2023-14:52:24] [I] H2D Latency: min = 0.00878906 ms, max = 0.029541 ms, mean = 0.011404 ms, median = 0.0100098 ms, percentile(90%) = 0.0152588 ms, percentile(95%) = 0.0185547 ms, percentile(99%) = 0.0227051 ms
[09/12/2023-14:52:24] [I] GPU Compute Time: min = 5.24982 ms, max = 8.27185 ms, mean = 5.88101 ms, median = 5.67188 ms, percentile(90%) = 6.1123 ms, percentile(95%) = 8.10596 ms, percentile(99%) = 8.19922 ms
[09/12/2023-14:52:24] [I] D2H Latency: min = 0.132324 ms, max = 2.62268 ms, mean = 0.306017 ms, median = 0.137726 ms, percentile(90%) = 0.146484 ms, percentile(95%) = 2.50769 ms, percentile(99%) = 2.61182 ms
[09/12/2023-14:52:24] [I] Total Host Walltime: 3.01528 s
[09/12/2023-14:52:24] [I] Total GPU Compute Time: 3.0052 s
[09/12/2023-14:52:24] [W] * GPU compute time is unstable, with coefficient of variance = 10.6942%.
[09/12/2023-14:52:24] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[09/12/2023-14:52:24] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/12/2023-14:52:24] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8601] # trtexec --loadEngine=./ResNet34_trackerOCR_36_450_20230627_half.engine

This looks normal, no problem there. So I did another experiment in C++: I implemented the deserialization in a function, put everything into a single .cpp file, compiled it, and ran it. In that case the engine can be deserialized. Like this:

#include <NvInfer.h>
#include <cuda_runtime_api.h>  // for cudaError_t, cudaSetDevice, cudaGetErrorName
#include <string>
#include <vector>
#include <fstream>
#include <iostream>

#define CHECK(call, resContent) check(call, __LINE__, __FILE__, resContent)
inline bool check(cudaError_t e, int iLine, const char *szFile, std::string& resContent) {
if (e != cudaSuccess) {
resContent = "CUDA runtime API error ";
resContent += std::string(cudaGetErrorName(e));
resContent += " at line " + std::to_string(iLine);
resContent += " in file " + std::string(szFile);
resContent += "\n";
// std::cout << "CUDA runtime API error " << cudaGetErrorName(e) << " at line " << iLine << " in file " << szFile << std::endl;
return false;
}
resContent = "";
return true;
};
class TRTLogger: public nvinfer1::ILogger {
public:
nvinfer1::ILogger::Severity reportableSeverity;
public:
TRTLogger(nvinfer1::ILogger::Severity severity = nvinfer1::ILogger::Severity::kVERBOSE): reportableSeverity(severity) {
}
void log(nvinfer1::ILogger::Severity severity, const char* msg) noexcept override {
if (severity > reportableSeverity) {
return;
}
switch (severity)
{
case nvinfer1::ILogger::Severity::kINTERNAL_ERROR:
std::cout<<"INTERNAL_ERROR: " + std::string(msg)<<std::endl;
break;
case nvinfer1::ILogger::Severity::kERROR:
std::cout<<"ERROR: " + std::string(msg)<<std::endl;
break;
case nvinfer1::ILogger::Severity::kWARNING:
std::cout<<"WARNING: " + std::string(msg)<<std::endl;
break;
case nvinfer1::ILogger::Severity::kINFO:
std::cout<<"INFO: " + std::string(msg)<<std::endl;
break;
default:
std::cout<<"VERBOSE: " + std::string(msg)<<std::endl;
break;
}
};
};
static TRTLogger s_Logger = TRTLogger();
int modelLoad(const std::string& m_modelPath) {
std::string tmpLogStr;
bool isSuccess = CHECK(cudaSetDevice(0), tmpLogStr);
if (!isSuccess) {
throw std::runtime_error("cuda set device in modelLoad unsuccessfully : " + tmpLogStr);
}
std::ifstream engineFile(m_modelPath, std::ios::binary);
long int fsize = 0;
// get file size
std::cout<<"Parsing model file!"<<std::endl;
engineFile.seekg(0, engineFile.end);
fsize = engineFile.tellg();
engineFile.seekg(0, engineFile.beg);
std::vector<char> engineStr(fsize);
engineFile.read(engineStr.data(), engineStr.size());
if (engineStr.size() == 0) {
std::cout<<"Failed getting serialized engine!"<<std::endl;
engineFile.close();
return -1;
}
engineFile.close();
std::cout<<"Succeeded getting serialized engine!"<<std::endl;
// create inference env, deserialize engine
nvinfer1::IRuntime* m_runtime {nvinfer1::createInferRuntime(s_Logger)};
nvinfer1::ICudaEngine* m_engine = m_runtime->deserializeCudaEngine(engineStr.data(), engineStr.size());
if (m_engine == nullptr) {
std::cout<<"Failed loading engine!"<<std::endl;
return -1;
}
return 0;
}
int main() {
std::string modelPath("../ResNet34_trackerOCR_36_450_20230627_half.engine");
int retCode = modelLoad(modelPath);
}

After I did this experiment, it made me feel even more confused.
Hey, I encountered the same problem. Have you solved it?
Not yet. It seems that deserializing the engine file from an in-class member function does not work when you split the code into a header file and a source file. But if both the declaration and the definition are placed in the header file, it works (see the sketch below). When you do that, though, you may run into multiple-definition errors during development and debugging.
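As an illustration only, here is a minimal sketch of the header-only variant described above, using a hypothetical `Deserializer` class (the class and member names are invented, not taken from the issue). Defining the member function inside the class body makes it implicitly inline, which is what avoids the multiple-definition errors mentioned when the header is included from several translation units:

```cpp
// deserializer.h -- hypothetical header-only variant (declaration and definition together)
#pragma once
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

class Deserializer {
public:
    // Defined in the class body, hence implicitly inline: safe to include from
    // multiple translation units without multiple-definition errors.
    bool load(const std::string& enginePath, nvinfer1::ILogger& logger) {
        std::ifstream engineFile(enginePath, std::ios::binary);
        if (!engineFile) {
            return false;
        }
        // Read the whole serialized engine into memory.
        std::vector<char> blob((std::istreambuf_iterator<char>(engineFile)),
                               std::istreambuf_iterator<char>());
        runtime_ = nvinfer1::createInferRuntime(logger);
        engine_  = runtime_->deserializeCudaEngine(blob.data(), blob.size());
        return engine_ != nullptr;
    }

private:
    nvinfer1::IRuntime*    runtime_{nullptr};
    nvinfer1::ICudaEngine* engine_{nullptr};
};
```

Usage from any .cpp file would then simply be something like `Deserializer d; d.load(path, s_Logger);`.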
It seems to be a problem of static versus dynamic linking. For example, when I add the source file via `add_library()` in `CMakeLists.txt`, a `SHARED` library may hit the aforementioned bug, but when it is changed to `STATIC`, the build passes, the model can be deserialized normally, and inference runs normally. I hope this is helpful to you.
Yes, you are right. When I change
We found a temporary workaround for this problem: build the code as a static library. If, like us, you want to implement deserialization in an in-class member function defined in a source file, you can refer to this solution. Thanks to @Data-Adventure for the solution; I have updated the relevant example links. I hope the official team can fix the dynamic-linking bug in the next version. @zerollzeng Since there is now a temporary workaround for the problem, I am closing the issue.
FWIW, I was getting a similar error message when linking our shared library against `nvinfer_lean`. Changing this to link against `nvinfer` instead solved the issue; with that change, engine deserialization also worked when using a shared library.
Seconded- using |
Description
I tried to deserialize the model in C++ code like this:

In `test.h`:

In `test.cpp`:

and in `main.cpp`:
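(The actual `test.h` / `test.cpp` / `main.cpp` are only available through the link under Relevant Files below and are not reproduced in this issue. Purely as an illustration of the kind of header/source split being described, a hypothetical minimal layout, with file contents and the `TrtModel` name invented here, could look like the following.)

```cpp
// Hypothetical layout, for illustration only -- not the reporter's actual files.

// ---------------- test.h : declaration only ----------------
#pragma once
#include <string>

class TrtModel {
public:
    int modelLoad(const std::string& modelPath);  // 0 on success, -1 on failure
};

// ---------------- test.cpp : definition in a separate translation unit ----------------
#include "test.h"
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <vector>

namespace {
// Minimal logger for the sketch; the real code uses the richer TRTLogger shown earlier in this thread.
class SilentLogger : public nvinfer1::ILogger {
    void log(Severity, const char*) noexcept override {}
} gLogger;
}

int TrtModel::modelLoad(const std::string& modelPath) {
    std::ifstream engineFile(modelPath, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(engineFile)),
                           std::istreambuf_iterator<char>());
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine = runtime->deserializeCudaEngine(blob.data(), blob.size());
    return engine == nullptr ? -1 : 0;
}

// ---------------- main.cpp ----------------
#include "test.h"

int main() {
    TrtModel model;
    return model.modelLoad("../ResNet34_trackerOCR_36_450_20230627_half.engine");
}
```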
When I tried to run the code with the model engine on a V100 GPU, I got an error like the following log:
I have tried writing the contents of `test.cpp` and `test.h` in `main.cpp`. In that case, the deserialization is unsuccessful. This confuses me; I don't know what I'm doing wrong.
In addition, I also did the corresponding tests in a Python program, and there it worked normally.
Environment
TensorRT Version: 8.6.1.6
NVIDIA GPU: Tesla V100
NVIDIA Driver Version: 515.43.04
CUDA Version: 11.7.99
CUDNN Version: 8.9.2
Operating System: Ubuntu 16.04
Python Version (if applicable): 3.8
Tensorflow Version (if applicable): not used
PyTorch Version (if applicable): 1.13.1
Baremetal or Container (if so, version): no container used
Relevant Files
For the related code and models, please refer to this link.
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?: Yes
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (`polygraphy run <model.onnx> --onnxrt`): No. I think the ONNX model is OK.