Deserialized model failure of TensorRT 8.6.1.6 when running in C++ code on GPU V100 #3307
Comments
Can you try trtexec first? I want to know whether this is a bug or not. You can use our official container and run the command: link
First of all, thank you for your reply!

(pytorch1.12) $ trtexec --loadEngine=./ResNet34_trackerOCR_36_450_20230627_half.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # trtexec --loadEngine=./ResNet34_trackerOCR_36_450_20230627_half.engine
[09/12/2023-14:52:20] [I] === Model Options ===
[09/12/2023-14:52:20] [I] Format: *
[09/12/2023-14:52:20] [I] Model:
[09/12/2023-14:52:20] [I] Output:
[09/12/2023-14:52:20] [I] === Build Options ===
[09/12/2023-14:52:20] [I] Max batch: 1
[09/12/2023-14:52:20] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[09/12/2023-14:52:20] [I] minTiming: 1
[09/12/2023-14:52:20] [I] avgTiming: 8
[09/12/2023-14:52:20] [I] Precision: FP32
[09/12/2023-14:52:20] [I] LayerPrecisions:
[09/12/2023-14:52:20] [I] Layer Device Types:
[09/12/2023-14:52:20] [I] Calibration:
[09/12/2023-14:52:20] [I] Refit: Disabled
[09/12/2023-14:52:20] [I] Version Compatible: Disabled
[09/12/2023-14:52:20] [I] TensorRT runtime: full
[09/12/2023-14:52:20] [I] Lean DLL Path:
[09/12/2023-14:52:20] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[09/12/2023-14:52:20] [I] Exclude Lean Runtime: Disabled
[09/12/2023-14:52:20] [I] Sparsity: Disabled
[09/12/2023-14:52:20] [I] Safe mode: Disabled
[09/12/2023-14:52:20] [I] Build DLA standalone loadable: Disabled
[09/12/2023-14:52:20] [I] Allow GPU fallback for DLA: Disabled
[09/12/2023-14:52:20] [I] DirectIO mode: Disabled
[09/12/2023-14:52:20] [I] Restricted mode: Disabled
[09/12/2023-14:52:20] [I] Skip inference: Disabled
[09/12/2023-14:52:20] [I] Save engine:
[09/12/2023-14:52:20] [I] Load engine: ./ResNet34_trackerOCR_36_450_20230627_half.engine
[09/12/2023-14:52:20] [I] Profiling verbosity: 0
[09/12/2023-14:52:20] [I] Tactic sources: Using default tactic sources
[09/12/2023-14:52:20] [I] timingCacheMode: local
[09/12/2023-14:52:20] [I] timingCacheFile:
[09/12/2023-14:52:20] [I] Heuristic: Disabled
[09/12/2023-14:52:20] [I] Preview Features: Use default preview flags.
[09/12/2023-14:52:20] [I] MaxAuxStreams: -1
[09/12/2023-14:52:20] [I] BuilderOptimizationLevel: -1
[09/12/2023-14:52:20] [I] Input(s)s format: fp32:CHW
[09/12/2023-14:52:20] [I] Output(s)s format: fp32:CHW
[09/12/2023-14:52:20] [I] Input build shapes: model
[09/12/2023-14:52:20] [I] Input calibration shapes: model
[09/12/2023-14:52:20] [I] === System Options ===
[09/12/2023-14:52:20] [I] Device: 0
[09/12/2023-14:52:20] [I] DLACore:
[09/12/2023-14:52:20] [I] Plugins:
[09/12/2023-14:52:20] [I] setPluginsToSerialize:
[09/12/2023-14:52:20] [I] dynamicPlugins:
[09/12/2023-14:52:20] [I] ignoreParsedPluginLibs: 0
[09/12/2023-14:52:20] [I]
[09/12/2023-14:52:20] [I] === Inference Options ===
[09/12/2023-14:52:20] [I] Batch: 1
[09/12/2023-14:52:20] [I] Input inference shapes: model
[09/12/2023-14:52:20] [I] Iterations: 10
[09/12/2023-14:52:20] [I] Duration: 3s (+ 200ms warm up)
[09/12/2023-14:52:20] [I] Sleep time: 0ms
[09/12/2023-14:52:20] [I] Idle time: 0ms
[09/12/2023-14:52:20] [I] Inference Streams: 1
[09/12/2023-14:52:20] [I] ExposeDMA: Disabled
[09/12/2023-14:52:20] [I] Data transfers: Enabled
[09/12/2023-14:52:20] [I] Spin-wait: Disabled
[09/12/2023-14:52:20] [I] Multithreading: Disabled
[09/12/2023-14:52:20] [I] CUDA Graph: Disabled
[09/12/2023-14:52:20] [I] Separate profiling: Disabled
[09/12/2023-14:52:20] [I] Time Deserialize: Disabled
[09/12/2023-14:52:20] [I] Time Refit: Disabled
[09/12/2023-14:52:20] [I] NVTX verbosity: 0
[09/12/2023-14:52:20] [I] Persistent Cache Ratio: 0
[09/12/2023-14:52:20] [I] Inputs:
[09/12/2023-14:52:20] [I] === Reporting Options ===
[09/12/2023-14:52:20] [I] Verbose: Disabled
[09/12/2023-14:52:20] [I] Averages: 10 inferences
[09/12/2023-14:52:20] [I] Percentiles: 90,95,99
[09/12/2023-14:52:20] [I] Dump refittable layers:Disabled
[09/12/2023-14:52:20] [I] Dump output: Disabled
[09/12/2023-14:52:20] [I] Profile: Disabled
[09/12/2023-14:52:20] [I] Export timing to JSON file:
[09/12/2023-14:52:20] [I] Export output to JSON file:
[09/12/2023-14:52:20] [I] Export profile to JSON file:
[09/12/2023-14:52:20] [I]
[09/12/2023-14:52:21] [I] === Device Information ===
[09/12/2023-14:52:21] [I] Selected Device: Tesla V100-SXM2-32GB
[09/12/2023-14:52:21] [I] Compute Capability: 7.0
[09/12/2023-14:52:21] [I] SMs: 80
[09/12/2023-14:52:21] [I] Device Global Memory: 32510 MiB
[09/12/2023-14:52:21] [I] Shared Memory per SM: 96 KiB
[09/12/2023-14:52:21] [I] Memory Bus Width: 4096 bits (ECC enabled)
[09/12/2023-14:52:21] [I] Application Compute Clock Rate: 1.53 GHz
[09/12/2023-14:52:21] [I] Application Memory Clock Rate: 0.877 GHz
[09/12/2023-14:52:21] [I]
[09/12/2023-14:52:21] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[09/12/2023-14:52:21] [I]
[09/12/2023-14:52:21] [I] TensorRT version: 8.6.1
[09/12/2023-14:52:21] [I] Loading standard plugins
[09/12/2023-14:52:21] [I] Engine loaded in 0.099107 sec.
[09/12/2023-14:52:21] [I] [TRT] Loaded engine size: 47 MiB
[09/12/2023-14:52:21] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +46, now: CPU 0, GPU 46 (MiB)
[09/12/2023-14:52:21] [I] Engine deserialized in 0.482258 sec.
[09/12/2023-14:52:21] [I] [TRT] [MS] Running engine with multi stream info
[09/12/2023-14:52:21] [I] [TRT] [MS] Number of aux streams is 1
[09/12/2023-14:52:21] [I] [TRT] [MS] Number of total worker streams is 2
[09/12/2023-14:52:21] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[09/12/2023-14:52:21] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +38, now: CPU 0, GPU 84 (MiB)
[09/12/2023-14:52:21] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[09/12/2023-14:52:21] [I] Setting persistentCacheLimit to 0 bytes.
[09/12/2023-14:52:21] [W] Shape missing for input with dynamic shape: imagesAutomatically setting shape to: 1x1x36x450
[09/12/2023-14:52:21] [I] Using random values for input images
[09/12/2023-14:52:21] [I] Input binding for images with dimensions 1x1x36x450 is created.
[09/12/2023-14:52:21] [I] Output binding for output0 with dimensions 1x57x14665 is created.
[09/12/2023-14:52:21] [I] Starting inference
[09/12/2023-14:52:24] [I] Warmup completed 32 queries over 200 ms
[09/12/2023-14:52:24] [I] Timing trace has 511 queries over 3.01528 s
[09/12/2023-14:52:24] [I]
[09/12/2023-14:52:24] [I] === Trace details ===
[09/12/2023-14:52:24] [I] Trace averages of 10 runs:
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.95722 ms - Host latency: 6.10713 ms (enqueue 2.12868 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.95046 ms - Host latency: 6.10154 ms (enqueue 2.16006 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.83135 ms - Host latency: 5.98027 ms (enqueue 2.0971 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.6658 ms - Host latency: 5.81426 ms (enqueue 2.02772 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67111 ms - Host latency: 5.81943 ms (enqueue 2.02517 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66824 ms - Host latency: 5.81615 ms (enqueue 2.02475 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.65996 ms - Host latency: 5.80783 ms (enqueue 2.02708 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66303 ms - Host latency: 5.81025 ms (enqueue 2.0019 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66323 ms - Host latency: 5.81227 ms (enqueue 2.02696 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66212 ms - Host latency: 5.81015 ms (enqueue 2.03702 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.6668 ms - Host latency: 5.81578 ms (enqueue 2.03282 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66476 ms - Host latency: 5.81346 ms (enqueue 2.02709 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.65964 ms - Host latency: 5.8077 ms (enqueue 2.02647 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67133 ms - Host latency: 5.8189 ms (enqueue 2.02748 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66866 ms - Host latency: 5.81728 ms (enqueue 2.02609 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67296 ms - Host latency: 5.82157 ms (enqueue 2.02437 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66282 ms - Host latency: 5.81222 ms (enqueue 2.00498 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.6653 ms - Host latency: 5.8134 ms (enqueue 2.02383 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.6611 ms - Host latency: 5.80996 ms (enqueue 2.02515 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66427 ms - Host latency: 5.81266 ms (enqueue 2.02683 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66947 ms - Host latency: 5.81747 ms (enqueue 2.03262 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.68475 ms - Host latency: 5.83412 ms (enqueue 1.97859 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 6.09576 ms - Host latency: 6.25585 ms (enqueue 2.13538 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 6.02061 ms - Host latency: 6.18273 ms (enqueue 2.13502 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66566 ms - Host latency: 5.81377 ms (enqueue 2.0287 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.88198 ms - Host latency: 6.04028 ms (enqueue 2.07136 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 7.34084 ms - Host latency: 9.14175 ms (enqueue 2.12656 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 7.74329 ms - Host latency: 9.83772 ms (enqueue 2.04458 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 7.92822 ms - Host latency: 10.1846 ms (enqueue 2.10048 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 8.15381 ms - Host latency: 10.7014 ms (enqueue 2.00776 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 6.34496 ms - Host latency: 6.98275 ms (enqueue 2.00746 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66548 ms - Host latency: 5.81328 ms (enqueue 2.02808 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67654 ms - Host latency: 5.82566 ms (enqueue 2.02197 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.6687 ms - Host latency: 5.81763 ms (enqueue 2.01785 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67539 ms - Host latency: 5.82456 ms (enqueue 2.0335 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67039 ms - Host latency: 5.81973 ms (enqueue 2.02637 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66816 ms - Host latency: 5.81726 ms (enqueue 2.02468 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67349 ms - Host latency: 5.82261 ms (enqueue 2.02991 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.65906 ms - Host latency: 5.80603 ms (enqueue 2.02947 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66165 ms - Host latency: 5.80857 ms (enqueue 2.02393 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.68105 ms - Host latency: 5.83049 ms (enqueue 2.02991 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67217 ms - Host latency: 5.82126 ms (enqueue 2.02915 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.65769 ms - Host latency: 5.80642 ms (enqueue 2.04092 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66516 ms - Host latency: 5.81328 ms (enqueue 2.02783 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67158 ms - Host latency: 5.82029 ms (enqueue 2.02783 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.6656 ms - Host latency: 5.81379 ms (enqueue 2.02937 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66917 ms - Host latency: 5.81758 ms (enqueue 2.01672 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67178 ms - Host latency: 5.82083 ms (enqueue 2.01523 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66956 ms - Host latency: 5.81763 ms (enqueue 2.02197 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.67178 ms - Host latency: 5.82065 ms (enqueue 2.02053 ms)
[09/12/2023-14:52:24] [I] Average on 10 runs - GPU latency: 5.66321 ms - Host latency: 5.81099 ms (enqueue 2.02314 ms)
[09/12/2023-14:52:24] [I]
[09/12/2023-14:52:24] [I] === Performance summary ===
[09/12/2023-14:52:24] [I] Throughput: 169.47 qps
[09/12/2023-14:52:24] [I] Latency: min = 5.39734 ms, max = 10.8278 ms, mean = 6.19844 ms, median = 5.82007 ms, percentile(90%) = 6.28113 ms, percentile(95%) = 10.5559 ms, percentile(99%) = 10.799 ms
[09/12/2023-14:52:24] [I] Enqueue Time: min = 1.76904 ms, max = 2.39758 ms, mean = 2.03898 ms, median = 2.02817 ms, percentile(90%) = 2.12695 ms, percentile(95%) = 2.17102 ms, percentile(99%) = 2.23059 ms
[09/12/2023-14:52:24] [I] H2D Latency: min = 0.00878906 ms, max = 0.029541 ms, mean = 0.011404 ms, median = 0.0100098 ms, percentile(90%) = 0.0152588 ms, percentile(95%) = 0.0185547 ms, percentile(99%) = 0.0227051 ms
[09/12/2023-14:52:24] [I] GPU Compute Time: min = 5.24982 ms, max = 8.27185 ms, mean = 5.88101 ms, median = 5.67188 ms, percentile(90%) = 6.1123 ms, percentile(95%) = 8.10596 ms, percentile(99%) = 8.19922 ms
[09/12/2023-14:52:24] [I] D2H Latency: min = 0.132324 ms, max = 2.62268 ms, mean = 0.306017 ms, median = 0.137726 ms, percentile(90%) = 0.146484 ms, percentile(95%) = 2.50769 ms, percentile(99%) = 2.61182 ms
[09/12/2023-14:52:24] [I] Total Host Walltime: 3.01528 s
[09/12/2023-14:52:24] [I] Total GPU Compute Time: 3.0052 s
[09/12/2023-14:52:24] [W] * GPU compute time is unstable, with coefficient of variance = 10.6942%.
[09/12/2023-14:52:24] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[09/12/2023-14:52:24] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/12/2023-14:52:24] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8601] # trtexec --loadEngine=./ResNet34_trackerOCR_36_450_20230627_half.engine

This looks normal, no problem there. So I did another experiment in C++: I implemented the deserialization in a function, put everything into a single .cpp file, compiled it, and ran it. In that case the engine can be deserialized. Like this:

#include <NvInfer.h>
#include <cuda_runtime_api.h>  // for cudaError_t, cudaSetDevice, cudaGetErrorName
#include <string>
#include <vector>
#include <fstream>
#include <iostream>

#define CHECK(call, resContent) check(call, __LINE__, __FILE__, resContent)
inline bool check(cudaError_t e, int iLine, const char *szFile, std::string& resContent) {
if (e != cudaSuccess) {
resContent = "CUDA runtime API error ";
resContent += std::string(cudaGetErrorName(e));
resContent += " at line " + std::to_string(iLine);
resContent += " in file " + std::string(szFile);
resContent += "\n";
// std::cout << "CUDA runtime API error " << cudaGetErrorName(e) << " at line " << iLine << " in file " << szFile << std::endl;
return false;
}
resContent = "";
return true;
};
class TRTLogger: public nvinfer1::ILogger {
public:
nvinfer1::ILogger::Severity reportableSeverity;
public:
TRTLogger(nvinfer1::ILogger::Severity severity = nvinfer1::ILogger::Severity::kVERBOSE): reportableSeverity(severity) {
}
void log(nvinfer1::ILogger::Severity severity, const char* msg) noexcept override {
if (severity > reportableSeverity) {
return;
}
switch (severity)
{
case nvinfer1::ILogger::Severity::kINTERNAL_ERROR:
std::cout<<"INTERNAL_ERROR: " + std::string(msg)<<std::endl;
break;
case nvinfer1::ILogger::Severity::kERROR:
std::cout<<"ERROR: " + std::string(msg)<<std::endl;
break;
case nvinfer1::ILogger::Severity::kWARNING:
std::cout<<"WARNING: " + std::string(msg)<<std::endl;
break;
case nvinfer1::ILogger::Severity::kINFO:
std::cout<<"INFO: " + std::string(msg)<<std::endl;
break;
default:
std::cout<<"VERBOSE: " + std::string(msg)<<std::endl;
break;
}
};
};
static TRTLogger s_Logger = TRTLogger();
int modelLoad(const std::string& m_modelPath) {
std::string tmpLogStr;
bool isSuccess = CHECK(cudaSetDevice(0), tmpLogStr);
if (!isSuccess) {
throw std::runtime_error("cuda set device in modelLoad unsuccessfully : " + tmpLogStr);
}
std::ifstream engineFile(m_modelPath, std::ios::binary);
long int fsize = 0;
// get file size
std::cout<<"Parsing model file!"<<std::endl;
engineFile.seekg(0, engineFile.end);
fsize = engineFile.tellg();
engineFile.seekg(0, engineFile.beg);
std::vector<char> engineStr(fsize);
engineFile.read(engineStr.data(), engineStr.size());
if (engineStr.size() == 0) {
std::cout<<"Failed getting serialized engine!"<<std::endl;
engineFile.close();
return -1;
}
engineFile.close();
std::cout<<"Succeeded getting serialized engine!"<<std::endl;
// create inference env, deserialize engine
nvinfer1::IRuntime* m_runtime {nvinfer1::createInferRuntime(s_Logger)};
nvinfer1::ICudaEngine* m_engine = m_runtime->deserializeCudaEngine(engineStr.data(), engineStr.size());
if (m_engine == nullptr) {
std::cout<<"Failed loading engine!"<<std::endl;
return -1;
}
return 0;
}
int main() {
std::string modelPath("../ResNet34_trackerOCR_36_450_20230627_half.engine");
int retCode = modelLoad(modelPath);
}

After I did this experiment, it made me feel even more confused.
Hey, I encountered the same problem. Have you solved it?
Not yet. It seems that deserializing the engine file from an in-class member function does not work when you split the code into a header file and a source file. But if both the declaration and the definition are placed in the header file, it works (see the sketch below). When you do that, though, you may run into multiple-definition errors during development and debugging.
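As an illustration only, here is a minimal sketch of the header-only variant described above, using a hypothetical `Deserializer` class (the class and member names are invented, not taken from the issue). Defining the member function inside the class body makes it implicitly inline, which is what avoids the multiple-definition errors mentioned when the header is included from several translation units:

```cpp
// deserializer.h -- hypothetical header-only variant (declaration and definition together)
#pragma once
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

class Deserializer {
public:
    // Defined in the class body, hence implicitly inline: safe to include from
    // multiple translation units without multiple-definition errors.
    bool load(const std::string& enginePath, nvinfer1::ILogger& logger) {
        std::ifstream engineFile(enginePath, std::ios::binary);
        if (!engineFile) {
            return false;
        }
        // Read the whole serialized engine into memory.
        std::vector<char> blob((std::istreambuf_iterator<char>(engineFile)),
                               std::istreambuf_iterator<char>());
        runtime_ = nvinfer1::createInferRuntime(logger);
        engine_  = runtime_->deserializeCudaEngine(blob.data(), blob.size());
        return engine_ != nullptr;
    }

private:
    nvinfer1::IRuntime*    runtime_{nullptr};
    nvinfer1::ICudaEngine* engine_{nullptr};
};
```

Usage from any .cpp file would then simply be something like `Deserializer d; d.load(path, s_Logger);`.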
It seems to be a problem of static versus dynamic linking. For example, when I add the source file via `add_library()` in `CMakeLists.txt`, a `SHARED` library may hit the aforementioned bug, but when it is changed to `STATIC`, the build passes, the model can be deserialized normally, and inference runs normally. I hope this is helpful to you.
Yes, you are right. When I change
We found a temporary workaround for this problem: build the code as a static library. If, like us, you want to implement deserialization in an in-class member function defined in a source file, you can refer to this solution. Thanks to @Data-Adventure for the solution; I have updated the relevant example links. I hope the official team can fix the dynamic-linking bug in the next version. @zerollzeng Since there is now a temporary workaround for the problem, I am closing the issue.
FWIW, I was getting a similar error message when linking our shared library against `nvinfer_lean`. Changing this to link against `nvinfer` instead solved the issue; with that change, engine deserialization also worked when using a shared library.
Seconded- using |
Description
I tried to deserialize the model in C++ code like this:

In `test.h`:

In `test.cpp`:

and in `main.cpp`:
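(The actual `test.h` / `test.cpp` / `main.cpp` are only available through the link under Relevant Files below and are not reproduced in this issue. Purely as an illustration of the kind of header/source split being described, a hypothetical minimal layout, with file contents and the `TrtModel` name invented here, could look like the following.)

```cpp
// Hypothetical layout, for illustration only -- not the reporter's actual files.

// ---------------- test.h : declaration only ----------------
#pragma once
#include <string>

class TrtModel {
public:
    int modelLoad(const std::string& modelPath);  // 0 on success, -1 on failure
};

// ---------------- test.cpp : definition in a separate translation unit ----------------
#include "test.h"
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <vector>

namespace {
// Minimal logger for the sketch; the real code uses the richer TRTLogger shown earlier in this thread.
class SilentLogger : public nvinfer1::ILogger {
    void log(Severity, const char*) noexcept override {}
} gLogger;
}

int TrtModel::modelLoad(const std::string& modelPath) {
    std::ifstream engineFile(modelPath, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(engineFile)),
                           std::istreambuf_iterator<char>());
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine = runtime->deserializeCudaEngine(blob.data(), blob.size());
    return engine == nullptr ? -1 : 0;
}

// ---------------- main.cpp ----------------
#include "test.h"

int main() {
    TrtModel model;
    return model.modelLoad("../ResNet34_trackerOCR_36_450_20230627_half.engine");
}
```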
When I tried to run the code with the model engine on a V100 GPU, I got an error like the following log:
I have tried writing the contents of `test.cpp` and `test.h` in `main.cpp`. In that case, the deserialization is unsuccessful. This confuses me; I don't know what I'm doing wrong.
In addition, I also did the corresponding tests in a Python program, and there it worked normally.
Environment
TensorRT Version: 8.6.1.6
NVIDIA GPU: Tesla V100
NVIDIA Driver Version: 515.43.04
CUDA Version: 11.7.99
CUDNN Version: 8.9.2
Operating System: Ubuntu 16.04
Python Version (if applicable): 3.8
Tensorflow Version (if applicable): not used
PyTorch Version (if applicable): 1.13.1
Baremetal or Container (if so, version): no container used
Relevant Files
For the related code and models, please refer to this link.
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?: Yes
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (`polygraphy run <model.onnx> --onnxrt`): No. I think the ONNX model is OK.