This is a serving library for modern C++ that provides a simple interface for working with neural network models. Current requirements:
- Nvidia TensorRT
- fmt (until C++20)
- C++17
Some inference backends (e.g., TensorRT) require a custom logger. To provide a robust logging system, this library asks the user to supply their own definition of the `flushLog` function of the `ILogger` interface, for example:
```cpp
void Logger::flushLog(Level level, std::string_view message) const {
  switch (level) {
    case Level::kINTERNAL_ERROR:
      ROS_FATAL_STREAM_NAMED(name_, message);
      break;
    // ... handle the remaining levels the same way ...
    case Level::kVERBOSE:
      ROS_DEBUG_STREAM_NAMED(name_, message);
      break;
  }
}
```
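For context, the enclosing class might look like the sketch below. The exact `ILogger` declaration is defined by this library, so the constructor and the `name_` member shown here are illustrative assumptions:

```cpp
#include <string>
#include <string_view>

// Hypothetical subclass of the library's ILogger interface; only
// flushLog is required, everything else here is illustrative.
class Logger : public ILogger {
 public:
  explicit Logger(std::string name = "serving") : name_{std::move(name)} {}

  void flushLog(Level level, std::string_view message) const override;

 private:
  std::string name_;  // name passed to the ROS *_NAMED logging macros
};
```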
Supported model formats, by backend and file extension:
- TensorRT: `.engine`, `.trt`
To generate a serialized engine:
- Populate `BuildOptions` and `SystemOptions` in option.hpp.
- Create a `Generator` instance with those options, then call `generator.getSerializedEngine` to build a serialized engine for the network. Right now it can only take an ONNX model path; building from TensorRT's layers is under development. This function returns a raw pointer, so remember to delete it after saving the model (or use a smart pointer, as in the example below).
- Use the `saveEngine` function in utils.hpp to save the serialized model to a file.
```cpp
// Suppose that we use the Logger above
auto logger = std::make_shared<Logger>();
// Use default options
Generator generator{BuildOptions(), SystemOptions(), logger};
std::unique_ptr<nvinfer1::IHostMemory> serialized_engine{generator.getSerializedEngine("path_to_onnx")};
saveEngine(*serialized_engine, "path_to_save_engine");
```
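As a mental model, `saveEngine` only needs to write the serialized bytes to disk. A minimal sketch of such a helper, not necessarily the library's exact implementation in utils.hpp, could be:

```cpp
#include <fstream>
#include <string>

#include <NvInfer.h>

// Sketch only: dump the serialized engine bytes to a binary file.
// nvinfer1::IHostMemory exposes data() and size() for exactly this purpose.
inline bool saveEngine(const nvinfer1::IHostMemory& engine, const std::string& path) {
  std::ofstream file(path, std::ios::binary);
  if (!file) {
    return false;  // could not open the output file
  }
  file.write(static_cast<const char*>(engine.data()),
             static_cast<std::streamsize>(engine.size()));
  return file.good();
}
```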
To run inference:
- Create a `Session` instance using `InferenceOptions`. This class automatically chooses the right backend based on the model's file extension.
- Call `session.doInference` to perform synchronous inference. This function receives a map whose keys are input layer names and whose values are host pointers plus sizes in bytes. The output is also a map of (layer name, buffer in host memory).
```cpp
auto logger = std::make_shared<Logger>();
InferenceOptions options;
options.model_path = "path_to_model";
Session session(options, logger);
// mat is a cv::Mat image; the model has one input layer named "input"
auto outputs = session.doInference({{"input", {mat.data, mat.total() * mat.elemSize()}}});
std::vector<uint8_t>& out_tensor = outputs["output"];
// out_tensor is a raw byte buffer; cast it to the expected output type
```
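For example, if the "output" layer is known to produce 32-bit floats (an assumption about your model), the byte buffer can be reinterpreted like this:

```cpp
// Assumes the "output" layer yields 32-bit floats; adjust to your model.
const auto* data = reinterpret_cast<const float*>(out_tensor.data());
const std::size_t count = out_tensor.size() / sizeof(float);
std::vector<float> scores(data, data + count);  // copy into a typed vector
```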
To add a new backend:
- Provide an implementation of the `IBackend` interface.
- Register the new backend and its file extensions in a .cpp file; see `bool registered` in backend.cpp for an example.
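Registration of this kind typically relies on static initialization. The sketch below is hypothetical: the actual `IBackend` virtuals and registration helper live in this library, so `registerBackend`, its signature, and the extension list are placeholders that only show the shape of the `bool registered` idiom:

```cpp
// Hypothetical sketch; see backend.cpp for the real registration code.
class MyBackend : public IBackend {
  // ... override the IBackend virtual functions here ...
};

namespace {
// Runs at static-initialization time, before main(), which is why a
// throwaway `bool registered` is used to trigger the call.
const bool registered = registerBackend(  // placeholder helper name
    {".mybackend"},                       // file extensions served
    [](const InferenceOptions& options, std::shared_ptr<ILogger> logger) {
      return std::make_unique<MyBackend>(options, logger);
    });
}  // namespace
```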