Replies: 1 comment
Hi @nybbles, thanks for such a detailed discussion post!
Historically, this has been one of the major selling points of the Triton Inference Server: multiple backends with optimized C++ implementations behind a single interface, reachable via Python/REST/gRPC. If you're looking to do things in-process with your Python workloads and avoid the network overhead, then it sounds like you're on the right track looking into the new Python in-process API. We'd definitely be interested to hear any early feedback or thoughts you may have from trying it out.
CC @nnshah1 who may be able to help here.
For now, you'd probably need to implement a simple dummy/mock module that could be interacted with in the same way. The |
We are working on streamlining our ML infrastructure (which runs on an IPC in an in-line manufacturing device). We do not have a standardized ML inference API for our models and supporting code. Our models are mostly Python-based and eventually call into sklearn or PyTorch.
For now, we are considering adopting the `TritonPythonModel` API as an internal standard, for the benefits of a standard API, including being able to easily adopt Triton in the future.

The `execute` method in `TritonPythonModel` expects `pb_utils.InferenceRequest` and `pb_utils.InferenceResponse`, and there is related utility code from `triton_python_backend_utils`. We would need to use this utility code to construct `pb_utils.InferenceRequest`s from our own internal abstractions and then translate the `pb_utils.InferenceResponse`s back to our own internal abstractions.

`triton_python_backend_utils` is only available from within Triton inference server itself, or from `Triton_Inference_Server_Python_API/deps/tritonserver-2.41.0.dev0-py3-none-any.whl`, which I saw is built by the Docker build process for the `tritonserver` Docker image.

For now, we want to continue running our model inference in-process, and just want to adopt a standard API, like `TritonPythonModel`'s, for our models.

Here are my questions:

`pb_utils.InferenceRequest` and related abstractions that you'd recommend?

Also, if this approach is flawed in some way that I'm missing, I would love to be alerted about that. Thank you!
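
To make the mock-module suggestion from the reply concrete, here is a rough sketch of what an in-process stand-in could look like. It is not the real `triton_python_backend_utils`: the classes only mirror a small subset of method names (`as_numpy`, `inputs`, `output_tensors`, `get_input_tensor_by_name`) as I understand them from the Python backend, so please check them against the actual backend before relying on this. `run_in_process` and the `INPUT0`/`OUTPUT0` tensor names are made up for illustration, and plain lists stand in for numpy arrays to keep the example dependency-free.

```python
"""Hypothetical stand-in for a small subset of triton_python_backend_utils."""


class Tensor:
    def __init__(self, name, data):
        self._name = name
        self._data = data

    def name(self):
        return self._name

    def as_numpy(self):
        # The real backend returns a numpy.ndarray here; this mock simply
        # hands back whatever object was stored (a plain list in this sketch).
        return self._data


class InferenceRequest:
    def __init__(self, inputs, requested_output_names=(), model_name=""):
        self._inputs = list(inputs)
        self._requested_output_names = list(requested_output_names)
        self._model_name = model_name

    def inputs(self):
        return self._inputs

    def requested_output_names(self):
        return self._requested_output_names


class InferenceResponse:
    def __init__(self, output_tensors=(), error=None):
        self._output_tensors = list(output_tensors)
        self._error = error

    def output_tensors(self):
        return self._output_tensors

    def has_error(self):
        return self._error is not None


def get_input_tensor_by_name(request, name):
    # Returns None when the request carries no tensor with that name.
    return next((t for t in request.inputs() if t.name() == name), None)


class TritonPythonModel:
    """Toy model written against the mock: doubles every input value."""

    def execute(self, requests):
        responses = []
        for request in requests:
            values = get_input_tensor_by_name(request, "INPUT0").as_numpy()
            out = Tensor("OUTPUT0", [2 * v for v in values])
            responses.append(InferenceResponse(output_tensors=[out]))
        return responses


def run_in_process(model, named_inputs):
    """Hypothetical adapter: dict of inputs -> one request -> dict of outputs."""
    tensors = [Tensor(name, data) for name, data in named_inputs.items()]
    (response,) = model.execute([InferenceRequest(inputs=tensors)])
    return {t.name(): t.as_numpy() for t in response.output_tensors()}


result = run_in_process(TritonPythonModel(), {"INPUT0": [1, 2, 3]})
print(result)  # {'OUTPUT0': [2, 4, 6]}
```

In practice the mock classes would probably live in their own module that model code imports under the `pb_utils` alias, so the same model file could later run inside Triton without changes. That only holds to the extent that the mock's behaviour matches the real utilities.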