Triton provides an extension to the standard gRPC inference API for streaming (`inference.GRPCInferenceService/ModelStreamInfer`). This extension is required to use the vLLM backend with Triton.
However, the Triton runtime adapter currently does not advertise this gRPC method, and calling it results in an error: `inference.GRPCInferenceService/ModelStreamInfer: UNIMPLEMENTED: Method not found or not permitted: inference.GRPCInferenceService/ModelStreamInfer`
To resolve this, I think the `ModelStreamInfer` method must be added here:
modelmesh-runtime-adapter/model-mesh-triton-adapter/server/server.go
Lines 267 to 269 in f9781d2
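
For illustration, here is a rough sketch of the kind of change I have in mind. It is not the adapter's actual code: `methodInfo`, `advertisedMethods`, and `isAdvertised` are hypothetical stand-ins, and I am assuming the adapter advertises methods as a map keyed by the full gRPC method name, with the model name carried in field 1 of the request message.

```go
package main

import "fmt"

// methodInfo is a hypothetical stand-in for the per-method metadata the
// adapter reports; it is not the real generated protobuf type.
type methodInfo struct {
	// IDInjectionPath is assumed to be the protobuf field path where the
	// model id is injected into the request.
	IDInjectionPath []uint32
}

const tritonServiceName = "inference.GRPCInferenceService"

// advertisedMethods builds the set of gRPC methods the adapter would expose.
// The ModelStreamInfer entry is the proposed addition.
func advertisedMethods() map[string]methodInfo {
	// Assumption: model_name is field 1 of the request message, and the
	// streaming method uses the same request type as ModelInfer.
	modelNamePath := []uint32{1}
	return map[string]methodInfo{
		tritonServiceName + "/ModelInfer":       {IDInjectionPath: modelNamePath},
		tritonServiceName + "/ModelStreamInfer": {IDInjectionPath: modelNamePath},
	}
}

// isAdvertised reports whether a full method name is in the advertised set.
func isAdvertised(method string) bool {
	_, ok := advertisedMethods()[method]
	return ok
}

func main() {
	fmt.Println(isAdvertised(tritonServiceName + "/ModelStreamInfer"))
}
```

With an entry like this in place, a lookup for the streaming method succeeds instead of falling through to the `UNIMPLEMENTED` path. Whether the streaming call needs any other handling in the adapter is something I have not checked.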