Integration with Triton Inference Server #311
WissamAntoun started this conversation in Ideas
-
It would really be interesting to see if `kernl` works well running inside the Triton Inference Server with the Python backend. I'm not sure if you have tested this way of deploying the model, since it would fit well with `transformer-deploy`.
So far I did a quick test. It required changing `setup.py` to support Python 3.8, and it also revealed a few issues with the `list` and `tuple` type hints, for example in https://github.com/ELS-RD/kernl/blob/main/src/kernl/optimizer/cuda_graph.py#L29. After I fixed those issues, I ran into `CUDA error: operation not permitted when stream is capturing`.
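For context, the `list` / `tuple` problem on Python 3.8 is the standard PEP 585 limitation: built-in generics such as `list[torch.Tensor]` are only subscriptable from Python 3.9 onward, so evaluating such an annotation under 3.8 fails at import time. Below is a minimal sketch of the two usual fixes, using a hypothetical signature rather than the actual code in `cuda_graph.py`:

```python
# Hypothetical signature, for illustration only (not the real code at
# cuda_graph.py#L29). On Python 3.8 the following fails at import time with
# "TypeError: 'type' object is not subscriptable":
#
#     def wrap(inputs: list[torch.Tensor]) -> tuple[torch.Tensor, ...]: ...
#
# Fix 1: postpone annotation evaluation for the whole module (PEP 563).
from __future__ import annotations

# Fix 2: use the typing generics, which already exist on Python 3.8.
from typing import List, Tuple

import torch


def wrap(inputs: List[torch.Tensor]) -> Tuple[torch.Tensor, ...]:
    """Same behaviour, 3.8-compatible hints."""
    return tuple(inputs)
```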
I was using a T5 model running inside Hugging Face text generation pipelines and `tritonserver:22.10`.
This is the `model.py` file I was using:
model.py
Error logs
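A rough sketch of what such a Triton Python-backend `model.py` can look like when it wraps a kernl-optimized T5 pipeline. This is illustrative only, not the attached file: the tensor names, the checkpoint, and the `optimize_model` import are assumptions.

```python
# model.py -- illustrative sketch, not the file attached to this post.
# Assumptions: I/O tensors "INPUT_TEXT"/"OUTPUT_TEXT" (TYPE_STRING), the
# "t5-small" checkpoint, and kernl's optimize_model entry point.
import numpy as np
import torch
import triton_python_backend_utils as pb_utils
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

from kernl.model_optimization import optimize_model  # assumed kernl API


class TritonPythonModel:
    def initialize(self, args):
        model_id = "t5-small"  # placeholder checkpoint
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForSeq2SeqLM.from_pretrained(model_id).eval().cuda()
        # kernl swaps eligible modules for fused Triton kernels and wraps the
        # forward pass in CUDA graphs; this is the step after which the
        # "operation not permitted when stream is capturing" error shows up.
        optimize_model(model)
        self.pipe = pipeline(
            "text2text-generation", model=model, tokenizer=tokenizer, device=0
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            # TYPE_STRING inputs arrive as a numpy array of bytes objects.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT_TEXT")
            texts = [t.decode("utf-8") for t in in_tensor.as_numpy().reshape(-1)]
            generated = []
            with torch.inference_mode():
                for text in texts:
                    result = self.pipe(text)  # [{"generated_text": "..."}]
                    generated.append(result[0]["generated_text"].encode("utf-8"))
            out_tensor = pb_utils.Tensor(
                "OUTPUT_TEXT", np.array(generated, dtype=object)
            )
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses
```

The matching `config.pbtxt` would declare `backend: "python"` with `INPUT_TEXT` / `OUTPUT_TEXT` as `TYPE_STRING`; whether T5 generation is safe to run while kernl is capturing CUDA graphs is exactly the open question behind the error above.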
Replies: 1 comment
-
We have not yet tested it. We are wondering if PyServe could also be a good choice. We really like Triton server, but it has some limitations regarding the Python engine, and we are wondering whether it is a good choice for us, since we obviously won't leverage all the engines, etc.