Hi, I have a CPU-only environment, and to my understanding neither TRT-LLM nor vLLM will work there (both require a GPU).
I therefore deployed a tinyllama-1b model using onnxruntime and am now wondering whether there is any information on how to run inference with it given the inputs.
Any information/similar work is much appreciated :)
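For context, something along these lines is what I'm trying to get working. This is only a minimal sketch using Hugging Face Optimum's ONNX Runtime wrapper (the local model path is illustrative, and it assumes the model was exported beforehand with `optimum-cli export onnx`):

```python
# Minimal sketch: CPU-only text generation with an ONNX-exported TinyLlama
# via Hugging Face Optimum's ONNX Runtime integration.
# Assumed prior export step (hypothetical paths):
#   optimum-cli export onnx --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 tinyllama-onnx/
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_dir = "tinyllama-onnx"  # illustrative path to the exported model

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = ORTModelForCausalLM.from_pretrained(
    model_dir,
    provider="CPUExecutionProvider",  # CPU-only environment
)

# Tokenize the prompt and run autoregressive generation on CPU.
inputs = tokenizer("What is ONNX Runtime?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Going through Optimum here is a convenience choice: its `generate()` handles the decoding loop and past-key-value plumbing that would otherwise have to be wired up by hand against a raw `onnxruntime.InferenceSession`.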
Logs of the container: