Which one is faster for token generation? llama.cpp vs. SGLang #3463
Unanswered
ghostplant asked this question in Q&A
Replies: 1 comment
- Is there any benchmark comparison based on the same batch size (= 1 for token generation) and the same quantization type?
- For the batch size 1 use case, you can enable torch compile in SGLang; it is even faster than gpt-fast, so I believe SGLang is faster. You are welcome to share feedback and benchmark results from your use case.