Which one is faster for token generation? llama.cpp vs. SGLang #3463
Unanswered
ghostplant asked this question in Q&A
Replies: 1 comment
- Is there any benchmark comparison based on the same batch size (= 1 for token generation) and the same quantization type?
- For the batch size 1 use case, you can enable torch compile in SGLang; it is even faster than gpt-fast, so I believe SGLang is faster. You are welcome to share feedback and benchmark results from your use case.