-
The GPU performance increase has been even greater, something like 5 times faster or more.
-
CPU only - i9-9900
I ran some tests with a 2-month-old llama.cpp build and a 65B GGML model that is also 2 months old, compared against the current build and model:
2-month-old 65B q5_1 model with the old llama.cpp: around 1700 ms/token
Current 65B Q4_K_M model (similar perplexity to q5_1) with the current llama.cpp: around 1000 ms/token
I also tested my new Ryzen 7950X3D :P
Current 65B Q4_K_M model (similar perplexity to q5_1) with the current llama.cpp: around 600 ms/token (haven't tested AVX-512 yet)
So on CPU, the combination of new models and new builds gives more than a 60% performance improvement compared to 2 months ago.
That's awesome!
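For reference, here is a quick back-of-the-envelope calculation (a minimal sketch, not part of the original posts) converting the reported ms/token figures into relative throughput; the resulting ~70% figure is where the "more than 60%" claim comes from.

```python
# Convert the ms/token numbers quoted above into tokens/sec and relative speedup.
# The 7950X3D line compares the new machine against the i9-9900's 1700 ms/token
# baseline, since that is the only old-build figure reported.

def speedup(old_ms_per_token: float, new_ms_per_token: float) -> float:
    """Return the relative throughput improvement (e.g. 0.7 == 70% faster)."""
    old_tps = 1000.0 / old_ms_per_token   # tokens per second, old build/model
    new_tps = 1000.0 / new_ms_per_token   # tokens per second, new build/model
    return new_tps / old_tps - 1.0

if __name__ == "__main__":
    # i9-9900: old q5_1 + old llama.cpp vs current Q4_K_M + current llama.cpp
    print(f"i9-9900:  {speedup(1700, 1000):.0%} faster")   # ~70%
    # Ryzen 7950X3D: current Q4_K_M + current llama.cpp vs the same baseline
    print(f"7950X3D:  {speedup(1700, 600):.0%} faster")    # ~183%
```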