-
The GPU performance increase has been even greater, something like 5 times faster or more.
-
CPU only - i9-9900
I ran some tests with a 2-month-old llama.cpp build and a 65B GGML model that is also 2 months old, compared against the current build and model:
2-month-old 65B q5_1 model with the old llama.cpp: around 1700 ms/token
Current 65B Q4_K_M model (similar perplexity to q5_1) with the current llama.cpp: around 1000 ms/token
I also tested my new Ryzen 7950X3D :P
Current 65B Q4_K_M model (similar perplexity to q5_1) with the current llama.cpp: around 600 ms/token (haven't tested AVX-512 yet)
So on CPU, the combination of new models and new builds gives more than a 60% performance improvement compared to 2 months ago.
That's awesome!
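For reference, here is a quick back-of-the-envelope calculation (a minimal sketch, not part of the original posts) converting the reported ms/token figures into relative throughput; the resulting ~70% figure is where the "more than 60%" claim comes from.

```python
# Convert the ms/token numbers quoted above into tokens/sec and relative speedup.
# The 7950X3D line compares the new machine against the i9-9900's 1700 ms/token
# baseline, since that is the only old-build figure reported.

def speedup(old_ms_per_token: float, new_ms_per_token: float) -> float:
    """Return the relative throughput improvement (e.g. 0.7 == 70% faster)."""
    old_tps = 1000.0 / old_ms_per_token   # tokens per second, old build/model
    new_tps = 1000.0 / new_ms_per_token   # tokens per second, new build/model
    return new_tps / old_tps - 1.0

if __name__ == "__main__":
    # i9-9900: old q5_1 + old llama.cpp vs current Q4_K_M + current llama.cpp
    print(f"i9-9900:  {speedup(1700, 1000):.0%} faster")   # ~70%
    # Ryzen 7950X3D: current Q4_K_M + current llama.cpp vs the same baseline
    print(f"7950X3D:  {speedup(1700, 600):.0%} faster")    # ~183%
```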