I evaluated INT8 quantization via quanto. Accuracy is good (tested on Llama3.1-8B). However, I currently do not have the bandwidth to implement this on the CPU (in fact, I do not know how to do it other than by calling quanto, which is not very fast on the CPU).
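For reference, the core operation behind an INT8 KV cache is symmetric per-tensor quantization: each float tensor is mapped to int8 with a shared scale and dequantized on read. This is a minimal pure-Python sketch of that scheme (the function names and the per-tensor granularity are illustrative assumptions, not quanto's actual implementation):

```python
# Minimal sketch of symmetric per-tensor INT8 quantization, the kind of
# scheme applied to KV-cache entries. Illustrative only; real backends
# like quanto operate on tensors with finer (e.g. per-channel) scales.

def quantize_int8(values):
    """Map floats to int8 codes with a shared scale (zero-point fixed at 0)."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 127.0
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from int8 codes and the stored scale."""
    return [c * scale for c in codes]

kv_slice = [0.5, -1.25, 3.0, -0.01]   # hypothetical KV-cache values
codes, scale = quantize_int8(kv_slice)
recovered = dequantize_int8(codes, scale)
```

Round-trip error is bounded by half the scale per element, which is why accuracy tends to hold up well at 8 bits; the CPU-speed question is about how fast the quantize/dequantize and int8 matmul kernels run, not about this math.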
Hi, have you tested the performance of the quantized KV cache? Is it possible to maintain reasonably high performance under INT8 quantization?