Commit
Tested and restored some functionality of llamafiler; Q8_0 on ARM; fixed memory leaks; copied the latest upstream quantization code; migrated to newer llama.cpp APIs; etc. llamafiler can now serve 1800 embeddings per second, which is 6.81x faster than the upstream llama.cpp/examples/server/.