git clone https://github.com/ggml-org/llama.cpp.git
mv ./llama.cpp ./llama
- Download the latest llama.cpp binary release: https://github.com/ggml-org/llama.cpp/releases
- Move the binaries into the llama folder, so the layout looks like this:
./llama/bin/<bin files>
- Clone the model's Hugging Face (HF) repo into the AutoGGUF folder.
- Run
main.py
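
Before running main.py, it can help to confirm the folder layout from the steps above. The following is a minimal sketch, not part of AutoGGUF or llama.cpp: `check_layout` is a hypothetical helper name, and it assumes main.py sits in the same folder as the `llama` directory.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with AutoGGUF or llama.cpp).
# Assumption: ROOT is the AutoGGUF folder, holding both main.py and ./llama/bin.
# Returns 0 when ROOT/llama/bin exists and is non-empty and ROOT/main.py exists.
check_layout() {
    root="${1:-.}"

    # The release binaries are expected under ./llama/bin
    if [ ! -d "$root/llama/bin" ]; then
        echo "missing directory: $root/llama/bin" >&2
        return 1
    fi

    # An empty bin folder means the release archive was not unpacked here
    if [ -z "$(ls -A "$root/llama/bin")" ]; then
        echo "no binaries found in: $root/llama/bin" >&2
        return 1
    fi

    # main.py is the script run in the last step
    if [ ! -f "$root/main.py" ]; then
        echo "main.py not found in: $root" >&2
        return 1
    fi

    echo "layout looks OK"
}
```

Calling `check_layout .` from the AutoGGUF folder prints which piece is missing, if any. It only checks that files exist; it does not verify that the binaries match your platform.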