Releases: ggml-org/llama.cpp
b4743
common : add llama.vim preset for Qwen2.5 Coder (#11945)

This commit adds a preset for llama.vim to use the default Qwen 2.5 Coder models. The motivation for this change is to make it easier to start a server suitable for use with the llama.vim plugin. For example, the server can be started with a command like the following:

```console
$ llama-server --fim-qwen-1.5b-default
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10932
b4742
speculative : update default params (#11954)

* speculative : update default params
* speculative : do not discard the last drafted token
b4739
tool-call: refactor common chat / tool-call api (+ tests / fixes) (#1…
b4738
server : add TEI API format for /rerank endpoint (#11942)

* server : add TEI API format for /rerank endpoint
* Apply suggestions from code review
* fix
* also gitignore examples/server/*.gz.hpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
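As a rough illustration of the TEI-style request this release accepts, the sketch below builds a rerank request body; the exact field names (`query`, `texts`) follow the Text Embeddings Inference API convention and the `localhost:8080` endpoint in the comment is an assumption, not something stated in this changelog:

```python
import json

# Assumed TEI-style rerank body: a query plus a list of candidate texts.
payload = {
    "query": "What is quantization?",
    "texts": [
        "Quantization reduces weight precision to shrink memory use.",
        "llama.cpp is written in C/C++.",
    ],
}

# JSON body that would be POSTed to the server's /rerank endpoint,
# e.g.: curl -X POST http://localhost:8080/rerank -d @body.json
body = json.dumps(payload)
print(body)
```

The server is expected to score each entry of `texts` against `query` and return the candidates ranked by relevance.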
b4735
CUDA: use async data loading for FlashAttention (#11894)

* CUDA: use async data loading for FlashAttention

Co-authored-by: Diego Devesa <slarengh@gmail.com>
b4734
update release requirements (#11897)
b4733
server : fix divide-by-zero in metrics reporting (#11915)
b4732
vulkan: implement several ops relevant for ggml_opt (#11769)

* vulkan: support memset_tensor
* vulkan: support GGML_OP_SUM
* vulkan: implement GGML_OP_ARGMAX
* vulkan: implement GGML_OP_SUB
* vulkan: implement GGML_OP_COUNT_EQUAL
* vulkan: implement GGML_OP_OPT_STEP_ADAMW
* vulkan: fix check_results RWKV_WKV6 crash and memory leaks
* vulkan: implement GGML_OP_REPEAT_BACK
* tests: remove invalid test-backend-ops REPEAT_BACK tests
* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
b4731
server : bump httplib to 0.19.0 (#11908)
b4730
common : Fix a typo in help (#11899)

This patch fixes a typo in the command help: prefx -> prefix.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>