Releases: zhouwg/ggml-hexagon
b5162
convert : experimental support for `--mmproj` flag (#13023)
* convert : experimental support for `--mmproj` flag
* fix bad ctrl+f replace
* fix style
* split into subclasses TextModel and VisionModel
* rename Mode --> ModelBase
* small fix
* correct CLIP_VISION arch name (because existing GGUF already use it)
* Apply suggestions from code review
* fix Mistral3Model
* fix typo
Co-authored-by: compilade <git@compilade.net>
b5158
Disable CI cross-compile builds (#13022)
b5155
main : Fix Ctrl+D/newline handling (#12951) This restores the behavior from #491. This does not affect Ctrl+D's ability to terminate --multiline-input lines (#1040). This also actually implements #587: "If the user wants the text to end in a newline, this should be accomplished by explicitly adding a newline by using \ followed by return, then returning control by pressing return again." Fixes #12949
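The quoted rule (a trailing `\` followed by return adds a newline and keeps reading; a plain return hands control back) can be illustrated with a small sketch. This is only a Python model of the described behavior, not the project's actual C++ input loop; the function name `read_user_turn` is hypothetical.

```python
def read_user_turn(lines):
    """Join entered lines into one message (illustrative sketch only).

    A line ending in a backslash is continued: the backslash is replaced
    by a newline and reading goes on. The first line without a trailing
    backslash ends the turn.
    """
    parts = []
    for line in lines:
        if line.endswith("\\"):
            # Explicit continuation: keep the text, add a newline, read more.
            parts.append(line[:-1] + "\n")
        else:
            # Plain return: this line ends the turn.
            parts.append(line)
            break
    return "".join(parts)


# Text that ends in a newline: "hello\" + return, then return again.
message = read_user_turn(["hello\\", ""])
```

Under this model, a message ends in a newline only when the user asks for one explicitly, which is the behavior the commit message describes.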
b5152
SYCL: Refactor and enable FP16 in binary broadcast OPs (#12975)
* SYCL: refactor move to a separate file
* Fix binbcast
* Remove duplicates
* fix include formatting
* fix typo
b5149
graph : make FA compatible with MLA + add initial Metal kernels (#12953)
* graph : make mla compatible with FA
* metal : add exp FA kernels for DeepSeek models ggml-ci
* llama : minor naming updates ggml-ci
* ggml : disable FA for DS head sizes
* tests : add FA tests for MLA shapes ggml-ci
b5148
ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (#12970)
b5147
CANN: Add support for async operator submission (#12864) Submit operators using asynchronous threads to improve performance. Use the environment variable GGML_CANN_ASYNC_MODE to control whether asynchronous submission is enabled. It is disabled by default. Testing shows a 10%–20% performance improvement in scenarios with small parameter sizes, especially in quantized models.
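The idea of gating asynchronous submission behind an environment variable can be sketched as follows. This is a conceptual Python model, not the CANN backend code: the `Submitter` class, the `async_mode_enabled` helper, and the set of accepted truthy values are assumptions made for illustration. Only the variable name `GGML_CANN_ASYNC_MODE` and its disabled-by-default behavior come from the release note.

```python
import os
import queue
import threading

# Assumption: which strings count as "enabled" is not stated in the note.
TRUTHY = {"1", "on", "yes", "true"}


def async_mode_enabled(env=None):
    """Return True if GGML_CANN_ASYNC_MODE is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("GGML_CANN_ASYNC_MODE", "").strip().lower() in TRUTHY


class Submitter:
    """Runs tasks inline, or hands them to one worker thread (hypothetical)."""

    def __init__(self, use_async):
        self.use_async = use_async
        self.results = []
        if use_async:
            self._tasks = queue.Queue()
            self._worker = threading.Thread(target=self._run, daemon=True)
            self._worker.start()

    def _run(self):
        # A single worker drains the queue in order, so results keep
        # submission order while the caller is free to continue.
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel: no more work
                break
            self.results.append(task())

    def submit(self, task):
        if self.use_async:
            self._tasks.put(task)  # returns immediately
        else:
            self.results.append(task())  # synchronous path (the default)

    def finish(self):
        """Block until all submitted tasks have completed."""
        if self.use_async:
            self._tasks.put(None)
            self._worker.join()


submitter = Submitter(async_mode_enabled())
submitter.submit(lambda: "op-1")
submitter.submit(lambda: "op-2")
submitter.finish()
```

With the variable unset, the sketch takes the synchronous path, mirroring the note's statement that the feature is disabled by default.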
b5141
metal : add FA-vec kernels for head size 96 (#12952) ggml-ci
b5121
server : add VSCode's Github Copilot Chat support (#12896)
* server : add VSCode's Github Copilot Chat support
* cont : update handler name
b5117
sycl: Support sycl_ext_oneapi_limited_graph (#12873) The current usage of the SYCL-Graph extension checks for the `sycl_ext_oneapi_graph` device aspect. However, it is also possible to support `sycl_ext_oneapi_limited_graph` devices that don't support update