Releases: zhouwg/ggml-hexagon

b5162

21 Apr 03:16
2016f07
convert : experimental support for `--mmproj` flag (#13023)

* convert : experimental support for `--mmproj` flag

* fix bad ctrl+f replace

* fix style

* split into subclasses TextModel and VisionModel

* rename Model --> ModelBase

* small fix

* correct CLIP_VISION arch name (because existing GGUF already use it)

* Apply suggestions from code review

Co-authored-by: compilade <git@compilade.net>

* fix Mistral3Model

* fix typo

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
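
The split into `TextModel` and `VisionModel` under a shared `ModelBase` can be pictured with a minimal sketch. Only the three class names and the `--mmproj` flag come from the notes above; the `describe` method, the `pick_converter` helper, and the constructor argument are hypothetical and do not reflect the real converter's interface.

```python
class ModelBase:
    """Shared state for a conversion run (hypothetical fields)."""

    def __init__(self, name: str):
        self.name = name

    def describe(self) -> str:
        # Subclasses say what kind of output they would produce.
        raise NotImplementedError


class TextModel(ModelBase):
    def describe(self) -> str:
        return f"{self.name}: text model"


class VisionModel(ModelBase):
    def describe(self) -> str:
        return f"{self.name}: vision projector (mmproj)"


def pick_converter(name: str, mmproj: bool) -> ModelBase:
    """Hypothetical helper: choose the subclass from a boolean mmproj switch."""
    cls = VisionModel if mmproj else TextModel
    return cls(name)
```

Keeping the shared logic in a base class means a single switch can select the vision path without duplicating the text path.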

b5158

19 Apr 23:02
0013715
Disable CI cross-compile builds (#13022)

b5155

19 Apr 01:15
6408210
main : Fix Ctrl+D/newline handling (#12951)

This restores the behavior from #491. This does not affect Ctrl+D's ability to
terminate --multiline-input lines (#1040).

This also actually implements #587: "If the user wants the text to end in a
newline, this should be accomplished by explicitly adding a newline by using
\ followed by return, then returning control by pressing return again."

Fixes #12949
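
The quoted rule (a trailing `\` adds an explicit newline and keeps reading, a plain return hands control back) can be illustrated with a small sketch. This is not the project's input code; the function name `read_user_turn` and the list-of-lines input are assumptions made for illustration, and Ctrl+D handling is not modeled.

```python
def read_user_turn(lines):
    """Collect one user turn from an iterable of input lines.

    Each line is given without its trailing newline. A line ending in a
    backslash contributes an explicit newline and reading continues; any
    other line ends the turn.
    """
    parts = []
    for line in lines:
        if line.endswith("\\"):
            # Drop the backslash and keep an explicit newline in the text.
            parts.append(line[:-1] + "\n")
        else:
            parts.append(line)
            break
    return "".join(parts)
```

Under this sketch, typing `hello\`, return, and then return again yields text that ends in a newline, which matches the behavior the quote describes.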

b5152

18 Apr 16:24
8d66005
SYCL: Refactor and enable FP16 in binary broadcast OPs (#12975)

* SYCL: refactor move to a separate file

* Fix binbcast

* Remove duplicates

* fix include formatting

* fix typo

b5149

18 Apr 03:35
2f74c35
graph : make FA compatible with MLA + add initial Metal kernels (#12953)

* graph : make mla compatible with FA

* metal : add exp FA kernels for DeepSeek models

ggml-ci

* llama : minor naming updates

ggml-ci

* ggml : disable FA for DS head sizes

* tests : add FA tests for MLA shapes

ggml-ci

b5148

17 Apr 14:14
207c22e
ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (#12970)

b5147

17 Apr 13:27
7a395f6
CANN: Add support for async operator submission (#12864)

Submit operators using asynchronous threads to improve performance.

Use the environment variable GGML_CANN_ASYNC_MODE to control whether
asynchronous submission is enabled. It is disabled by default.

Testing shows a 10%–20% performance improvement in scenarios with
small parameter sizes, especially in quantized models.
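
As a rough sketch of an opt-in switch like this one, the snippet below reads `GGML_CANN_ASYNC_MODE` and treats an unset variable as disabled, matching the default described above. The set of values counted as "enabled" is an assumption; the note does not say which spellings the backend accepts. The helper name `cann_async_enabled` is also hypothetical.

```python
import os

# Assumption: these spellings mean "enabled". The values the CANN backend
# actually accepts are not stated in the release note.
_TRUTHY = {"1", "on", "true", "yes"}


def cann_async_enabled(environ=None):
    """Return True if async submission is requested via the environment."""
    env = os.environ if environ is None else environ
    value = env.get("GGML_CANN_ASYNC_MODE", "")
    # Unset or unrecognized values fall back to the default: disabled.
    return value.strip().lower() in _TRUTHY
```

Passing a plain dictionary instead of `os.environ` makes the default-off behavior easy to check without touching the real environment.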

b5141

15 Apr 15:54
f8f820c
metal : add FA-vec kernels for head size 96 (#12952)

ggml-ci

b5121

12 Apr 02:24
c94085d
server : add VSCode's Github Copilot Chat support (#12896)

* server : add VSCode's Github Copilot Chat support

* cont : update handler name

b5117

11 Apr 15:27
578754b
sycl: Support sycl_ext_oneapi_limited_graph (#12873)

The current usage of the SYCL-Graph extension checks for
the `sycl_ext_oneapi_graph` device aspect. However, it is also
possible to support `sycl_ext_oneapi_limited_graph` devices that
don't support update