Releases · zhouwg/ggml-hexagon

21 Apr 03:16

2016f07

b5162 Latest

convert : experimental support for `--mmproj` flag (#13023)

* convert : experimental support for `--mmproj` flag

* fix bad ctrl+f replace

* fix style

* split into subclasses TextModel and VisionModel

* rename Model --> ModelBase

* small fix

* correct CLIP_VISION arch name (because existing GGUF files already use it)

* Apply suggestions from code review

Co-authored-by: compilade <git@compilade.net>

* fix Mistral3Model

* fix typo

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
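The split described above can be pictured with a minimal sketch. It assumes that `--mmproj` is a boolean switch and that it selects a vision-side converter over the text-side one; the class bodies, the `describe` method, and the `build_converter` helper are hypothetical and are not taken from the actual conversion script.

```python
import argparse
from typing import List


class ModelBase:
    """Hypothetical stand-in for the logic shared by both converters."""
    kind = "base"

    def __init__(self, model_dir: str):
        self.model_dir = model_dir

    def describe(self) -> str:
        return f"{self.kind} converter for {self.model_dir}"


class TextModel(ModelBase):
    """Hypothetical: handles the language-model tensors."""
    kind = "text"


class VisionModel(ModelBase):
    """Hypothetical: handles the multimodal projector tensors."""
    kind = "vision"


def build_converter(argv: List[str]) -> ModelBase:
    # Assumption: --mmproj is a plain on/off flag.
    parser = argparse.ArgumentParser()
    parser.add_argument("model_dir")
    parser.add_argument("--mmproj", action="store_true")
    args = parser.parse_args(argv)
    cls = VisionModel if args.mmproj else TextModel
    return cls(args.model_dir)
```

Calling `build_converter(["some-model", "--mmproj"])` would return a `VisionModel` in this sketch, while omitting the flag would return a `TextModel`.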

Assets 26

cudart-llama-bin-win-cu11.7-x64.zip  303 MB  2025-04-21T03:16:32Z
cudart-llama-bin-win-cu12.4-x64.zip  373 MB  2025-04-21T03:16:41Z
llama-b5162-bin-macos-arm64.zip  24.2 MB  2025-04-21T03:16:56Z
llama-b5162-bin-macos-x64.zip  25.8 MB  2025-04-21T03:16:58Z
llama-b5162-bin-ubuntu-arm64.zip  25.9 MB  2025-04-21T03:16:59Z
llama-b5162-bin-ubuntu-vulkan-x64.zip  34.7 MB  2025-04-21T03:17:00Z
llama-b5162-bin-ubuntu-x64.zip  27.4 MB  2025-04-21T03:17:02Z
llama-b5162-bin-win-avx-x64.zip  19.7 MB  2025-04-21T03:17:04Z
llama-b5162-bin-win-avx2-x64.zip  19.7 MB  2025-04-21T03:17:05Z
llama-b5162-bin-win-avx512-x64.zip  19.7 MB  2025-04-21T03:17:06Z
Source code (zip)  2025-04-20T21:29:36Z
Source code (tar.gz)  2025-04-20T21:29:36Z

19 Apr 23:02

github-actions

b5158

0013715

Disable CI cross-compile builds (#13022)

Assets 26

19 Apr 01:15

github-actions

b5155

6408210

main : Fix Ctrl+D/newline handling (#12951)

This restores the behavior from #491. This does not affect Ctrl+D's ability to
terminate --multiline-input lines (#1040).

This also actually implements #587: "If the user wants the text to end in a
newline, this should be accomplished by explicitly adding a newline by using
\ followed by return, then returning control by pressing return again."

Fixes #12949
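The quoted input convention can be illustrated with a small sketch. This is not the implementation in `main`; it is a simplified model, assuming each entered line arrives as a string without its terminating return, and the `read_turn` name is hypothetical.

```python
from typing import Iterable


def read_turn(lines: Iterable[str]) -> str:
    """Assemble one user turn from entered lines (illustrative only).

    A line ending in a backslash contributes an explicit newline and
    keeps reading; the first line without a trailing backslash ends
    the turn and returns control.
    """
    parts = []
    for line in lines:
        if line.endswith("\\"):
            # Drop the backslash and add the newline the user asked for.
            parts.append(line[:-1] + "\n")
            continue
        parts.append(line)
        break
    return "".join(parts)
```

Under this model, typing `hello\`, return, then return again yields text that ends in a newline, while a plain `hello` followed by return yields text with no trailing newline.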

Assets 26

18 Apr 16:24

github-actions

b5152

8d66005

SYCL: Refactor and enable FP16 in binary broadcast OPs (#12975)

* SYCL: refactor move to a separate file

* Fix binbcast

* Remove duplicates

* fix include formatting

* fix typo

Assets 26

18 Apr 03:35

github-actions

b5149

2f74c35

graph : make FA compatible with MLA + add initial Metal kernels (#12953)

* graph : make mla compatible with FA

* metal : add exp FA kernels for DeepSeek models

ggml-ci

* llama : minor naming updates

ggml-ci

* ggml : disable FA for DS head sizes

* tests : add FA tests for MLA shapes

ggml-ci

Assets 26

17 Apr 14:14

github-actions

b5148

207c22e

ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (#12970)

Assets 26

17 Apr 13:27

github-actions

b5147

7a395f6

CANN: Add support for async operator submission (#12864)

Submit operators using asynchronous threads to improve performance.

Use the environment variable GGML_CANN_ASYNC_MODE to control whether
asynchronous submission is enabled. It is disabled by default.

Testing shows a 10%–20% performance improvement in scenarios with
small parameter sizes, especially in quantized models.
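As a rough sketch of how the switch might be toggled from a launcher script, the helper below builds a process environment with `GGML_CANN_ASYNC_MODE` set or cleared. The value `"1"` is an assumption, since the note above names the variable but not the value that enables it, and the `cann_async_env` helper is hypothetical.

```python
import os
from typing import Dict, Optional


def cann_async_env(enabled: bool,
                   base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with async submission toggled."""
    env = dict(os.environ if base is None else base)
    if enabled:
        # Assumption: "1" turns the feature on; check the backend docs.
        env["GGML_CANN_ASYNC_MODE"] = "1"
    else:
        # Leaving the variable unset keeps the default (disabled).
        env.pop("GGML_CANN_ASYNC_MODE", None)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting a CANN-enabled binary, which makes it easy to benchmark the same command with and without asynchronous submission.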

Assets 26

15 Apr 15:54

github-actions

b5141

f8f820c

metal : add FA-vec kernels for head size 96 (#12952)

ggml-ci

Assets 26

12 Apr 02:24

github-actions

b5121

c94085d

server : add VSCode's GitHub Copilot Chat support (#12896)

* server : add VSCode's GitHub Copilot Chat support

* cont : update handler name

Assets 26

11 Apr 15:27

github-actions

b5117

578754b

sycl: Support sycl_ext_oneapi_limited_graph (#12873)

The current usage of the SYCL-Graph extension checks for
the `sycl_ext_oneapi_graph` device aspect. However, it is also
possible to support `sycl_ext_oneapi_limited_graph` devices that
don't support update

Assets 26
