ggml quantization in thunder v0 #754

t-vi · 2024-07-11T05:14:45Z

Notebook demoing a ggml transform (without specialized kernels yet, but e.g. reducing memory consumption).
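For readers wondering where the memory reduction comes from, here is a minimal sketch of ggml-style block quantization. It is not the transform from the notebook: the PR text does not say which ggml format is used, so the sketch assumes a Q8_0-like layout (blocks of 32 weights, each stored as 32 int8 values plus one fp16 scale). The function names are hypothetical and belong to neither thunder nor ggml.

```python
import struct

BLOCK_SIZE = 32  # assumed Q8_0-style block length


def to_fp16(x):
    """Round a Python float to the nearest float16 value (the scale is stored as fp16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q8_0(values):
    """Hypothetical helper: quantize a flat list of floats into (scale, int8 list) blocks."""
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of BLOCK_SIZE")
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        chunk = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in chunk)
        scale = to_fp16(amax / 127.0)  # one scale per block
        if scale == 0.0:
            quants = [0] * BLOCK_SIZE  # all-zero (or vanishingly small) block
        else:
            # Clamp to the symmetric int8 range after rounding.
            quants = [max(-127, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, quants))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate floats from (scale, int8 list) blocks."""
    return [scale * q for scale, quants in blocks for q in quants]


def packed_size_bytes(num_values):
    """Bytes needed when each block stores a 2-byte fp16 scale plus 32 int8 values."""
    return (num_values // BLOCK_SIZE) * (2 + BLOCK_SIZE)
```

Under these assumptions a block of 32 weights takes 34 bytes instead of 64 bytes in fp16 or 128 bytes in fp32, which is the kind of saving available even before any specialized kernels exist: weights can be stored packed and dequantized before the regular matmul.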

The top part is just the generation demo from litgpt itself.

Note that I needed to add an eval hack for the "meta" device in the same-tensor check in order to be able to trace through the model with meta weights.

lantiga · 2024-07-19T12:08:53Z

@t-vi tests pass now, should we? : )

ggml quantization in thunder v0

6d6ad92

t-vi requested review from mruberry and lantiga as code owners July 11, 2024 05:14

t-vi added 6 commits July 16, 2024 07:03

Merge branch 'main' into tom/ggmlquant

2fab277

Merge branch 'main' into tom/ggmlquant

34b1c0c

ad hoc benchmarking

8f831c3

Merge remote-tracking branch 'origin/main' into tom/ggmlquant

bd975af

wip

0032aa4

Merge branch 'main' into tom/ggmlquant

7263c0a

t-vi mentioned this pull request Jul 17, 2024

remove OverridenKVCache and fix some peculiar cases of prims.copy_ + NVFuser #788

Merged

lantiga and others added 2 commits July 19, 2024 13:34

Merge branch 'main' into tom/ggmlquant

322a08f

Fix typo

5275cbf

lantiga approved these changes Jul 19, 2024

t-vi merged commit b1cf1bf into main Jul 19, 2024
36 checks passed

t-vi deleted the tom/ggmlquant branch July 19, 2024 13:06
