Add support for missing quants in CPY (Metal & CUDA). #11987

Closed
gcp wants to merge 2 commits

@gcp gcp commented Feb 20, 2025

Fixes #10976.

@github-actions github-actions bot added the Nvidia GPU, ggml, and Apple Metal labels Feb 20, 2025
@ggerganov
Member

Could you make a separate PR just for the Metal changes? Btw, I think the copy kernels could be implemented by reusing the dequantize_qX_X functions, likely with a single template + 4 instantiations. Would result in much smaller code change and allows to generalize in the future to other quantizations.
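The "single template + 4 instantiations" idea can be sketched on the host in plain C++. This is only an illustration under stated assumptions: the `Toy*` block structs, `toy_dequantize_*` helpers, and `toy_cpy_*` wrappers below are hypothetical stand-ins (with a `float` scale for simplicity), not the actual ggml types or the `dequantize_qX_X` functions, and only two of the formats are shown.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified toy blocks: stand-ins for illustration, NOT the real ggml structs.
constexpr int kBlockSize = 32;

struct ToyQ4Block { float d; uint8_t qs[kBlockSize / 2]; };  // two 4-bit values per byte
struct ToyQ8Block { float d; int8_t  qs[kBlockSize]; };      // one 8-bit value per byte

// Hypothetical per-format dequantizers: expand one block into kBlockSize floats.
inline void toy_dequantize_q4(const ToyQ4Block& b, float* out) {
    for (int i = 0; i < kBlockSize / 2; ++i) {
        out[i]                  = ((b.qs[i] & 0x0F) - 8) * b.d;  // low nibble
        out[i + kBlockSize / 2] = ((b.qs[i] >> 4)   - 8) * b.d;  // high nibble
    }
}

inline void toy_dequantize_q8(const ToyQ8Block& b, float* out) {
    for (int i = 0; i < kBlockSize; ++i) {
        out[i] = b.qs[i] * b.d;
    }
}

// One copy template; the dequantizer is passed as a compile-time parameter.
template <typename Block, void (*Dequant)(const Block&, float*)>
void toy_cpy_q_to_f32(const Block* src, float* dst, std::size_t nblocks) {
    for (std::size_t ib = 0; ib < nblocks; ++ib) {
        Dequant(src[ib], dst + ib * kBlockSize);
    }
}

// Supporting another format is then a one-line instantiation.
inline void toy_cpy_q4_to_f32(const ToyQ4Block* s, float* d, std::size_t n) {
    toy_cpy_q_to_f32<ToyQ4Block, toy_dequantize_q4>(s, d, n);
}

inline void toy_cpy_q8_to_f32(const ToyQ8Block* s, float* d, std::size_t n) {
    toy_cpy_q_to_f32<ToyQ8Block, toy_dequantize_q8>(s, d, n);
}
```

The appeal of this shape is that the copy loop is written once and each new quantization only contributes its dequantizer. A real kernel would additionally have to handle the source and destination strides that CPY supports, which this sketch ignores.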

@gcp
Author

gcp commented Feb 21, 2025

> Btw, I think the copy kernels could be implemented by reusing the dequantize_qX_X functions, likely with a single template + 4 instantiations. Would result in much smaller code change and allows to generalize in the future to other quantizations.

Reusing the dequantize_qX_Y functions works, but doing it with templates is a bit tricky because dequantize_q8_0 swizzles its results differently than all the others (boo!). It would've been nice if this had just been a ggml_get_to_fp32_cuda call but that doesn't deal with the permutations the CPY code is expected to handle 😢
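One plausible reading of "swizzles its results differently" is that a pair-returning dequantize helper for a 4-bit format yields two values that sit half a block apart (low and high nibble of one byte), whereas the 8-bit helper yields two adjacent values. The host-side C++ sketch below assumes exactly that; the `Pair*` structs, `pair_dequantize_*` helpers, and the `YOffset`/`IqsStep` template parameters are hypothetical names for illustration, not the real CUDA code.

```cpp
#include <cstdint>

constexpr int kQK = 32;  // values per toy block

// Simplified toy blocks (float scale), NOT the real ggml structs.
struct PairQ4Block { float d; uint8_t qs[kQK / 2]; };
struct PairQ8Block { float d; int8_t  qs[kQK]; };
struct Float2 { float x, y; };

// 4-bit style: one byte holds elements iqs and iqs + kQK/2.
inline Float2 pair_dequantize_q4(const PairQ4Block& b, int iqs) {
    return { ((b.qs[iqs] & 0x0F) - 8) * b.d, ((b.qs[iqs] >> 4) - 8) * b.d };
}

// 8-bit style: the pair is elements iqs and iqs + 1.
inline Float2 pair_dequantize_q8(const PairQ8Block& b, int iqs) {
    return { b.qs[iqs] * b.d, b.qs[iqs + 1] * b.d };
}

// YOffset: distance in the output between the two values of a pair.
// IqsStep: how far the quant index advances per helper call.
template <typename Block, Float2 (*DequantPair)(const Block&, int),
          int YOffset, int IqsStep>
void pair_cpy_block_to_f32(const Block& b, float* dst) {
    for (int k = 0; k < kQK / 2; ++k) {
        const int iqs = k * IqsStep;
        const Float2 v = DequantPair(b, iqs);
        dst[iqs]           = v.x;
        dst[iqs + YOffset] = v.y;
    }
}

// The 8-bit case needs different layout parameters than the 4-bit case.
inline void pair_cpy_q4(const PairQ4Block& b, float* dst) {
    pair_cpy_block_to_f32<PairQ4Block, pair_dequantize_q4, kQK / 2, 1>(b, dst);
}

inline void pair_cpy_q8(const PairQ8Block& b, float* dst) {
    pair_cpy_block_to_f32<PairQ8Block, pair_dequantize_q8, 1, 2>(b, dst);
}
```

Under this assumption a single template is still possible, but the output layout has to become an extra template parameter rather than being implied by the helper, which is the kind of complication the comment above describes.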

@gcp gcp closed this Feb 21, 2025
Successfully merging this pull request may close these issues.

Misc. bug: Unsupported op "CPY" / Segmentation fault on Metal