
Conversation

eldarkurtic (Contributor)
This PR adds the changes needed to enable per-attention-head attention/KV-cache quantization.
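For context, a minimal sketch (not the PR's actual implementation) of what per-attention-head KV-cache quantization computes: one scale per head rather than a single scale for the whole tensor. The shape convention and function name below are assumptions made for illustration.

```python
import torch

def per_head_int8_quantize(kv: torch.Tensor):
    """Symmetric int8 quantization of a KV-cache tensor with one scale per head.

    Assumes kv has shape [batch, num_heads, seq_len, head_dim]; this layout and
    the helper name are illustrative, not taken from the PR.
    """
    # Reduce over every dimension except the head dimension.
    absmax = kv.abs().amax(dim=(0, 2, 3), keepdim=True)   # -> [1, num_heads, 1, 1]
    scale = absmax.clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(kv / scale), -128, 127).to(torch.int8)
    return q, scale

# Usage: dequantize with q.float() * scale; each head keeps its own dynamic range,
# which is the point of quantizing per head instead of per tensor.
kv = torch.randn(2, 8, 16, 64)  # [batch, heads, seq, head_dim]
q, scale = per_head_int8_quantize(kv)
```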

dsikka (Collaborator) commented Sep 2, 2025

I don't think we want to add a strategy for this. It should be based on the target list.

We are in the process of supporting attention quantization.
Cc @kylesayrs
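A hedged sketch of the target-list approach suggested above: select attention modules by name or class pattern and attach the KV-cache quantization scheme to them, rather than introducing a new quantization strategy. The patterns and helper below are illustrative and not the library's actual API.

```python
import fnmatch
import torch.nn as nn

def match_targets(model: nn.Module, targets: list[str]) -> list[str]:
    """Return names of submodules whose path or class name matches a target pattern.

    'targets' here stands in for the target list mentioned in the review comment;
    the matching rules are an assumption for illustration.
    """
    matched = []
    for name, module in model.named_modules():
        if any(fnmatch.fnmatch(name, t) or module.__class__.__name__ == t
               for t in targets):
            matched.append(name)
    return matched

# e.g. targets = ["*.self_attn", "LlamaAttention"] would pick up the attention
# modules, and per-head KV-cache quantization would then be applied only to those.
```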
