
Conversation

eldarkurtic (Contributor)
This PR adds the changes needed to enable per-attention-head attention/KV-cache quantization.
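For context, a minimal sketch (not the PR's actual implementation) of what per-attention-head KV-cache quantization computes: one scale per head rather than a single scale for the whole tensor. The shape convention and function name below are assumptions made for illustration.

```python
import torch

def per_head_int8_quantize(kv: torch.Tensor):
    """Symmetric int8 quantization of a KV-cache tensor with one scale per head.

    Assumes kv has shape [batch, num_heads, seq_len, head_dim]; this layout and
    the helper name are illustrative, not taken from the PR.
    """
    # Reduce over every dimension except the head dimension.
    absmax = kv.abs().amax(dim=(0, 2, 3), keepdim=True)   # -> [1, num_heads, 1, 1]
    scale = absmax.clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(kv / scale), -128, 127).to(torch.int8)
    return q, scale

# Usage: dequantize with q.float() * scale; each head keeps its own dynamic range,
# which is the point of quantizing per head instead of per tensor.
kv = torch.randn(2, 8, 16, 64)  # [batch, heads, seq, head_dim]
q, scale = per_head_int8_quantize(kv)
```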

dsikka (Collaborator) commented Sep 2, 2025

I don't think we want to add a strategy for this. It should be based on the target list.

We are in the process of supporting attention quantization.
Cc @kylesayrs
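A hedged sketch of the target-list approach suggested above: select attention modules by name or class pattern and attach the KV-cache quantization scheme to them, rather than introducing a new quantization strategy. The patterns and helper below are illustrative and not the library's actual API.

```python
import fnmatch
import torch.nn as nn

def match_targets(model: nn.Module, targets: list[str]) -> list[str]:
    """Return names of submodules whose path or class name matches a target pattern.

    'targets' here stands in for the target list mentioned in the review comment;
    the matching rules are an assumption for illustration.
    """
    matched = []
    for name, module in model.named_modules():
        if any(fnmatch.fnmatch(name, t) or module.__class__.__name__ == t
               for t in targets):
            matched.append(name)
    return matched

# e.g. targets = ["*.self_attn", "LlamaAttention"] would pick up the attention
# modules, and per-head KV-cache quantization would then be applied only to those.
```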
