[Multi-Modifier] Scoped apply quantization config #432
Merged
Conversation
FYI #428. Also touches some apply logic and adds more scheme merging.
kylesayrs previously approved these changes (Sep 15, 2025)
LGTM!
rahul-tuli reviewed (Sep 16, 2025)
rahul-tuli previously approved these changes (Sep 16, 2025)
Good job! LGTM! 🚀
dsikka requested changes (Sep 16, 2025)
kylesayrs previously approved these changes (Sep 18, 2025)
dsikka approved these changes (Sep 18, 2025)
brian-dellabetta added a commit to vllm-project/llm-compressor that referenced this pull request (Sep 22, 2025):
…atus (#1772)

SUMMARY:
Prerequisites:
- neuralmagic/compressed-tensors#432

This allows for multi-modifier support by scoping the application of quantization config/status to only the modules in the model that match the given targets/ignore configuration, rather than all modules. Initialization of observers is moved to `on_start` (instead of `on_initialize`) to match their removal in `on_end` (and not `on_finalize`). This prevents collisions during the multi-modifier lifecycle.

- [x] Update AWQ
- [x] Update QuantizationModifier
- [x] Update QuantizationMixin
- [x] Update GPTQ
- [x] No other quantization modifiers exist

TEST PLAN:
- Tests were added to neuralmagic/compressed-tensors#432 to confirm correct application of multiple modifiers.
- Added an example in this PR to show how AWQ and GPTQ can be applied heterogeneously to a model, along with a small README. Logs show alternating AWQ and GPTQ messages for the `"sequential"` pipeline, and correct behavior for `"independent"` pipelines.
- [Model checkpoint](https://huggingface.co/nm-testing/Meta-Llama-3-8B-Instruct-selfattn-w8a8-mlp-w4a16-sequential/tree/main) for the sequential pipeline shows correct application of W8A8 to self_attn layers and W4A16 to mlp layers. config.json and safetensors weights all look as expected.

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
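For context, here is a minimal sketch of the kind of heterogeneous recipe this example describes, assuming llm-compressor's `AWQModifier`, `GPTQModifier`, and `oneshot` entrypoints; the target regexes, calibration settings, and the `pipeline` choice are illustrative assumptions, not values copied from the referenced PR:

```python
# Sketch of a heterogeneous multi-modifier recipe: GPTQ W8A8 on self-attention
# projections and AWQ W4A16 on MLP projections (regexes and calibration
# settings below are illustrative assumptions, not taken from the PR).
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = [
    # GPTQ W8A8 scoped to the self-attention projections only
    GPTQModifier(
        targets=["re:.*self_attn.*"],
        scheme="W8A8",
        ignore=["lm_head"],
    ),
    # AWQ W4A16 scoped to the MLP projections only
    AWQModifier(
        targets=["re:.*mlp.*"],
        scheme="W4A16",
        ignore=["lm_head"],
    ),
]

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset="open_platypus",
    recipe=recipe,
    pipeline="sequential",  # "independent" also exercises the multi-modifier path
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

With the scoped config/status application from compressed-tensors#432, each modifier only initializes observers and attaches quantization parameters on the modules its targets match, so the two passes do not collide.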
In order to support multi-modifier recipes (e.g. AWQ+W4A16 on self_attn layers and FP8_DYNAMIC on mlp layers), quantization config and status must be applied only to the modules scoped to the modifier, not all at once. This updates `apply_quantization_config` so that quantization_config and quantization_status are applied just to the target modules rather than changed globally across all modules. For proper target prioritization, `apply_quantization_status` is performed regardless of the model's current status. Without these changes, `test_target_prioritization` will fail.
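As a rough illustration of this scoping, below is a minimal sketch that applies two configs in series to a toy model via `apply_quantization_config`; the toy module names and the exact quantization args are assumptions made for illustration:

```python
# Minimal sketch: apply two scoped quantization configs in series.
# The toy model and the quantization args below are illustrative assumptions.
import torch
from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationConfig,
    QuantizationScheme,
    apply_quantization_config,
)


class ToyBlock(torch.nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        # Submodule names chosen so the "self_attn" / "mlp" regexes below match
        self.self_attn = torch.nn.ModuleDict({"q_proj": torch.nn.Linear(hidden, hidden)})
        self.mlp = torch.nn.ModuleDict({"up_proj": torch.nn.Linear(hidden, hidden)})


model = ToyBlock()

# First pass: a W8A8-style scheme scoped to the self-attention projections
attn_config = QuantizationConfig(
    config_groups={
        "group_0": QuantizationScheme(
            targets=["re:.*self_attn.*"],
            weights=QuantizationArgs(num_bits=8, strategy="channel", symmetric=True),
            input_activations=QuantizationArgs(num_bits=8, strategy="token", dynamic=True),
        )
    },
)
apply_quantization_config(model, attn_config)

# Second pass: a W4A16-style scheme scoped to the MLP projections. With the
# scoped apply logic described above, this should leave the qparams already
# attached to the self_attn modules untouched.
mlp_config = QuantizationConfig(
    config_groups={
        "group_0": QuantizationScheme(
            targets=["re:.*mlp.*"],
            weights=QuantizationArgs(num_bits=4, strategy="group", group_size=32),
        )
    },
)
apply_quantization_config(model, mlp_config)
```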
Other small changes:
- Added `test_multi_apply_quantization_config` to make sure the application of multiple quantization configs in series works correctly -- shapes are correct and unused parameters are correctly removed.
- Removed `override_quantization_status` in favor of the more general `patch_attr` (a short usage sketch appears after this list).
- Removed `infer_quantization_status`, which is no longer meaningful at the model level. It is also no longer needed because the module's current status isn't checked.
- Added an `ALL_QPARAM_NAMES` constant so that parameters related to quantization can be cleared from modules during init.
- Removed `"quant_method": "sparseml"` in favor of `"compressed-tensors"`.
- Deprecated `compress_quantized_weights` and `apply_quantization_status`. We can remove `compress_quantized_weights` and references to it in examples/notebooks in a follow-up PR.

Merge in conjunction with
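For reference, here is a tiny illustration of the `patch_attr` pattern mentioned above; the placeholder object is hypothetical, and this only assumes that `patch_attr(obj, attr, value)` is a context manager exported from `compressed_tensors.utils` that restores the original attribute value on exit:

```python
# Placeholder object standing in for a module with a quantization_status field.
from types import SimpleNamespace

from compressed_tensors.utils import patch_attr

module = SimpleNamespace(quantization_status="initialized")

with patch_attr(module, "quantization_status", "calibration"):
    print(module.quantization_status)  # "calibration" inside the context

print(module.quantization_status)  # restored to "initialized" on exit
```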