
Conversation

@max410011 (Contributor) commented on Jul 14, 2025

Summary

This PR fixes the activation quantization issue described in Issue #394, where the input scale shape was incorrect when using the Dynamic TOKEN strategy.

Fix

  • Corrected the reduction dimensions so that only the hidden dimension is reduced (see the sketch below).
  • This ensures the input scale shape is (batch_size, seq_len, 1) instead of (1, seq_len, hidden_dim).
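For illustration, here is a minimal sketch of a dynamic per-token scale computation consistent with the fix (the function name `compute_dynamic_token_scale` and the `q_max` handling are hypothetical, not the actual PR diff): reducing over the last (hidden) dimension with `keepdim=True` yields one scale per token.

```python
import torch

def compute_dynamic_token_scale(x: torch.Tensor, q_max: float = 127.0) -> torch.Tensor:
    # x: (batch_size, seq_len, hidden_dim). Reduce over the hidden
    # dimension only, keeping the batch and sequence dims, so each token
    # gets its own scale of shape (batch_size, seq_len, 1). Reducing over
    # the batch dimension instead would give the incorrect
    # (1, seq_len, hidden_dim) shape described above.
    abs_max = x.abs().amax(dim=-1, keepdim=True)
    return abs_max / q_max

x = torch.randn(2, 8, 64)
assert compute_dynamic_token_scale(x).shape == (2, 8, 1)
```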

@brian-dellabetta (Contributor) commented on Jul 15, 2025

Hi @max410011, I appreciate the thorough detail in the issue! I tried your PR, and both the original main and your branch seem to work: the resulting models can be loaded and run in vLLM, which surprises me. This is some old code, and per-token/per-channel always trips me up. I will ask around to see whether the reasoning in your issue description is correct.

@brian-dellabetta (Contributor) left a comment

I validated that this gives the shape described in #394, and after internal conversations this is correct. This is only an issue when running outside of vLLM.

@kylesayrs (Contributor) left a comment

Could you add a test to demonstrate and verify that these changes are correct? Awesome catch and resolution, thanks!
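A minimal shape test along the lines requested might look like the following sketch, reusing the hypothetical `compute_dynamic_token_scale` helper from the summary above (this is not the test that was actually added to the repository):

```python
import torch

def test_dynamic_token_scale_shape():
    batch_size, seq_len, hidden_dim = 2, 8, 64
    x = torch.randn(batch_size, seq_len, hidden_dim)
    scale = compute_dynamic_token_scale(x)
    # One scale per token: the hidden dim is reduced, batch/seq dims kept.
    assert scale.shape == (batch_size, seq_len, 1)
```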

@dsikka (Collaborator) commented on Oct 2, 2025

@kylesayrs

@kylesayrs enabled auto-merge (squash) on October 3, 2025 at 19:57
@kylesayrs merged commit 2dd1b62 into neuralmagic:main on October 3, 2025
1 check passed