yangjianfengo1 (Contributor) commented Sep 25, 2025

Description:
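This PR adds dynamic per-token quantization of activations and per-group quantization of weights to w4afp8. Take the MoE w4afp8 GEMM with token=256, m=1792, k=8192: the activation shape is [256, 8192] and the weight shape is [1792, 8192] (the expert dimension is omitted for brevity).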
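In matrix form, ignoring the expert dimension, this GEMM computes the following. The symbols $A$ (activations), $W$ (weights) and $C$ (output) are introduced here only for illustration:

$$
C = A W^{\top}, \qquad A \in \mathbb{R}^{256 \times 8192}, \quad W \in \mathbb{R}^{1792 \times 8192}, \quad C \in \mathbb{R}^{256 \times 1792},
$$

$$
C_{ij} = \sum_{l=0}^{k-1} A_{il} \, W_{jl}, \qquad k = 8192.
$$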

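  • Previously, activations were quantized statically per tensor, so the activation scale had shape [1]. Weights were quantized per channel, so the weight scale had shape [1792].
  • Now, activations are quantized dynamically per token, so the activation scale has shape [256]. Weights can additionally be quantized per group within each channel (along the k dimension). The group size must be a multiple of 128, and with a group size of 128 the weight scale has shape [1792, 8192 / 128] = [1792, 64].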
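The following is a minimal NumPy reference sketch of the new scheme. It is not the PR's kernel, and every function name in it is hypothetical. It makes three further assumptions. Weights use a symmetric int4 range of [-7, 7]. The fp8 format is e4m3 with a maximum of 448. Because NumPy has no float8 dtype, activations are only scaled and clipped to that range, not rounded to the e4m3 grid. The sketch shows where the per-token activation scale (shape [256]) and the per-group weight scale (shape [1792, 64]) enter the dequantized GEMM.

```python
import numpy as np

# Hypothetical reference helpers for illustration; not the kernels from this PR.
FP8_E4M3_MAX = 448.0  # assumption: fp8 is e4m3, whose largest finite value is 448
INT4_MAX = 7          # assumption: symmetric signed int4 range [-7, 7]


def quantize_act_per_token(a):
    """Dynamic per-token quantization of activations a[tokens, k].

    Returns fp8-range values and one scale per token (shape [tokens]).
    NumPy has no float8 dtype, so values are only scaled and clipped to the
    e4m3 range; rounding to the e4m3 grid is not simulated.
    """
    amax = np.abs(a).max(axis=1, keepdims=True)       # [tokens, 1], computed at runtime
    scale = np.maximum(amax, 1e-12) / FP8_E4M3_MAX    # guard against all-zero rows
    a_q = np.clip(a / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return a_q, scale[:, 0]


def quantize_weight_per_group(w, group_size=128):
    """Per-group int4 quantization of weights w[n, k] along the k dimension.

    Returns int4-valued codes (stored as int8) and scales of shape [n, k // group_size].
    """
    n, k = w.shape
    if group_size % 128 != 0 or k % group_size != 0:
        raise ValueError("group_size must be a multiple of 128 that divides k")
    groups = w.reshape(n, k // group_size, group_size)
    amax = np.abs(groups).max(axis=2, keepdims=True)  # [n, groups, 1]
    scale = np.maximum(amax, 1e-12) / INT4_MAX
    w_q = np.clip(np.rint(groups / scale), -INT4_MAX, INT4_MAX).astype(np.int8)
    return w_q.reshape(n, k), scale[:, :, 0]


def w4afp8_gemm_reference(a_q, a_scale, w_q, w_scale, group_size=128):
    """Reference dequantizing GEMM:
    C[i, j] = a_scale[i] * sum_g w_scale[j, g] * sum_{k in group g} a_q[i, k] * w_q[j, k]
    """
    tokens, k = a_q.shape
    n = w_q.shape[0]
    c = np.zeros((tokens, n), dtype=np.float64)
    for g in range(k // group_size):
        cols = slice(g * group_size, (g + 1) * group_size)
        # Unscaled partial product over one k-group, shape [tokens, n]
        partial = a_q[:, cols].astype(np.float64) @ w_q[:, cols].astype(np.float64).T
        c += partial * w_scale[:, g][None, :]  # per-group weight scale, one per output channel
    return c * a_scale[:, None]                 # per-token activation scale, applied once


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((256, 8192), dtype=np.float32)   # [tokens, k]
    w = rng.standard_normal((1792, 8192), dtype=np.float32)  # [m, k]
    a_q, a_scale = quantize_act_per_token(a)
    w_q, w_scale = quantize_weight_per_group(w, group_size=128)
    print(a_scale.shape, w_scale.shape)  # (256,) (1792, 64)
```

Each k-group has its own weight scale, so the per-group partial sums must be scaled before they are accumulated. The per-token activation scale is constant along k and can still be applied once at the end. Under the previous per-tensor/per-channel scheme, both scales could be applied once after the full reduction over k.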

Performance changes:
[Image: performance comparison]

yangjianfengo1 changed the title from "w4afp8 supports per group" to "【New Feature】W4afp8 supports per group quantization" on Sep 26, 2025