
[block_wise w8a8] Add block_wise w8a8 tuning scripts #3873

Open · wants to merge 5 commits into main
Conversation

@coolhok (Contributor) commented Feb 26, 2025

Motivation

Add a block-wise int8 w8a8 quantization auto-tuning script.

Modifications

  1. Modify tuning_fused_moe_triton.py to support block-wise int8 w8a8 MoE.
  2. Add tuning_block_wise_fp8.py for tuning block-wise int8 w8a8 GEMM.

Run block-wise int8 w8a8 GEMM tuning:

python tuning_block_wise_w8a8.py -tp 8
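
For reference, the kind of sweep such a tuning script typically performs looks roughly like the sketch below. The search space, scale layouts, and the `kernel` handle are illustrative assumptions, not the exact code in this PR; `kernel` stands in for the block-wise int8 w8a8 GEMM being tuned.

```python
import itertools
import json

import torch
import triton


def search_space():
    # Illustrative Triton tile configurations; the real script defines its own space.
    return [
        {"BLOCK_SIZE_M": m, "BLOCK_SIZE_N": n, "num_warps": w, "num_stages": s}
        for m, n, w, s in itertools.product([16, 32, 64, 128], [64, 128], [4, 8], [3, 4])
    ]


def tune_shape(kernel, N, K, batch_sizes, block_size=(128, 128)):
    """Find the fastest config per batch size for one (N, K) weight shape.

    `kernel(a, b, a_scale, b_scale, block_size, config)` is the block-wise
    int8 w8a8 GEMM under tuning; it is passed in because its exact API
    lives in the PR's kernel code.
    """
    best = {}
    for M in batch_sizes:
        # int8 activations and weights with per-block scales (DeepSeek-style layout assumed).
        a = torch.randint(-128, 127, (M, K), dtype=torch.int8, device="cuda")
        b = torch.randint(-128, 127, (N, K), dtype=torch.int8, device="cuda")
        a_scale = torch.rand(M, (K + block_size[1] - 1) // block_size[1], device="cuda")
        b_scale = torch.rand(
            (N + block_size[0] - 1) // block_size[0],
            (K + block_size[1] - 1) // block_size[1],
            device="cuda",
        )
        # Benchmark every candidate config and keep the fastest one for this batch size.
        timings = {
            json.dumps(cfg): triton.testing.do_bench(
                lambda: kernel(a, b, a_scale, b_scale, block_size, cfg)
            )
            for cfg in search_space()
        }
        best[str(M)] = json.loads(min(timings, key=timings.get))
    return best  # later dumped to a JSON config keyed by batch size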

Other

When running on multiple GPUs, I think we need to split weight_shapes across the GPUs, so that each GPU tunes all batch sizes for its subset of shapes and can save a complete config.
If we instead split by batch size, each GPU runs the same shapes with different batch sizes, but save_config does not support incremental saving.
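
A minimal sketch of that shape-wise split, assuming a plain Python list of weight shapes and torch.multiprocessing; the shape/batch-size lists are made-up examples and the per-shape tuning body is elided:

```python
import torch
import torch.multiprocessing as mp

# Example (N, K) weight shapes and batch sizes; the real script derives these
# from the target model and its own batch-size list.
WEIGHT_SHAPES = [(4096, 4096), (4096, 11008), (11008, 4096), (6144, 4096)]
BATCH_SIZES = [1, 16, 64, 256, 1024, 4096]


def tune_worker(rank: int, world_size: int) -> None:
    # Round-robin split by weight shape: each GPU tunes *all* batch sizes for
    # its subset, so every shape yields one complete config and save_config
    # never has to merge partial results from other ranks.
    torch.cuda.set_device(rank)
    for N, K in WEIGHT_SHAPES[rank::world_size]:
        for M in BATCH_SIZES:
            # Benchmark the block-wise int8 w8a8 GEMM for (M, K) x (K, N) here,
            # e.g. with a per-shape tuner like the tune_shape sketch above.
            pass
        print(f"[rank {rank}] finished shape N={N}, K={K}")


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(tune_worker, args=(world_size,), nprocs=world_size)
```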

Checklist

@coolhok changed the title from "Worship int8 w8w8" to "[block_wise w8a8]Add block_wise w8w8 tunning scripts" on Feb 26, 2025
@xihuai18

Can you upload the tuned config?

@lambert0312 commented Feb 26, 2025

> Can you upload the tuned config?

I can upload an A800 tuned config.
