
[block_wise w8a8] Add block_wise w8a8 tuning scripts #3873

Open · wants to merge 5 commits into main
Conversation

@coolhok (Contributor) commented Feb 26, 2025

Motivation

Add a block-wise int8 w8a8 quantization auto-tuning script.

Modifications

  1. Modify tuning_fused_moe_triton.py to support block-wise int8 w8a8 MoE.
  2. Add tuning_block_wise_fp8.py for tuning block-wise int8 w8a8 GEMM.

Run block-wise int8 w8a8 GEMM tuning:

python tuning_block_wise_w8a8.py -tp 8
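
For reference, the kind of sweep such a tuning script typically performs looks roughly like the sketch below. The search space, scale layouts, and the `kernel` handle are illustrative assumptions, not the exact code in this PR; `kernel` stands in for the block-wise int8 w8a8 GEMM being tuned.

```python
import itertools
import json

import torch
import triton


def search_space():
    # Illustrative Triton tile configurations; the real script defines its own space.
    return [
        {"BLOCK_SIZE_M": m, "BLOCK_SIZE_N": n, "num_warps": w, "num_stages": s}
        for m, n, w, s in itertools.product([16, 32, 64, 128], [64, 128], [4, 8], [3, 4])
    ]


def tune_shape(kernel, N, K, batch_sizes, block_size=(128, 128)):
    """Find the fastest config per batch size for one (N, K) weight shape.

    `kernel(a, b, a_scale, b_scale, block_size, config)` is the block-wise
    int8 w8a8 GEMM under tuning; it is passed in because its exact API
    lives in the PR's kernel code.
    """
    best = {}
    for M in batch_sizes:
        # int8 activations and weights with per-block scales (DeepSeek-style layout assumed).
        a = torch.randint(-128, 127, (M, K), dtype=torch.int8, device="cuda")
        b = torch.randint(-128, 127, (N, K), dtype=torch.int8, device="cuda")
        a_scale = torch.rand(M, (K + block_size[1] - 1) // block_size[1], device="cuda")
        b_scale = torch.rand(
            (N + block_size[0] - 1) // block_size[0],
            (K + block_size[1] - 1) // block_size[1],
            device="cuda",
        )
        # Benchmark every candidate config and keep the fastest one for this batch size.
        timings = {
            json.dumps(cfg): triton.testing.do_bench(
                lambda: kernel(a, b, a_scale, b_scale, block_size, cfg)
            )
            for cfg in search_space()
        }
        best[str(M)] = json.loads(min(timings, key=timings.get))
    return best  # later dumped to a JSON config keyed by batch size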

Other

When running on multiple GPUs, I think we need to split weight_shapes across the GPUs, so that each GPU tunes all batch sizes for its subset of shapes and can save a complete config.
If we instead split by batch size, each GPU runs the same shapes with different batch sizes, but save_config does not support incremental saving.
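
A minimal sketch of that shape-wise split, assuming a plain Python list of weight shapes and torch.multiprocessing; the shape/batch-size lists are made-up examples and the per-shape tuning body is elided:

```python
import torch
import torch.multiprocessing as mp

# Example (N, K) weight shapes and batch sizes; the real script derives these
# from the target model and its own batch-size list.
WEIGHT_SHAPES = [(4096, 4096), (4096, 11008), (11008, 4096), (6144, 4096)]
BATCH_SIZES = [1, 16, 64, 256, 1024, 4096]


def tune_worker(rank: int, world_size: int) -> None:
    # Round-robin split by weight shape: each GPU tunes *all* batch sizes for
    # its subset, so every shape yields one complete config and save_config
    # never has to merge partial results from other ranks.
    torch.cuda.set_device(rank)
    for N, K in WEIGHT_SHAPES[rank::world_size]:
        for M in BATCH_SIZES:
            # Benchmark the block-wise int8 w8a8 GEMM for (M, K) x (K, N) here,
            # e.g. with a per-shape tuner like the tune_shape sketch above.
            pass
        print(f"[rank {rank}] finished shape N={N}, K={K}")


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(tune_worker, args=(world_size,), nprocs=world_size)
```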

Checklist

@coolhok changed the title from "Worship int8 w8w8" to "[block_wise w8a8]Add block_wise w8w8 tunning scripts" on Feb 26, 2025
@xihuai18

Can you upload the tuned config?

@lambert0312 commented Feb 26, 2025

> Can you upload the tuned config?

I can upload an A800 tuned config.
