Improve the heuristic logic for fp8 weight padding #279

charlifu · 2024-11-14T16:18:51Z

This PR tightens the heuristic for padding fp8 weights: a weight is now padded only when the size of its last dimension is a multiple of 512 (size % 512 == 0).
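
As a rough illustration of the condition described above (not the code from this PR), the check can be sketched in plain Python. The function name `should_pad_weight` and the `alignment` parameter are hypothetical and introduced only for this example; the sketch covers only the decision of whether to pad, not the padding itself.

```python
from typing import Sequence


def should_pad_weight(shape: Sequence[int], alignment: int = 512) -> bool:
    """Return True when the last dimension is a positive multiple of `alignment`.

    Hypothetical helper: it mirrors the condition stated in the PR
    description (last dimension % 512 == 0) and nothing more.
    """
    if len(shape) == 0:
        # A scalar has no last dimension, so there is nothing to pad.
        return False
    last_dim = shape[-1]
    return last_dim > 0 and last_dim % alignment == 0


if __name__ == "__main__":
    # 4096 is 8 * 512, so this weight would qualify for padding.
    print(should_pad_weight((8, 14336, 4096)))
    # 4000 is not a multiple of 512, so this weight is left as-is.
    print(should_pad_weight((8, 14336, 4000)))
```

Keeping the check as a standalone predicate makes it easy to reuse the same condition wherever weights are created, including benchmark or tuning scripts.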

shajrawi · 2024-11-14T16:34:41Z

Merging ASAP because we are starting a nightly Docker build early in the day.

divakar-amd · 2024-11-19T23:32:22Z

@charlifu I believe we would also want to pad here (for tuning):

vllm/benchmarks/kernels/benchmark_moe.py

Lines 60 to 63 in 62334b5

    
    w1 = torch.randn(num_experts,
                     shard_intermediate_size,
                     hidden_size,
                     dtype=init_dtype)

charlifu added 2 commits November 14, 2024 16:06

add heuristic logic for weight padding

a7e9918

lint

7e8afeb

charlifu requested review from gshtras and shajrawi November 14, 2024 16:19

gshtras approved these changes Nov 14, 2024

shajrawi merged commit 5362727 into develop Nov 14, 2024
9 of 10 checks passed

shajrawi deleted the charlifu/padding_improve branch November 14, 2024 16:34
