
Update FP8 kernel configuration for 4xGPU support on AMD #3850

Open — wants to merge 2 commits into main
Conversation


@Eliovp commented on Feb 25, 2025

Motivation

This PR restores and improves FP8 quantization kernel functionality (specific to the DeepSeek R1 model) on systems using 4 GPUs, particularly AMD GPUs. The changes ensure that the proper configuration files are selected based on the number of available GPUs, and that the default settings for the matrix multiplication function are adjusted to better support MFMA instructions at lower GPU counts.

Modifications

  • Added GPU Count Helper: Introduced a new helper function get_num_gpus() to dynamically determine the number of GPUs using torch.cuda.device_count().
  • Updated Config File Selection: Modified get_w8a8_block_fp8_configs to adjust the JSON configuration file naming logic when using HIP and fewer than 8 GPUs.
  • Adjusted Default FP8 MatMul Configuration: Revised w8a8_block_fp8_matmul to provide an alternative configuration for AMD GPUs:
    • For systems with 4 or fewer GPUs, the block sizes and stage count are reduced to ensure compatibility with MFMA instructions.
    • For other cases, the original configuration remains unchanged.
  • Enhanced Compatibility: These updates make it possible to run inference on the DeepSeek R1 model on systems with 4 GPUs without compromising performance on other setups.
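The selection logic described above can be sketched as follows. The function names (`get_num_gpus`, the config-file selection in `get_w8a8_block_fp8_configs`, the defaults in `w8a8_block_fp8_matmul`) come from this PR's description, but the bodies, the JSON naming scheme, and the specific block-size values below are illustrative assumptions, not the actual sglang implementation:

```python
def get_num_gpus() -> int:
    """Count visible GPUs; in sglang this calls torch.cuda.device_count().

    Stubbed to avoid a hard torch dependency in this sketch.
    """
    try:
        import torch
        return torch.cuda.device_count()
    except ImportError:
        return 0


def select_fp8_config_name(base_name: str, is_hip: bool, num_gpus: int) -> str:
    """Pick the JSON tuning-config filename for w8a8 block-FP8 kernels.

    On HIP (AMD) with fewer than 8 GPUs, a GPU-count-specific config is
    chosen; the ",num_gpus=N" naming scheme here is an assumption.
    """
    if is_hip and num_gpus < 8:
        return f"{base_name},num_gpus={num_gpus}.json"
    return f"{base_name}.json"


def default_fp8_matmul_config(is_hip: bool, num_gpus: int) -> dict:
    """Fallback Triton-style tile config when no tuned JSON file exists.

    For AMD systems with 4 or fewer GPUs, block sizes and the stage count
    are reduced so tiles stay compatible with MFMA instructions; the
    concrete numbers are placeholders, not the PR's tuned values.
    """
    if is_hip and num_gpus <= 4:
        return {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 32,
                "BLOCK_SIZE_K": 128, "num_stages": 2}
    # Other setups keep the original, larger configuration.
    return {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 64,
            "BLOCK_SIZE_K": 128, "num_stages": 3}
```

For example, on a 4-GPU HIP system `select_fp8_config_name("cfg", True, 4)` would resolve to a 4-GPU-specific file, while CUDA systems and 8-GPU HIP systems keep the unmodified name and the original defaults.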

Example

HSA_NO_SCRATCH_RECLAIM=1 HIP_VISIBLE_DEVICES=4,5,6,7 python3 -m sglang.bench_offline_throughput --model-path deepseek-ai/DeepSeek-R1 --tp 4 --num-prompts 10 --trust-remote-code

====== Offline Throughput Benchmark Result =======
Backend:                                 engine
Successful requests:                     10
Benchmark duration (s):                  24.74
Total input tokens:                      1972
Total generated tokens:                  2784
Request throughput (req/s):              0.40
Input token throughput (tok/s):          79.71
Output token throughput (tok/s):         112.54
Total token throughput (tok/s):          192.25
==================================================
