Pull requests: vllm-project/vllm-gaudi
Fix deepseek FP8 weight creation due to upstream vllm change
#281 opened Sep 26, 2025 by skavulya
v0.10.2 [SW-236002] Enable group indexing for compressed w4a16 format (#243) - cherry-pick
#263 opened Sep 25, 2025 by skavulya
Add restriction of usage VLLM_DECODE_BLOCK_BUCKET_MAX>max_blocks
#245 opened Sep 24, 2025 by iboiko-habana
Enable H2d(runtime scale patching) for Torch compile by default
#235 opened Sep 23, 2025 by jczaja
[0.10.2] Fix calculating used blocks in unified attention
#234 opened Sep 23, 2025 by madamczyk-intel
Add tests for custom operator implementation correctness
#222 (Draft) opened Sep 22, 2025 by Kacper-Pietkun
Port: Adding prompt context flags for linear warmup
#218 opened Sep 22, 2025 by iboiko-habana
Use type strings to be compatible with python 3.10
#214 opened Sep 22, 2025 by madamczyk-intel
[SW-240630] Qwen3-30B-MoE: Flatten post-attn seqs and restore model o…
#212 opened Sep 20, 2025 by attafosu