Pull requests: vllm-project/vllm-gaudi
Pull requests list
Fix transformers version mismatch causing editable install failure
#618
opened Nov 21, 2025 by
Saiteja-Garlapati
Updates to validated models list
documentation
Improvements or additions to documentation
skip-gaudi-tests
#614
opened Nov 21, 2025 by
PatrykWo
Allow building vllm-plugin docker for ubuntu with upstream torch
#613
opened Nov 21, 2025 by
mmuszynskihabana
Spec decode: support of more than one num speculative tokens
#609
opened Nov 21, 2025 by
jerrychenhf
Add the missing step to the Quick Start guide
documentation
skip-gaudi-tests
#599
opened Nov 20, 2025 by
mhelf-intel
Cherry-pick release docker cmdline fixes, WA and long context support
#576
opened Nov 17, 2025 by
nngokhale
Implementing softmax_fa2 in partial_attn shared and causal
#566
opened Nov 13, 2025 by
ksmusz
Docs: Missing content from Habana docs
documentation
skip-gaudi-tests
#562
opened Nov 13, 2025 by
mhelf-intel
Add a plugin for variable support in Markdown
documentation
skip-gaudi-tests
#554
opened Nov 12, 2025 by
mhelf-intel
fix loading fp8 static quantized model for compressored_tensors format.
#552
opened Nov 11, 2025 by
lkk12014402
Prepare Unified Attention biases on HPU + add NumPy memory pooling
#550
opened Nov 7, 2025 by
kzawora-intel
[SW-228042] Add support for dynamic vLLM kv-cache quantization
#538
opened Nov 6, 2025 by
dudilester
[Attention Metadata Overhaul 2/N] Move metadata processing outside HPUModelAdapter, prepare biases on CPU
#530
opened Nov 5, 2025 by
kzawora-intel
[Attention Metadata Overhaul 1/N] Extract metadata update to HPUAttentionMetadataProcessor
#526
opened Nov 5, 2025 by
kzawora-intel
reduce graph recompilations in input embeddings for Gemma3
#519
opened Nov 4, 2025 by
skaulintel
Draft