fix(deps): update dependency vllm to ^0.11.0 #257
This PR contains the following updates:
| Package | Change | Age | Confidence |
|---|---|---|---|
| [vllm](https://github.com/vllm-project/vllm) | `^0.5.0` -> `^0.11.0` | | |
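For context, the caret range follows Poetry-style semantics: the leftmost non-zero version component is pinned, so `^0.11.0` admits `0.11.x` but excludes `0.12.0`. Below is a minimal sketch of that expansion using the `packaging` library; the PEP 440 specifier shown is the standard caret translation, not something stated in this PR:

```python
# Sketch: what the caret constraint "^0.11.0" expands to under
# Poetry-style semantics (leftmost non-zero component is pinned).
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet(">=0.11.0,<0.12.0")  # PEP 440 form of ^0.11.0

print(Version("0.11.2") in spec)  # True: patch bumps stay in range
print(Version("0.12.0") in spec)  # False: would require a new PR
print(Version("0.5.0") in spec)   # False: the old floor is excluded
```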
Release Notes
vllm-project/vllm (vllm)
v0.11.0
Compare Source
Highlights
This release features 538 commits, 207 contributors (65 new contributors)!
Model Support
Engine Core
Hardware & Performance
Large Scale Serving & Performance
Quantization
API & Frontend
Security
Dependencies
Deprecated `xm.mark_step` in favor of `torch_xla.sync` (#25254).
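For downstream TPU code this swap is mechanical. A minimal migration sketch, assuming a recent torch_xla release that exposes the top-level `torch_xla.device()` / `torch_xla.sync()` API and an attached XLA device; shapes and ops are illustrative only:

```python
# Sketch of the xm.mark_step -> torch_xla.sync migration. Requires an
# XLA device (e.g. a TPU VM) and a torch_xla build with the top-level
# torch_xla.device() / torch_xla.sync() API.
import torch
import torch_xla

device = torch_xla.device()           # newer spelling of xm.xla_device()
x = torch.randn(4, 4, device=device)
y = x @ x                             # recorded lazily; nothing runs yet

# Old: import torch_xla.core.xla_model as xm; xm.mark_step()
torch_xla.sync()                      # cut the lazy graph and execute it
```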
V0 Deprecation

What's Changed
- `cpu_attn.py:_run_sdpa_forward` for better memory access by @ignaciosica in #24701
- `--enable-log-outputs` does not match the documentation by @kebe7jun in #24626
- `_validate_and_reshape_mm_tensor` by @lgeiger in #24742
- `supports_kw` by @lgeiger in #24773
- `s3_utils` type hints with `BaseClient` by @Zerohertz in #24825
- `stop` in reasoning content by @gaocegege in #14550
- `kv_output_aggregator` support heterogeneous by @LCAIZJ in #23917
- Move `MultiModalConfig` from `config/__init__.py` to `config/multimodal.py` by @hmellor in #24659
- `HuggingFace` -> `Hugging Face` in "Integration with Hugging Face" docs by @sergiopaniego in #24889
- `is_flashmla_supported` Check Error by @yewentao256 in #24774
- `n_groups % tp_size == 0` by @tomeras91 in #24593
- Move `SpeculativeConfig` from `config/__init__.py` to `config/speculative.py` by @hmellor in #24904
- `EngineCoreRequest` arguments in tests and fix extra kwargs by @qthequartermasterman in #24987
- `CpuGpuBuffer` for block table tensors by @njhill in #24795
- `AutoModelForVision2Seq` by @DarkLight1337 in #25065
- `cutlass_mla` hang by @alexm-redhat in #24966
- `MultiModalCache` by @lgeiger in #25006
- `sliding_window` from text config in Gemma3 MM by @hmellor in #25085
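Given that automerge is enabled (see Configuration below) and the bump spans six minor releases, a quick offline smoke test is cheap insurance before trusting the merge. A minimal sketch against vLLM's offline `LLM` API, which has kept this shape across 0.5 -> 0.11; the model name and prompt are placeholders only:

```python
# Post-upgrade smoke test: load a small model and generate once.
# Assumes a GPU-capable environment with vllm ^0.11.0 installed;
# "facebook/opt-125m" is just a convenient small example model.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.0, max_tokens=16)

outputs = llm.generate(["vLLM upgraded to"], params)
print(outputs[0].outputs[0].text)
```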
Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Renovate Bot.