Commit 5d82a95

added ack and image
1 parent 4b7e854 commit 5d82a95

2 files changed (+6, -1 lines changed)

_posts/2025-10-26-zero_reload_model_switching_with_vllm_sleep_mode.md

Lines changed: 6 additions & 1 deletion
@@ -2,6 +2,9 @@
 layout: post
 title: "Zero-Reload Model Switching with vLLM Sleep Mode"
 author: "Embedded LLM"
+image: /assets/figures/2025-vllm-sleep-mode/sleepmode.png
+thumbnail-img: /assets/figures/2025-vllm-sleep-mode/sleepmode.png
+share-img: /assets/figures/2025-vllm-sleep-mode/sleepmode.png
 ---
 
 ## Introduction
@@ -11,6 +14,8 @@ author: "Embedded LLM"
 1. **Keep both models loaded** → Requires 2x the GPU memory (expensive, often impossible)
 2. **Reload models on-demand** → 30-100+ seconds per switch (slow, wasteful)
 
+![vLLM Sleep Mode](/assets/figures/2025-vllm-sleep-mode/sleepmode.png)
+
 **vLLM Sleep Mode offers a third way:** Models hibernate in seconds and wake up fast—delivering the efficiency of on-demand loading with the speed of persistent serving.
 
 ### Two Sleep Levels for Different Needs
@@ -460,4 +465,4 @@ The future of LLM serving is multi-model. Sleep Mode makes it practical today.
 
 ## Acknowledgements
 
-Special thanks to **Vensen Mu**, **Jeff Aw**, **Jun Kang Chow**, **Tun Jian Tan**, **Pin Siang Tan**, **Amir Balwel**, **Ye Hur Cheong** and **Kaichao You** for developing the Sleep Mode feature and inspiring this blog post.
+Special thanks to **Vensen Mu**, **Jeff Aw**, **Jun Kang Chow**, **Tun Jian Tan**, **Pin Siang Tan**, **Amir Balwel**, **Ye Hur Cheong** and **Zhiyao Cen**, **Kaichao You** for developing the Sleep Mode feature and inspiring this blog post.