diff --git a/docs/01-getting-started/01-installation/01-gpu.md b/docs/01-getting-started/01-installation/01-gpu.md
index d81d54f..5e9cd6a 100644
--- a/docs/01-getting-started/01-installation/01-gpu.md
+++ b/docs/01-getting-started/01-installation/01-gpu.md
@@ -2,7 +2,7 @@
 title: GPU
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 vLLM 是一个支持如下 GPU 类型的 Python 库,根据您的 GPU 型号查看相应的说明。
diff --git a/docs/01-getting-started/01-installation/02-cpu.md b/docs/01-getting-started/01-installation/02-cpu.md
index 0ed885a..53bb7cf 100644
--- a/docs/01-getting-started/01-installation/02-cpu.md
+++ b/docs/01-getting-started/01-installation/02-cpu.md
@@ -2,7 +2,7 @@
 title: CPU
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 vLLM 是一个支持以下 CPU 变体的 Python 库。根据您的 CPU 类型查看厂商特定的说明:
@@ -11,7 +11,6 @@ vLLM 是一个支持以下 CPU 变体的 Python 库。根据您的 CPU 类型查
 vLLM 初步支持在 x86 CPU 平台进行基础模型推理和服务,支持 FP32、FP16 和 BF16 数据类型。

 > **注意**
->
 > 此设备没有预编译的 wheel 包或镜像,您必须从源码构建 vLLM。

 #### ARM AArch64
@@ -21,7 +20,6 @@ vLLM 已适配支持具备 NEON 指令集的 ARM64 CPU,基于最初为 x86 平
 ARM CPU 后端当前支持 Float32、FP16 和 BFloat16 数据类型。

 > **注意**
->
 > 此设备没有预编译的 wheel 包或镜像,您必须从源码构建 vLLM。

 #### Apple silicon
@@ -31,7 +29,6 @@ vLLM 对 macOS 上的 Apple 芯片提供实验性支持。目前用户需从源
 macOS 的 CPU 实现当前支持 FP32 和 FP16 数据类型。

 > **注意**
->
 > 此设备没有预编译的 wheel 包或镜像,您必须从源码构建 vLLM。

 #### IBM Z (S390X)
@@ -41,7 +38,6 @@ vLLM 对 IBM Z 平台上的 s390x 架构提供实验性支持。目前用户需
 s390x 架构的 CPU 实现当前仅支持 FP32 数据类型。

 > **注意**
->
 > 此设备没有预编译的 wheel 包或镜像,您必须从源码构建 vLLM。

 ## 系统要求
@@ -54,7 +50,8 @@ s390x 架构的 CPU 实现当前仅支持 FP32 数据类型。

 - 编译器:`gcc/g++ >= 12.3.0`(可选,推荐)
 - 指令集架构 (ISA):AVX512(可选,推荐)

-> **提示** >[Intel Extension for PyTorch (IPEX)](https://github.com/intel/intel-extension-for-pytorch)  通过最新特性优化扩展 PyTorch,可在 Intel 硬件上获得额外性能提升。
+> **提示**
+>[Intel Extension for PyTorch (IPEX)](https://github.com/intel/intel-extension-for-pytorch)  通过最新特性优化扩展 PyTorch,可在 Intel 硬件上获得额外性能提升。

 #### ARM AArch64
@@ -270,10 +267,10 @@ $ docker run -it \
 vLLM CPU 后端支持以下特性:

-- 张量并行 (Tensor Parallel)
-- 模型量化 (`INT8 W8A8`、`AWQ`、`GPTQ`)
-- 分块预填充 (Chunked-prefill
-- 前缀缓存 (Prefix-caching)
+- 张量并行(Tensor Parallel)
+- 模型量化(`INT8 W8A8`、`AWQ`、`GPTQ`)
+- 分块预填充(Chunked-prefill)
+- 前缀缓存(Prefix-caching)
 - FP8-E5M2 KV 缓存

 ## 相关运行时环境变量
@@ -285,7 +282,7 @@ vLLM CPU 后端支持以下特性:

 `VLLM_CPU_OMP_THREADS_BIND=0-31|32-63`  表示启用 2 个张量并行进程,rank0 的 32 个 OpenMP 线程绑定到 0-31 号核心,rank1 的线程绑定到 32-63 号核心

-- `VLLM_CPU_MOE_PREPACK` : 是否为 MoE 层使用预打包功能。该参数会传递给  `ipex.llm.modules.GatedMLPMOE` 。默认值为  `1` (启用)。在不支持的 CPU 上可能需要设置为  `0` (禁用)。
+- `VLLM_CPU_MOE_PREPACK` : 是否为 MoE 层使用预打包功能。该参数会传递给 `ipex.llm.modules.GatedMLPMOE`。默认值为 `1`(启用)。在不支持的 CPU 上可能需要设置为 `0`(禁用)。

 ## 性能优化建议
@@ -306,7 +303,7 @@ export VLLM_CPU_OMP_THREADS_BIND=0-29
 vllm serve facebook/opt-125m
 ```

-- 在支持超线程的机器上使用 vLLM CPU 后端时,建议通过  `VLLM_CPU_OMP_THREADS_BIND`  将每个物理 CPU 核心只绑定一个 OpenMP 线程。在 16 逻辑核心 / 8 物理核心的超线程平台上:
+- 在支持超线程的机器上使用 vLLM CPU 后端时,建议通过 `VLLM_CPU_OMP_THREADS_BIND` 将每个物理 CPU 核心只绑定一个 OpenMP 线程。在 16 逻辑核心/8 物理核心的超线程平台上:

 ```plain
 $ lscpu -e # check the mapping between logical CPU cores and physical CPU cores
@@ -337,13 +334,13 @@ $ export VLLM_CPU_OMP_THREADS_BIND=0-7
 $ python examples/offline_inference/basic/basic.py
 ```

-- 在多插槽 NUMA 机器上使用 vLLM CPU 后端时,应注意通过  `VLLM_CPU_OMP_THREADS_BIND`  设置 CPU 核心,避免跨 NUMA 节点的内存访问。
+- 在多插槽 NUMA 机器上使用 vLLM CPU 后端时,应注意通过 `VLLM_CPU_OMP_THREADS_BIND` 设置 CPU 核心,避免跨 NUMA 节点的内存访问。

 ## 其他注意事项

 - CPU 后端与 GPU 后端有显著差异,因为 vLLM 架构最初是为 GPU 优化的。需要多项优化来提升其性能。
 - 建议将 HTTP 服务组件与推理组件解耦。在 GPU 后端配置中,HTTP 服务和分词任务运行在 CPU 上,而推理运行在 GPU 上,这通常不会造成问题。但在基于 CPU 的环境中,HTTP 服务和分词可能导致显著的上下文切换和缓存效率降低。因此强烈建议分离这两个组件以获得更好的性能。
-- 在启用 NUMA 的 CPU 环境中,内存访问性能可能受  [拓扑结构](https://github.com/intel/intel-extension-for-pytorch/blob/main/docs/tutorials/performance_tuning/tuning_guide.inc.md#non-uniform-memory-access-numa)  影响较大。对于 NUMA 架构,推荐两种优化方案:张量并行或数据并行。
+- 在启用 NUMA 的 CPU 环境中,内存访问性能可能受[拓扑结构](https://github.com/intel/intel-extension-for-pytorch/blob/main/docs/tutorials/performance_tuning/tuning_guide.inc.md#non-uniform-memory-access-numa)影响较大。对于 NUMA 架构,推荐两种优化方案:张量并行或数据并行。

 - 延迟敏感场景使用张量并行:遵循 GPU 后端设计,基于 NUMA 节点数量(例如双 NUMA 节点系统 TP=2)使用 Megatron-LM 的并行算法切分模型。随着  [CPU 上的 TP 功能](https://github.com/vllm-project/vllm/pull/6125#)  合并,张量并行已支持服务和离线推理。通常每个 NUMA 节点被视为一个 GPU 卡。以下是启用张量并行度为 2 的服务示例:
diff --git a/docs/01-getting-started/01-installation/03-ai-accelerator.md b/docs/01-getting-started/01-installation/03-ai-accelerator.md
index 0a125fe..331b960 100644
--- a/docs/01-getting-started/01-installation/03-ai-accelerator.md
+++ b/docs/01-getting-started/01-installation/03-ai-accelerator.md
@@ -2,7 +2,7 @@
 title: 其他 AI 加速器
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 vLLM 是一个 Python 库,支持以下 AI 加速器。根据您的 AI 加速器类型查看供应商特定说明:
@@ -29,7 +29,6 @@
 您可能需要为 TPU 虚拟机提供额外的持久存储。更多信息请参阅 [Cloud TPU 数据存储选项](https://cloud.devsite.corp.google.com/tpu/docs/storage-options)。

 > **注意**
->
 > 此设备没有预构建的 wheels,因此您必须使用预构建的 Docker 镜像或从源代码构建 vLLM。

 #### Intel Gaudi
@@ -37,7 +36,6 @@
 此节提供了在 Intel Gaudi 设备上运行 vLLM 的说明。

 > **注意**
->
 > 此设备没有预构建的 wheels 或镜像,因此您必须从源代码构建 vLLM。

 #### AWS Neuron
@@ -45,7 +43,6 @@
 vLLM 0.3.3 及以上版本支持通过 Neuron SDK 在 AWS Trainium/Inferentia 上进行模型推理和服务,并支持连续批处理。分页注意力 (Paged Attention) 和分块预填充 (Chunked Prefill) 功能目前正在开发中,即将推出。Neuron SDK 当前支持的数据类型为 FP16 和 BF16。

 > **注意**
->
 > 此设备没有预构建的 wheels 或镜像,因此您必须从源代码构建 vLLM。

 ## 环境要求
@@ -61,7 +58,6 @@ vLLM 0.3.3 及以上版本支持通过 Neuron SDK 在 AWS Trainium/Inferentia
 您可以使用 [Cloud TPU API](https://cloud.google.com/tpu/docs/reference/rest) 或 [队列资源](https://cloud.google.com/tpu/docs/queued-resources) API 配置 Cloud TPU。本节展示如何使用队列资源 API 创建 TPU。有关使用 Cloud TPU API 的更多信息,请参阅 [使用 Create Node API 创建 Cloud TPU](https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm#create-node-api)。队列资源允许您以队列方式请求 Cloud TPU 资源。当您请求队列资源时,请求会被添加到 Cloud TPU 服务维护的队列中。当请求的资源可用时,它将分配给您的 Google Cloud 项目供您独占使用。

 > **注意**
->
 > 在以下所有命令中,请将全大写的参数名称替换为适当的值。有关参数描述,请参阅参数描述表。

 #### 使用 GKE 配置 Cloud TPU
diff --git a/docs/01-getting-started/01-installation/README.md b/docs/01-getting-started/01-installation/README.md
index 0d8f771..e1dbdc7 100644
--- a/docs/01-getting-started/01-installation/README.md
+++ b/docs/01-getting-started/01-installation/README.md
@@ -2,25 +2,25 @@
 title: 安装
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 vLLM 支持以下硬件平台:

-## GPU
+## [GPU](/docs/getting-started/installation/gpu)

-- NVIDIA CUDA
-- AMD ROCm
-- Intel XPU
+- [NVIDIA CUDA](/docs/getting-started/installation/gpu#nvidia-cuda)
+- [AMD ROCm](/docs/getting-started/installation/gpu#amd-rocm)
+- [Intel XPU](/docs/getting-started/installation/gpu#inter-xpu-1)

-## CPU
+## [CPU](/docs/getting-started/installation/cpu)

-- Intel/AMD x86
-- ARM AArch64
-- Apple silicon
+- [Intel/AMD x86](/docs/getting-started/installation/cpu#intelamd-x86)
+- [ARM AArch64](/docs/getting-started/installation/cpu#arm-aarch64)
+- [Apple silicon](/docs/getting-started/installation/cpu#apple-silicon)

-## 其他 AI 加速器
+## [其他 AI 加速器](/docs/getting-started/installation/ai-accelerator)

-- Google TPU
-- Intel Gaudi
-- AWS Neuron
+- [Google TPU](/docs/getting-started/installation/ai-accelerator#google-tpu-1)
+- [Intel Gaudi](/docs/getting-started/installation/ai-accelerator#intel-gaudi-1)
+- [AWS Neuron](/docs/getting-started/installation/ai-accelerator#aws-neuron-1)
 - OpenVINO
diff --git a/docs/01-getting-started/02-quickstart.md b/docs/01-getting-started/02-quickstart.md
index 0dd8ede..f61be6c 100644
--- a/docs/01-getting-started/02-quickstart.md
+++ b/docs/01-getting-started/02-quickstart.md
@@ -2,7 +2,7 @@
 title: 快速开始
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 本指南将帮助您快速开始使用 vLLM 进行以下操作:
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/01-audio_language.md b/docs/01-getting-started/03-examples/01-offline-inference/01-audio_language.md
index 543b31a..12d9406 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/01-audio_language.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/01-audio_language.md
@@ -2,7 +2,7 @@
 title: Audio Language
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/audio_language.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/audio_language.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/02-basic.md b/docs/01-getting-started/03-examples/01-offline-inference/02-basic.md
index 29dec30..b28aa62 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/02-basic.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/02-basic.md
@@ -2,7 +2,7 @@
 title: 基础指南
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/basic](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/basic)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/03-chat_with_tools.md b/docs/01-getting-started/03-examples/01-offline-inference/03-chat_with_tools.md
index f934aa9..8835f18 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/03-chat_with_tools.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/03-chat_with_tools.md
@@ -2,7 +2,7 @@
 title: Chat With Tools
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/chat_with_tools.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/chat_with_tools.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/04-cpu_offload_lmcache.md b/docs/01-getting-started/03-examples/01-offline-inference/04-cpu_offload_lmcache.md
index 6f5883e..37bd30d 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/04-cpu_offload_lmcache.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/04-cpu_offload_lmcache.md
@@ -2,7 +2,7 @@
 title: Cpu Offload Lmcache
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/cpu_offload_lmcache.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/cpu_offload_lmcache.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/05-data_parallel.md b/docs/01-getting-started/03-examples/01-offline-inference/05-data_parallel.md
index cc24d58..058c6ee 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/05-data_parallel.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/05-data_parallel.md
@@ -2,7 +2,7 @@
 title: Data Parallel
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/data_parallel.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/data_parallel.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/06-disaggregated_prefill_lmcache.md b/docs/01-getting-started/03-examples/01-offline-inference/06-disaggregated_prefill_lmcache.md
index a4f6aeb..3166cf4 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/06-disaggregated_prefill_lmcache.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/06-disaggregated_prefill_lmcache.md
@@ -2,7 +2,7 @@
 title: Disaggregated Prefill Lmcache
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/disaggregated_prefill_lmcache.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/disaggregated_prefill_lmcache.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/07-disaggregated_prefill.md b/docs/01-getting-started/03-examples/01-offline-inference/07-disaggregated_prefill.md
index bab78ac..b039a3f 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/07-disaggregated_prefill.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/07-disaggregated_prefill.md
@@ -2,7 +2,7 @@
 title: Disaggregated Prefill
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/disaggregated_prefill.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/disaggregated_prefill.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/08-distributed.md b/docs/01-getting-started/03-examples/01-offline-inference/08-distributed.md
index d0a0e8b..e0fbc32 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/08-distributed.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/08-distributed.md
@@ -2,7 +2,7 @@
 title: Distributed
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/distributed.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/distributed.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/09-eagle.md b/docs/01-getting-started/03-examples/01-offline-inference/09-eagle.md
index af0442c..1d23757 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/09-eagle.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/09-eagle.md
@@ -2,7 +2,7 @@
 title: Eagle
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/eagle.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/eagle.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/10-encoder_decoder_multimodal.md b/docs/01-getting-started/03-examples/01-offline-inference/10-encoder_decoder_multimodal.md
index 8d8ae2c..62767ec 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/10-encoder_decoder_multimodal.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/10-encoder_decoder_multimodal.md
@@ -2,7 +2,7 @@
 title: Encoder Decoder Multimodal
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/encoder_decoder_multimodal.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/encoder_decoder_multimodal.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/11-encoder_decoder.md b/docs/01-getting-started/03-examples/01-offline-inference/11-encoder_decoder.md
index 2b2fb68..2811f30 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/11-encoder_decoder.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/11-encoder_decoder.md
@@ -2,7 +2,7 @@
 title: Encoder Decoder
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/encoder_decoder.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/encoder_decoder.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/12-llm_engine_example.md b/docs/01-getting-started/03-examples/01-offline-inference/12-llm_engine_example.md
index 3d64193..37de44b 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/12-llm_engine_example.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/12-llm_engine_example.md
@@ -2,7 +2,7 @@
 title: Llm Engine Example
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/llm_engine_example.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/llm_engine_example.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/13-load_sharded_state.md b/docs/01-getting-started/03-examples/01-offline-inference/13-load_sharded_state.md
index e699fad..a3f3bb9 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/13-load_sharded_state.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/13-load_sharded_state.md
@@ -2,7 +2,7 @@
 title: Load Sharded State
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/load_sharded_state.py.](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/load_sharded_state.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/14-lora_with_quantization_inference.md b/docs/01-getting-started/03-examples/01-offline-inference/14-lora_with_quantization_inference.md
index 0c95742..4c19c1a 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/14-lora_with_quantization_inference.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/14-lora_with_quantization_inference.md
@@ -2,7 +2,7 @@
 title: Lora With Quantization Inference
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/lora_with_quantization_inference.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/lora_with_quantization_inference.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/15-mistral-small.md b/docs/01-getting-started/03-examples/01-offline-inference/15-mistral-small.md
index 242fbc2..c1176e8 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/15-mistral-small.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/15-mistral-small.md
@@ -2,7 +2,7 @@
 title: Mistral-small
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/mistral-small.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/mistral-small.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/16-mlpspeculator.md b/docs/01-getting-started/03-examples/01-offline-inference/16-mlpspeculator.md
index da0605d..6be8c86 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/16-mlpspeculator.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/16-mlpspeculator.md
@@ -2,7 +2,7 @@
 title: Mlpspeculator
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/mlpspeculator.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/mlpspeculator.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/17-multilora_inference.md b/docs/01-getting-started/03-examples/01-offline-inference/17-multilora_inference.md
index ce3e4d6..78bda72 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/17-multilora_inference.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/17-multilora_inference.md
@@ -2,7 +2,7 @@
 title: Multilora Inference
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/multilora_inference.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/multilora_inference.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/18-neuron_int8_quantization.md b/docs/01-getting-started/03-examples/01-offline-inference/18-neuron_int8_quantization.md
index ca186f7..27bd214 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/18-neuron_int8_quantization.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/18-neuron_int8_quantization.md
@@ -2,7 +2,7 @@
 title: Neuron Int8 Quantization
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/neuron_int8_quantization.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/neuron_int8_quantization.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/19-neuron.md b/docs/01-getting-started/03-examples/01-offline-inference/19-neuron.md
index fcaabcf..de99d92 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/19-neuron.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/19-neuron.md
@@ -2,7 +2,7 @@
 title: Neuron
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/neuron.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/neuron.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/20-openai_batch.md b/docs/01-getting-started/03-examples/01-offline-inference/20-openai_batch.md
index ef81fb9..7ce92e6 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/20-openai_batch.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/20-openai_batch.md
@@ -2,7 +2,7 @@
 title: 使用 OpenAI 批处理文件格式进行离线推理
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/openai](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/openai)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/21-prefix_caching.md b/docs/01-getting-started/03-examples/01-offline-inference/21-prefix_caching.md
index f5f3cd3..a52f79f 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/21-prefix_caching.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/21-prefix_caching.md
@@ -2,7 +2,7 @@
 title: Prefix Caching
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/prefix_caching.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/prefix_caching.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/23-profiling_tpu.md b/docs/01-getting-started/03-examples/01-offline-inference/23-profiling_tpu.md
index 95d6c6d..19327af 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/23-profiling_tpu.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/23-profiling_tpu.md
@@ -2,7 +2,7 @@
 title: vLLM TPU 分析
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/profiling_tpu](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/profiling_tpu)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/24-profiling.md b/docs/01-getting-started/03-examples/01-offline-inference/24-profiling.md
index 4d1997a..1e2bb1b 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/24-profiling.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/24-profiling.md
@@ -2,7 +2,7 @@
 title: Profiling
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/profiling.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/profiling.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/25-reproduciblity.md b/docs/01-getting-started/03-examples/01-offline-inference/25-reproduciblity.md
index d9b3c52..c595e2a 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/25-reproduciblity.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/25-reproduciblity.md
@@ -2,7 +2,7 @@
 title: Reproduciblity
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/reproduciblity.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/reproduciblity.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/26-rlhf.md b/docs/01-getting-started/03-examples/01-offline-inference/26-rlhf.md
index 35b7c50..5dd5b5e 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/26-rlhf.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/26-rlhf.md
@@ -2,7 +2,7 @@
 title: Rlhf
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/rlhf.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/rlhf.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/27-rlhf_colocate.md b/docs/01-getting-started/03-examples/01-offline-inference/27-rlhf_colocate.md
index ca99fb5..4dd1494 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/27-rlhf_colocate.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/27-rlhf_colocate.md
@@ -2,7 +2,7 @@
 title: Rlhf Colocate
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/rlhf_colocate.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/rlhf_colocate.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/28-rlhf_utils.md b/docs/01-getting-started/03-examples/01-offline-inference/28-rlhf_utils.md
index fd16f8f..63e694b 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/28-rlhf_utils.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/28-rlhf_utils.md
@@ -2,7 +2,7 @@
 title: Rlhf Utils
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/rlhf_utils.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/rlhf_utils.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/29-save_sharded_state.md b/docs/01-getting-started/03-examples/01-offline-inference/29-save_sharded_state.md
index 635a8fd..4e59ead 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/29-save_sharded_state.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/29-save_sharded_state.md
@@ -2,7 +2,7 @@
 title: Save Sharded State
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/save_sharded_state.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/save_sharded_state.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/30-simple_profiling.md b/docs/01-getting-started/03-examples/01-offline-inference/30-simple_profiling.md
index 7ae65b4..81cf40f 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/30-simple_profiling.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/30-simple_profiling.md
@@ -2,7 +2,7 @@
 title: Simple Profiling
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/simple_profiling.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/simple_profiling.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/31-structured_outputs.md b/docs/01-getting-started/03-examples/01-offline-inference/31-structured_outputs.md
index 9a7459f..a694bc9 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/31-structured_outputs.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/31-structured_outputs.md
@@ -2,7 +2,7 @@
 title: Structured Outputs
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/structured_outputs.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/structured_outputs.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/32-torchrun_example.md b/docs/01-getting-started/03-examples/01-offline-inference/32-torchrun_example.md
index 158bc7b..fb1647d 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/32-torchrun_example.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/32-torchrun_example.md
@@ -2,7 +2,7 @@
 title: Torchrun Example
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)

 源码 [examples/offline_inference/torchrun_example.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/torchrun_example.py)
diff --git a/docs/01-getting-started/03-examples/01-offline-inference/33-tpu.md b/docs/01-getting-started/03-examples/01-offline-inference/33-tpu.md
index 4088149..40b7121 100644
--- a/docs/01-getting-started/03-examples/01-offline-inference/33-tpu.md
+++ b/docs/01-getting-started/03-examples/01-offline-inference/33-tpu.md
@@ -2,7 +2,7 @@
 title: Tpu
 ---

-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 
入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/offline_inference/tpu.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/tpu.py) diff --git a/docs/01-getting-started/03-examples/01-offline-inference/34-vision_language.md b/docs/01-getting-started/03-examples/01-offline-inference/34-vision_language.md index 9dd7cd0..f82c105 100644 --- a/docs/01-getting-started/03-examples/01-offline-inference/34-vision_language.md +++ b/docs/01-getting-started/03-examples/01-offline-inference/34-vision_language.md @@ -2,7 +2,7 @@ title: Vision Language --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/offline_inference/vision_language.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/vision_language.py) diff --git a/docs/01-getting-started/03-examples/01-offline-inference/35-vision_language_embedding.md b/docs/01-getting-started/03-examples/01-offline-inference/35-vision_language_embedding.md index 11f22f2..72cb8fb 100644 --- a/docs/01-getting-started/03-examples/01-offline-inference/35-vision_language_embedding.md +++ b/docs/01-getting-started/03-examples/01-offline-inference/35-vision_language_embedding.md @@ -2,7 +2,7 @@ title: Vision Language Embedding --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 
[examples/offline_inference/vision_language_embedding.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/vision_language_embedding.py) diff --git a/docs/01-getting-started/03-examples/01-offline-inference/36-vision_language_multi_image.md b/docs/01-getting-started/03-examples/01-offline-inference/36-vision_language_multi_image.md index 32aea85..54c6bb9 100644 --- a/docs/01-getting-started/03-examples/01-offline-inference/36-vision_language_multi_image.md +++ b/docs/01-getting-started/03-examples/01-offline-inference/36-vision_language_multi_image.md @@ -2,7 +2,7 @@ title: Vision Language Multi Image --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/offline_inference/vision_language_multi_image.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/vision_language_multi_image.py) diff --git a/docs/01-getting-started/03-examples/01-offline-inference/README.md b/docs/01-getting-started/03-examples/01-offline-inference/README.md index c35e3e9..191e034 100644 --- a/docs/01-getting-started/03-examples/01-offline-inference/README.md +++ b/docs/01-getting-started/03-examples/01-offline-inference/README.md @@ -2,45 +2,45 @@ title: 离线推理 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 离线推理示例展示了如何在离线环境中使用 vLLM,以批量方式查询模型进行预测。我们建议从 [Basic](https://docs.vllm.ai/en/latest/getting_started/examples/basic.html) 开始。 示例 -- [Audio 
Language](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/audio_language) -- [Basic](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/basic) -- [Chat With Tools](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/chat_with_tools) -- [CPU Offload Lmcache](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/cpu_offload_lmcache) -- [Data Parallel](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/data_parallel) -- [分解预填充](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/disaggregated_prefill) -- [Lmcache 分解预填充](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/disaggregated_prefill_lmcache) -- [Distributed](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/distributed) -- [Eagle](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/eagle) -- [Encoder Decoder](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/encoder_decoder) -- [Encoder Decoder Multimodal](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/encoder_decoder_multimodal) -- [LLM Engine Example](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/llm_engine_example) -- [Load Sharded State](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/load_sharded_state) -- [LoRA With Quantization Inference](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/lora_with_quantization_inference) -- [Mistral-Small](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/mistral-small) -- [MLPSpeculator](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/mlpspeculator) -- [MultiLoRA Inference](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/multilora_inference) -- [Neuron](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/neuron) -- [Neuron INT8 
Quantization](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/neuron_int8_quantization) -- [使用 OpenAI 批处理文件格式进行离线推理](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/openai_batch) -- [Prefix Caching](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/prefix_caching) -- [Prithvi Geospatial Mae](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/prithvi_geospatial_mae) -- [Profiling](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/profiling) -- [vLLM TPU Profiling](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/profiling_tpu) -- [Reproduciblity](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/reproduciblity) -- [RLHF](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/rlhf) -- [RLHF Colocate](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/rlhf_colocate) -- [RLHF Utils](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/rlhf_utils) -- [Save Sharded State](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/save_sharded_state) -- [Simple Profiling](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/simple_profiling) -- [Structured Outputs](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/structured_outputs) -- [Torchrun Example](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/torchrun_example) -- [TPU](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/tpu) -- [Vision Language](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/vision_language) -- [Vision Language Embedding](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/vision_language_embedding) -- [Vision Language Multi Image](https://vllm.hyper.ai/docs/getting-started/examples/offline-inference/vision_language_multi_image) +- [Audio 
Language](/docs/getting-started/examples/offline-inference/audio_language) +- [Basic](/docs/getting-started/examples/offline-inference/basic) +- [Chat With Tools](/docs/getting-started/examples/offline-inference/chat_with_tools) +- [CPU Offload Lmcache](/docs/getting-started/examples/offline-inference/cpu_offload_lmcache) +- [Data Parallel](/docs/getting-started/examples/offline-inference/data_parallel) +- [分解预填充](/docs/getting-started/examples/offline-inference/disaggregated_prefill) +- [Lmcache 分解预填充](/docs/getting-started/examples/offline-inference/disaggregated_prefill_lmcache) +- [Distributed](/docs/getting-started/examples/offline-inference/distributed) +- [Eagle](/docs/getting-started/examples/offline-inference/eagle) +- [Encoder Decoder](/docs/getting-started/examples/offline-inference/encoder_decoder) +- [Encoder Decoder Multimodal](/docs/getting-started/examples/offline-inference/encoder_decoder_multimodal) +- [LLM Engine Example](/docs/getting-started/examples/offline-inference/llm_engine_example) +- [Load Sharded State](/docs/getting-started/examples/offline-inference/load_sharded_state) +- [LoRA With Quantization Inference](/docs/getting-started/examples/offline-inference/lora_with_quantization_inference) +- [Mistral-Small](/docs/getting-started/examples/offline-inference/mistral-small) +- [MLPSpeculator](/docs/getting-started/examples/offline-inference/mlpspeculator) +- [MultiLoRA Inference](/docs/getting-started/examples/offline-inference/multilora_inference) +- [Neuron](/docs/getting-started/examples/offline-inference/neuron) +- [Neuron INT8 Quantization](/docs/getting-started/examples/offline-inference/neuron_int8_quantization) +- [使用 OpenAI 批处理文件格式进行离线推理](/docs/getting-started/examples/offline-inference/openai_batch) +- [Prefix Caching](/docs/getting-started/examples/offline-inference/prefix_caching) +- [Prithvi Geospatial Mae](/docs/getting-started/examples/offline-inference/prithvi_geospatial_mae) +- 
[Profiling](/docs/getting-started/examples/offline-inference/profiling) +- [vLLM TPU Profiling](/docs/getting-started/examples/offline-inference/profiling_tpu) +- [Reproduciblity](/docs/getting-started/examples/offline-inference/reproduciblity) +- [RLHF](/docs/getting-started/examples/offline-inference/rlhf) +- [RLHF Colocate](/docs/getting-started/examples/offline-inference/rlhf_colocate) +- [RLHF Utils](/docs/getting-started/examples/offline-inference/rlhf_utils) +- [Save Sharded State](/docs/getting-started/examples/offline-inference/save_sharded_state) +- [Simple Profiling](/docs/getting-started/examples/offline-inference/simple_profiling) +- [Structured Outputs](/docs/getting-started/examples/offline-inference/structured_outputs) +- [Torchrun Example](/docs/getting-started/examples/offline-inference/torchrun_example) +- [TPU](/docs/getting-started/examples/offline-inference/tpu) +- [Vision Language](/docs/getting-started/examples/offline-inference/vision_language) +- [Vision Language Embedding](/docs/getting-started/examples/offline-inference/vision_language_embedding) +- [Vision Language Multi Image](/docs/getting-started/examples/offline-inference/vision_language_multi_image) diff --git a/docs/01-getting-started/03-examples/02-online-serving/01-api_client.md b/docs/01-getting-started/03-examples/02-online-serving/01-api_client.md index eec483e..c861224 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/01-api_client.md +++ b/docs/01-getting-started/03-examples/02-online-serving/01-api_client.md @@ -2,7 +2,7 @@ title: Api Client --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 
[examples/online_serving/api_client.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/api_client.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/02-chart-helm.md b/docs/01-getting-started/03-examples/02-online-serving/02-chart-helm.md index 479c9c8..20e2cc5 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/02-chart-helm.md +++ b/docs/01-getting-started/03-examples/02-online-serving/02-chart-helm.md @@ -2,7 +2,7 @@ title: Helm 图表 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/chart-helm](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/chart-helm) diff --git a/docs/01-getting-started/03-examples/02-online-serving/03-cohere_rerank_client.md b/docs/01-getting-started/03-examples/02-online-serving/03-cohere_rerank_client.md index f3839c3..57d0771 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/03-cohere_rerank_client.md +++ b/docs/01-getting-started/03-examples/02-online-serving/03-cohere_rerank_client.md @@ -2,7 +2,7 @@ title: Cohere Rerank Client --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/cohere_rerank_client.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/cohere_rerank_client.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/04-disaggregated_prefill.md 
b/docs/01-getting-started/03-examples/02-online-serving/04-disaggregated_prefill.md index 57a696f..cc8034e 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/04-disaggregated_prefill.md +++ b/docs/01-getting-started/03-examples/02-online-serving/04-disaggregated_prefill.md @@ -2,7 +2,7 @@ title: Disaggregated Prefill --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/disaggregated_prefill.sh](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/disaggregated_prefill.sh) diff --git a/docs/01-getting-started/03-examples/02-online-serving/05-gradio_openai_chatbot_webserver.md b/docs/01-getting-started/03-examples/02-online-serving/05-gradio_openai_chatbot_webserver.md index 91e0568..fced162 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/05-gradio_openai_chatbot_webserver.md +++ b/docs/01-getting-started/03-examples/02-online-serving/05-gradio_openai_chatbot_webserver.md @@ -2,7 +2,7 @@ title: Gradio Openai Chatbot Webserver --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/gradio_openai_chatbot_webserver.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/gradio_openai_chatbot_webserver.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/06-gradio_webserver.md b/docs/01-getting-started/03-examples/02-online-serving/06-gradio_webserver.md index 
2fca18a..706b7eb 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/06-gradio_webserver.md +++ b/docs/01-getting-started/03-examples/02-online-serving/06-gradio_webserver.md @@ -2,7 +2,7 @@ title: Gradio Webserver --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/gradio_webserver.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/gradio_webserver.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/07-jinaai_rerank_client.md b/docs/01-getting-started/03-examples/02-online-serving/07-jinaai_rerank_client.md index c95b148..a79a3ee 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/07-jinaai_rerank_client.md +++ b/docs/01-getting-started/03-examples/02-online-serving/07-jinaai_rerank_client.md @@ -2,7 +2,7 @@ title: Jinaai Rerank Client --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/jinaai_rerank_client.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/jinaai_rerank_client.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/08-multi-node-serving.md b/docs/01-getting-started/03-examples/02-online-serving/08-multi-node-serving.md index f4fddbe..f56f651 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/08-multi-node-serving.md +++ b/docs/01-getting-started/03-examples/02-online-serving/08-multi-node-serving.md @@ -2,7 
+2,7 @@ title: Multi-node-serving --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/multi-node-serving.sh](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/multi-node-serving.sh) diff --git a/docs/01-getting-started/03-examples/02-online-serving/09-openai_chat_completion_client.md b/docs/01-getting-started/03-examples/02-online-serving/09-openai_chat_completion_client.md index 7b5138b..fbc6fa6 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/09-openai_chat_completion_client.md +++ b/docs/01-getting-started/03-examples/02-online-serving/09-openai_chat_completion_client.md @@ -2,7 +2,7 @@ title: Openai Chat Completion Client --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_chat_completion_client.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_client.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/10-openai_chat_completion_client_for_multimodal.md b/docs/01-getting-started/03-examples/02-online-serving/10-openai_chat_completion_client_for_multimodal.md index 5a3c04d..255bfdc 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/10-openai_chat_completion_client_for_multimodal.md +++ b/docs/01-getting-started/03-examples/02-online-serving/10-openai_chat_completion_client_for_multimodal.md @@ -2,7 +2,7 @@ title: Openai 
Chat Completion Client For Multimodal --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_chat_completion_client_for_multimodal.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_client_for_multimodal.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/11-openai_chat_completion_client_with_tools.md b/docs/01-getting-started/03-examples/02-online-serving/11-openai_chat_completion_client_with_tools.md index a935511..90d7b53 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/11-openai_chat_completion_client_with_tools.md +++ b/docs/01-getting-started/03-examples/02-online-serving/11-openai_chat_completion_client_with_tools.md @@ -2,7 +2,7 @@ title: Openai Chat Completion Client With Tools --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_chat_completion_client_with_tools.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_client_with_tools.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/12-openai_chat_completion_client_with_tools_required.md b/docs/01-getting-started/03-examples/02-online-serving/12-openai_chat_completion_client_with_tools_required.md index d1525ae..7fefdba 100644 --- 
a/docs/01-getting-started/03-examples/02-online-serving/12-openai_chat_completion_client_with_tools_required.md +++ b/docs/01-getting-started/03-examples/02-online-serving/12-openai_chat_completion_client_with_tools_required.md @@ -2,7 +2,7 @@ title: OpenAI Chat Completion Client With Tools Required --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_chat_completion_client_with_tools_required.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_client_with_tools_required.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/13-openai_chat_completion_structured_outputs.md b/docs/01-getting-started/03-examples/02-online-serving/13-openai_chat_completion_structured_outputs.md index ed70fe8..5f9c46e 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/13-openai_chat_completion_structured_outputs.md +++ b/docs/01-getting-started/03-examples/02-online-serving/13-openai_chat_completion_structured_outputs.md @@ -2,7 +2,7 @@ title: Openai Chat Completion Structured Outputs --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_chat_completion_structured_outputs.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_structured_outputs.py) diff --git 
a/docs/01-getting-started/03-examples/02-online-serving/14-openai_chat_completion_structured_outputs_with_reasoning.md b/docs/01-getting-started/03-examples/02-online-serving/14-openai_chat_completion_structured_outputs_with_reasoning.md index fe82d14..cead708 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/14-openai_chat_completion_structured_outputs_with_reasoning.md +++ b/docs/01-getting-started/03-examples/02-online-serving/14-openai_chat_completion_structured_outputs_with_reasoning.md @@ -2,7 +2,7 @@ title: OpenAI 聊天完成结构化输出与推理 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_chat_completion_structured_outputs_with_reasoning.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_structured_outputs_with_reasoning.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/15-openai_chat_completion_tool_calls_with_reasoning.md b/docs/01-getting-started/03-examples/02-online-serving/15-openai_chat_completion_tool_calls_with_reasoning.md index e0e952d..5750b51 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/15-openai_chat_completion_tool_calls_with_reasoning.md +++ b/docs/01-getting-started/03-examples/02-online-serving/15-openai_chat_completion_tool_calls_with_reasoning.md @@ -2,7 +2,7 @@ title: Openai Chat Completion Tool Calls With Reasoning --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 
入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_chat_completion_tool_calls_with_reasoning.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_tool_calls_with_reasoning.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/16-openai_chat_completion_with_reasoning.md b/docs/01-getting-started/03-examples/02-online-serving/16-openai_chat_completion_with_reasoning.md index 7b250b0..cc12c49 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/16-openai_chat_completion_with_reasoning.md +++ b/docs/01-getting-started/03-examples/02-online-serving/16-openai_chat_completion_with_reasoning.md @@ -2,7 +2,7 @@ title: Openai Chat Completion With Reasoning --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_chat_completion_with_reasoning.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_with_reasoning.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/17-openai_chat_completion_with_reasoning_streaming.md b/docs/01-getting-started/03-examples/02-online-serving/17-openai_chat_completion_with_reasoning_streaming.md index 24165e4..19f4da8 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/17-openai_chat_completion_with_reasoning_streaming.md +++ b/docs/01-getting-started/03-examples/02-online-serving/17-openai_chat_completion_with_reasoning_streaming.md @@ -2,7 +2,7 @@ title: 基于推理流的 OpenAI 聊天完成 --- -[\*在线运行 vLLM 
入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_chat_completion_with_reasoning_streaming.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_with_reasoning_streaming.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/18-openai_chat_embedding_client_for_multimodal.md b/docs/01-getting-started/03-examples/02-online-serving/18-openai_chat_embedding_client_for_multimodal.md index 740bc95..da837bd 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/18-openai_chat_embedding_client_for_multimodal.md +++ b/docs/01-getting-started/03-examples/02-online-serving/18-openai_chat_embedding_client_for_multimodal.md @@ -2,7 +2,7 @@ title: 嵌入多通道客户端的 OpenAI 聊天工具 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_chat_embedding_client_for_multimodal.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_embedding_client_for_multimodal.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/19-openai_completion_client.md b/docs/01-getting-started/03-examples/02-online-serving/19-openai_completion_client.md index 0306d98..83c24ad 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/19-openai_completion_client.md +++ b/docs/01-getting-started/03-examples/02-online-serving/19-openai_completion_client.md @@ -2,7 +2,7 @@ title: OpenAI 
完成客户端 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_completion_client.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_completion_client.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/20-openai_cross_encoder_score.md b/docs/01-getting-started/03-examples/02-online-serving/20-openai_cross_encoder_score.md index e4d2d8a..466d474 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/20-openai_cross_encoder_score.md +++ b/docs/01-getting-started/03-examples/02-online-serving/20-openai_cross_encoder_score.md @@ -2,7 +2,7 @@ title: Openai 交叉编码器得分 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_cross_encoder_score.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_cross_encoder_score.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/21-openai_embedding_client.md b/docs/01-getting-started/03-examples/02-online-serving/21-openai_embedding_client.md index 315485f..823e82e 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/21-openai_embedding_client.md +++ b/docs/01-getting-started/03-examples/02-online-serving/21-openai_embedding_client.md @@ -2,7 +2,7 @@ title: Openai 嵌入式客户端 --- -[\*在线运行 vLLM 
入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_embedding_client.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_embedding_client.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/22-openai_pooling_client.md b/docs/01-getting-started/03-examples/02-online-serving/22-openai_pooling_client.md index 7254da1..caccca5 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/22-openai_pooling_client.md +++ b/docs/01-getting-started/03-examples/02-online-serving/22-openai_pooling_client.md @@ -2,7 +2,7 @@ title: Openai 池化客户端 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_pooling_client.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_pooling_client.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/23-openai_transcription_client.md b/docs/01-getting-started/03-examples/02-online-serving/23-openai_transcription_client.md index dfa07d9..fb13742 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/23-openai_transcription_client.md +++ b/docs/01-getting-started/03-examples/02-online-serving/23-openai_transcription_client.md @@ -2,7 +2,7 @@ title: Openai Transcription 客户端 --- -[\*在线运行 vLLM 
入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/openai_transcription_client.py](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_transcription_client.py) diff --git a/docs/01-getting-started/03-examples/02-online-serving/24-opentelemetry.md b/docs/01-getting-started/03-examples/02-online-serving/24-opentelemetry.md index 8b74120..5ea9f3a 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/24-opentelemetry.md +++ b/docs/01-getting-started/03-examples/02-online-serving/24-opentelemetry.md @@ -2,7 +2,7 @@ title: OpenTelemetry POC 设置 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/opentelemetry](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/opentelemetry) diff --git a/docs/01-getting-started/03-examples/02-online-serving/25-prometheus_grafana.md b/docs/01-getting-started/03-examples/02-online-serving/25-prometheus_grafana.md index f6d19b5..932b998 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/25-prometheus_grafana.md +++ b/docs/01-getting-started/03-examples/02-online-serving/25-prometheus_grafana.md @@ -2,7 +2,7 @@ title: Prometheus 与 Grafana 监控方案 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 
入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/prometheus_grafana](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/prometheus_grafana) diff --git a/docs/01-getting-started/03-examples/02-online-serving/26-run_cluster.md b/docs/01-getting-started/03-examples/02-online-serving/26-run_cluster.md index d7be3d5..d485a7d 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/26-run_cluster.md +++ b/docs/01-getting-started/03-examples/02-online-serving/26-run_cluster.md @@ -2,7 +2,7 @@ title: Run Cluster --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/online_serving/run_cluster.sh](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/run_cluster.sh) diff --git a/docs/01-getting-started/03-examples/02-online-serving/27-sagemaker-entrypoint.md b/docs/01-getting-started/03-examples/02-online-serving/27-sagemaker-entrypoint.md index 62c3df3..8050e82 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/27-sagemaker-entrypoint.md +++ b/docs/01-getting-started/03-examples/02-online-serving/27-sagemaker-entrypoint.md @@ -2,7 +2,7 @@ title: Sagemaker-entrypoint --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 
[examples/online_serving/sagemaker-entrypoint.sh](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/sagemaker-entrypoint.sh) diff --git a/docs/01-getting-started/03-examples/02-online-serving/README.md b/docs/01-getting-started/03-examples/02-online-serving/README.md index e48c349..9b30a30 100644 --- a/docs/01-getting-started/03-examples/02-online-serving/README.md +++ b/docs/01-getting-started/03-examples/02-online-serving/README.md @@ -2,35 +2,35 @@ title: 在线服务 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 在线示例展示了如何在在线环境中使用 vLLM,在线环境要求实时预测。 示例 -- [API Client](https://docs.vllm.ai/en/latest/getting_started/examples/api_client.html) -- [Helm Charts](https://docs.vllm.ai/en/latest/getting_started/examples/chart-helm.html) -- [Cohere Rerank Client](https://docs.vllm.ai/en/latest/getting_started/examples/cohere_rerank_client.html) -- [Disaggregated Prefill](https://docs.vllm.ai/en/latest/getting_started/examples/disaggregated_prefill.html) -- [Gradio OpenAI Chatbot Webserver](https://docs.vllm.ai/en/latest/getting_started/examples/gradio_openai_chatbot_webserver.html) -- [Gradio Webserver](https://docs.vllm.ai/en/latest/getting_started/examples/gradio_webserver.html) -- [Jinaai Rerank Client](https://docs.vllm.ai/en/latest/getting_started/examples/jinaai_rerank_client.html) -- [Multi-Node-Serving](https://docs.vllm.ai/en/latest/getting_started/examples/multi-node-serving.html) -- [OpenAI Chat Completion Client](https://docs.vllm.ai/en/latest/getting_started/examples/openai_chat_completion_client.html) -- [OpenAI Chat Completion Client For Multimodal](https://docs.vllm.ai/en/latest/getting_started/examples/openai_chat_completion_client_for_multimodal.html) -- 
[OpenAI Chat Completion Client With Tools](https://docs.vllm.ai/en/latest/getting_started/examples/openai_chat_completion_client_with_tools.html) -- [OpenAI Chat Completion Structured Outputs](https://docs.vllm.ai/en/latest/getting_started/examples/openai_chat_completion_structured_outputs.html) -- [OpenAI Chat Completion Structured Outputs With Reasoning](https://docs.vllm.ai/en/latest/getting_started/examples/openai_chat_completion_structured_outputs_with_reasoning.html) -- [OpenAI Chat Completion Tool Calls With Reasoning](https://docs.vllm.ai/en/latest/getting_started/examples/openai_chat_completion_tool_calls_with_reasoning.html) -- [OpenAI Chat Completion With Reasoning](https://docs.vllm.ai/en/latest/getting_started/examples/openai_chat_completion_with_reasoning.html) -- [OpenAI Chat Completion With Reasoning Streaming](https://docs.vllm.ai/en/latest/getting_started/examples/openai_chat_completion_with_reasoning_streaming.html) -- [OpenAI Chat Embedding Client For Multimodal](https://docs.vllm.ai/en/latest/getting_started/examples/openai_chat_embedding_client_for_multimodal.html) -- [OpenAI Completion Client](https://docs.vllm.ai/en/latest/getting_started/examples/openai_completion_client.html) -- [OpenAI Cross Encoder Score](https://docs.vllm.ai/en/latest/getting_started/examples/openai_cross_encoder_score.html) -- [OpenAI Embedding Client](https://docs.vllm.ai/en/latest/getting_started/examples/openai_embedding_client.html) -- [OpenAI Pooling Client](https://docs.vllm.ai/en/latest/getting_started/examples/openai_pooling_client.html) -- [OpenAI Transcription Client](https://docs.vllm.ai/en/latest/getting_started/examples/openai_transcription_client.html) -- [Setup OpenTelemetry POC](https://docs.vllm.ai/en/latest/getting_started/examples/opentelemetry.html) -- [Prometheus and Grafana](https://docs.vllm.ai/en/latest/getting_started/examples/prometheus_grafana.html) -- [Run Cluster](https://docs.vllm.ai/en/latest/getting_started/examples/run_cluster.html) -- 
[Sagemaker-Entrypoint](https://docs.vllm.ai/en/latest/getting_started/examples/sagemaker-entrypoint.html) +- [API Client](/docs/getting-started/examples/online-serving/api_client) +- [Helm Charts](/docs/getting-started/examples/online-serving/chart-helm) +- [Cohere Rerank Client](/docs/getting-started/examples/online-serving/cohere_rerank_client) +- [Disaggregated Prefill](/docs/getting-started/examples/online-serving/disaggregated_prefill) +- [Gradio OpenAI Chatbot Webserver](/docs/getting-started/examples/online-serving/gradio_openai_chatbot_webserver) +- [Gradio Webserver](/docs/getting-started/examples/online-serving/gradio_webserver) +- [Jinaai Rerank Client](/docs/getting-started/examples/online-serving/jinaai_rerank_client) +- [Multi-Node-Serving](/docs/getting-started/examples/online-serving/multi-node-serving) +- [OpenAI Chat Completion Client](/docs/getting-started/examples/online-serving/openai_chat_completion_client) +- [OpenAI Chat Completion Client For Multimodal](/docs/getting-started/examples/online-serving/openai_chat_completion_client_for_multimodal) +- [OpenAI Chat Completion Client With Tools](/docs/getting-started/examples/online-serving/openai_chat_completion_client_with_tools) +- [OpenAI Chat Completion Structured Outputs](/docs/getting-started/examples/online-serving/openai_chat_completion_structured_outputs) +- [OpenAI Chat Completion Structured Outputs With Reasoning](/docs/getting-started/examples/online-serving/openai_chat_completion_structured_outputs_with_reasoning) +- [OpenAI Chat Completion Tool Calls With Reasoning](/docs/getting-started/examples/online-serving/openai_chat_completion_tool_calls_with_reasoning) +- [OpenAI Chat Completion With Reasoning](/docs/getting-started/examples/online-serving/openai_chat_completion_with_reasoning) +- [OpenAI Chat Completion With Reasoning Streaming](/docs/getting-started/examples/online-serving/openai_chat_completion_with_reasoning_streaming) +- [OpenAI Chat Embedding Client For 
Multimodal](/docs/getting-started/examples/online-serving/openai_chat_embedding_client_for_multimodal) +- [OpenAI Completion Client](/docs/getting-started/examples/online-serving/openai_completion_client) +- [OpenAI Cross Encoder Score](/docs/getting-started/examples/online-serving/openai_cross_encoder_score) +- [OpenAI Embedding Client](/docs/getting-started/examples/online-serving/openai_embedding_client) +- [OpenAI Pooling Client](/docs/getting-started/examples/online-serving/openai_pooling_client) +- [OpenAI Transcription Client](/docs/getting-started/examples/online-serving/openai_transcription_client) +- [Setup OpenTelemetry POC](/docs/getting-started/examples/online-serving/opentelemetry) +- [Prometheus and Grafana](/docs/getting-started/examples/online-serving/prometheus_grafana) +- [Run Cluster](/docs/getting-started/examples/online-serving/run_cluster) +- [Sagemaker-Entrypoint](/docs/getting-started/examples/online-serving/sagemaker-entrypoint) diff --git a/docs/01-getting-started/03-examples/03-other/01-logging_configuration.md b/docs/01-getting-started/03-examples/03-other/01-logging_configuration.md index 73b84ee..d29ea15 100644 --- a/docs/01-getting-started/03-examples/03-other/01-logging_configuration.md +++ b/docs/01-getting-started/03-examples/03-other/01-logging_configuration.md @@ -2,7 +2,7 @@ title: 日志配置说明 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/other/logging_configuration.md](https://github.com/vllm-project/vllm/blob/main/examples/other/logging_configuration.md). 
diff --git a/docs/01-getting-started/03-examples/03-other/02-tensorize_vllm_model.md b/docs/01-getting-started/03-examples/03-other/02-tensorize_vllm_model.md index 6c02e69..16b30af 100644 --- a/docs/01-getting-started/03-examples/03-other/02-tensorize_vllm_model.md +++ b/docs/01-getting-started/03-examples/03-other/02-tensorize_vllm_model.md @@ -2,7 +2,7 @@ title: Tensorize Vllm Model --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 源码 [examples/other/tensorize_vllm_model.py](https://github.com/vllm-project/vllm/blob/main/examples/other/tensorize_vllm_model.py) diff --git a/docs/01-getting-started/03-examples/03-other/README.md b/docs/01-getting-started/03-examples/03-other/README.md index c321682..080b11d 100644 --- a/docs/01-getting-started/03-examples/03-other/README.md +++ b/docs/01-getting-started/03-examples/03-other/README.md @@ -2,11 +2,11 @@ title: 其他 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 其他示例展示了不适合离线和在线分类的示例。 ## 示例 -- [Logging Configuration](https://docs.vllm.ai/en/latest/getting_started/examples/logging_configuration.html) -- [Tensorize vLLM Model](https://docs.vllm.ai/en/latest/getting_started/examples/tensorize_vllm_model.html) +- [Logging Configuration](/docs/getting-started/examples/other/logging_configuration) +- [Tensorize vLLM Model](/docs/getting-started/examples/other/tensorize_vllm_model) diff --git a/docs/01-getting-started/04-troubleshooting.md 
b/docs/01-getting-started/04-troubleshooting.md index 9f1f46c..762a846 100644 --- a/docs/01-getting-started/04-troubleshooting.md +++ b/docs/01-getting-started/04-troubleshooting.md @@ -2,9 +2,7 @@ title: 故障排除 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) - -# +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 本文档概述了一些可考虑的故障排除策略。如果您认为发现了一个错误,请先[搜索现有问题](https://github.com/vllm-project/vllm/issues?q=is%253Aissue)查看是否已报告。如果没有,请[提交新问题](https://github.com/vllm-project/vllm/issues/new/choose),提供尽可能多的相关信息。 diff --git a/docs/01-getting-started/05-faq.md b/docs/01-getting-started/05-faq.md index deeb4f5..7b3bd0d 100644 --- a/docs/01-getting-started/05-faq.md +++ b/docs/01-getting-started/05-faq.md @@ -2,7 +2,7 @@ title: Frequently Asked Questions --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) # 常见问题 diff --git a/docs/01-getting-started/06-v1-user-guide.md b/docs/01-getting-started/06-v1-user-guide.md index 3ac2264..cb3cc7c 100644 --- a/docs/01-getting-started/06-v1-user-guide.md +++ b/docs/01-getting-started/06-v1-user-guide.md @@ -2,7 +2,7 @@ title: vLLM V1 用户指南 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) V1 现已默认启用所有支持的使用场景,我们将逐步为计划支持的每个场景启用该版本。请在 
[GitHub](https://github.com/vllm-project/vllm) 或 [vLLM Slack](https://inviter.co/vllm-slack) 分享反馈。 diff --git a/docs/02-models/01-supported_models.md b/docs/02-models/01-supported_models.md index 10c3ea4..c4d1b84 100644 --- a/docs/02-models/01-supported_models.md +++ b/docs/02-models/01-supported_models.md @@ -2,7 +2,7 @@ title: 支持模型列表 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 支持跨多种任务的生成式和池化模型。若模型支持多个任务,可通过 `--task` 参数指定任务。 diff --git a/docs/02-models/02-generative_models.md b/docs/02-models/02-generative_models.md index 4136a99..75ab393 100644 --- a/docs/02-models/02-generative_models.md +++ b/docs/02-models/02-generative_models.md @@ -2,7 +2,7 @@ title: 生成模型 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM能够很好地支持生成模型,它兼容并能够有效运行大多数的大型语言模型 (LLMs)。 diff --git a/docs/02-models/03-Pooling Models.md b/docs/02-models/03-Pooling Models.md index 8acd99b..e968a2f 100644 --- a/docs/02-models/03-Pooling Models.md +++ b/docs/02-models/03-Pooling Models.md @@ -2,7 +2,7 @@ title: 池化模型 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 还支持池化模型,包括嵌入模型、重排序模型和奖励模型。 diff --git 
a/docs/02-models/04-extensions/01-runai_model_streamer.md b/docs/02-models/04-extensions/01-runai_model_streamer.md index fccd72e..1e24122 100644 --- a/docs/02-models/04-extensions/01-runai_model_streamer.md +++ b/docs/02-models/04-extensions/01-runai_model_streamer.md @@ -2,7 +2,7 @@ title: 使用 Run:ai Model Streamer 加载模型 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) Run:ai Model Streamer 是一个用于并发读取张量并将其流式传输到 GPU 内存的库。更多信息可以在 [Run:ai Model Streamer 文档](https://github.com/run-ai/runai-model-streamer/blob/master/docs/README.md)中找到。 diff --git a/docs/02-models/04-extensions/02-tensorizer.md b/docs/02-models/04-extensions/02-tensorizer.md index 1738c17..4828a51 100644 --- a/docs/02-models/04-extensions/02-tensorizer.md +++ b/docs/02-models/04-extensions/02-tensorizer.md @@ -2,7 +2,7 @@ title: 使用 CoreWeave 的 Tensorizer 加载模型 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 支持使用 [CoreWeave 的 Tensorizer](https://docs.coreweave.com/coreweave-machine-learning-and-ai/inference/tensorizer) 加载模型。vLLM 模型张量可以被序列化到磁盘、HTTP/HTTPS 端点或 S3 端点,并在运行时极快地直接反序列化到 GPU,从而显著缩短 Pod 启动时间并减少 CPU 内存使用。同时,Tensorizer 还支持张量加密。 diff --git a/docs/02-models/04-extensions/03-fastsafetensor.md b/docs/02-models/04-extensions/03-fastsafetensor.md index bd29474..c2671f2 100644 --- a/docs/02-models/04-extensions/03-fastsafetensor.md +++ b/docs/02-models/04-extensions/03-fastsafetensor.md @@ -2,6 +2,6 @@ title: 使用 fastsafetensors 加载模型权重 
--- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 使用 fastsafetensor 库可以通过利用 GPU 直接存储将模型权重加载到 GPU 内存。有关详细信息,请参阅 https://github.com/foundation-model-stack/fastsafetensors。要启用此功能,请将环境变量 USE_FASTSAFETENSOR 设置为 true。 diff --git a/docs/02-models/04-extensions/README.md b/docs/02-models/04-extensions/README.md index 7ebd0a0..9d9af76 100644 --- a/docs/02-models/04-extensions/README.md +++ b/docs/02-models/04-extensions/README.md @@ -2,10 +2,10 @@ title: 内置扩展 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) -使用 Run:ai Model Streamer 加载模型 +[使用 Run:ai Model Streamer 加载模型](/docs/models/extensions/runai_model_streamer) -使用 CoreWeave 的 Tensorizer 加载模型 +[使用 CoreWeave 的 Tensorizer 加载模型](/docs/models/extensions/tensorizer) -使用 fastsafetensors 加载模型权重 +[使用 fastsafetensors 加载模型权重](/docs/models/extensions/fastsafetensor) diff --git a/docs/03-features/01-quantization/03-bnb.md b/docs/03-features/01-quantization/03-bnb.md index d827cbf..1a2c077 100644 --- a/docs/03-features/01-quantization/03-bnb.md +++ b/docs/03-features/01-quantization/03-bnb.md @@ -2,7 +2,7 @@ title: BitsAndBytes --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 现在支持 
[BitsAndBytes](https://github.com/TimDettmers/bitsandbytes) 以实现更高效的模型推理。 BitsAndBytes 量化模型可以减少内存使用并增强性能,同时不会显著牺牲准确性。与其他量化方法相比,BitsAndBytes 无需使用输入数据来校准量化模型。 diff --git a/docs/03-features/01-quantization/04-gguf.md b/docs/03-features/01-quantization/04-gguf.md index 5380a1f..37a32f5 100644 --- a/docs/03-features/01-quantization/04-gguf.md +++ b/docs/03-features/01-quantization/04-gguf.md @@ -2,7 +2,7 @@ title: GGUF --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) > **警告** > 请注意,vLLM 中的 GGUF 支持目前处于高度实验性阶段,且未进行充分优化,可能与其他功能不兼容。目前,您可以使用 GGUF 来减少内存占用。如果您遇到任何问题,请向 vLLM 团队报告。 diff --git a/docs/03-features/01-quantization/05-gptqmodel.md b/docs/03-features/01-quantization/05-gptqmodel.md index 4f2a5c3..22d975b 100644 --- a/docs/03-features/01-quantization/05-gptqmodel.md +++ b/docs/03-features/01-quantization/05-gptqmodel.md @@ -2,7 +2,7 @@ title: GPTQModel --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 要创建新的 4 位或 8 位 GPTQ 量化模型,您可以利用 ModelCloud.AI 的 [GPTQModel](https://github.com/ModelCloud/GPTQModel)。 diff --git a/docs/03-features/01-quantization/06-int4.md b/docs/03-features/01-quantization/06-int4.md index 8340e35..d186114 100644 --- a/docs/03-features/01-quantization/06-int4.md +++ b/docs/03-features/01-quantization/06-int4.md @@ -2,7 +2,7 @@ title: INT4 W4A16 --- -[\*在线运行 vLLM 
入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 支持将权重量化为 INT4,以节省内存并加速推理。这种量化方法可以有效地在减小模型大小的同时,在每秒低查询的工作量中保持低延迟 (QPS)。 diff --git a/docs/03-features/01-quantization/07-int8.md b/docs/03-features/01-quantization/07-int8.md index e2fc12e..33fbc5b 100644 --- a/docs/03-features/01-quantization/07-int8.md +++ b/docs/03-features/01-quantization/07-int8.md @@ -2,7 +2,7 @@ title: INT8 W8A8 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) # INT8 W8A8 diff --git a/docs/03-features/01-quantization/08-fp8.md b/docs/03-features/01-quantization/08-fp8.md index 01fc9e2..af937af 100644 --- a/docs/03-features/01-quantization/08-fp8.md +++ b/docs/03-features/01-quantization/08-fp8.md @@ -2,7 +2,7 @@ title: FP8 W8A8 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 使用 Nvidia H100 和 AMD MI300x 等 GPU 上的硬件加速时,支持 FP8(8 位浮点)权重和激活量化。目前,W8A8 仅正式支持 Hopper 和 Ada Lovelace GPU。使用 Marlin 内核的 W8A16(仅权重 FP8)支持 Ampere GPU。使用 FP8 进行模型量化可将模型内存需求减少 2 倍,并将吞吐量提高 1.6 倍,同时对准确性的影响最小。 diff --git a/docs/03-features/01-quantization/09-quantized_kvcache.md b/docs/03-features/01-quantization/09-quantized_kvcache.md index 2ad30e9..7816d59 100644 --- 
a/docs/03-features/01-quantization/09-quantized_kvcache.md +++ b/docs/03-features/01-quantization/09-quantized_kvcache.md @@ -2,7 +2,7 @@ title: FP8 KV 缓存 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 将 KV 缓存量化为 FP8 可减少其内存占用。这增加了存储在缓存中的 token 数量,从而提高了吞吐量。 diff --git a/docs/03-features/01-quantization/10-TorchAO.md b/docs/03-features/01-quantization/10-TorchAO.md index 04e58a5..837e3f1 100644 --- a/docs/03-features/01-quantization/10-TorchAO.md +++ b/docs/03-features/01-quantization/10-TorchAO.md @@ -2,7 +2,7 @@ title: TorchAO --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) TorchAO 是 PyTorch 的架构优化库,它为推理和训练提供高性能的 dtype、优化技术和内核,具有与原生 PyTorch 功能(如 torch.compile、FSDP 等)的可组合性。可以[在此处](https://github.com/pytorch/ao/tree/main/torchao/quantization#benchmarks)找到一些基准数字。 diff --git a/docs/03-features/01-quantization/README.md b/docs/03-features/01-quantization/README.md index 6436b41..2d142ab 100644 --- a/docs/03-features/01-quantization/README.md +++ b/docs/03-features/01-quantization/README.md @@ -2,17 +2,19 @@ title: 量化 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 量化通过牺牲模型精度来换取更小的内存占用,从而使得大型模型能够在更广泛的设备上运行。 ## 目录 -- 支持硬件 
-- AutoAWQ -- BitsAndBytes -- GGUF -- INT4 W4A16 -- INT8 W8A8 -- FP8 W8A8 -- 量化 KV 缓存 +- [支持硬件](/docs/features/quantization/supported_hardware) +- [AutoAWQ](/docs/features/quantization/auto_awq) +- [BitsAndBytes](/docs/features/quantization/bnb) +- [GGUF](/docs/features/quantization/gguf) +- [GPTQModel](/docs/features/quantization/gptqmodel) +- [INT4 W4A16](/docs/features/quantization/int4) +- [INT8 W8A8](/docs/features/quantization/int8) +- [FP8 W8A8](/docs/features/quantization/fp8) +- [量化 KV 缓存](/docs/features/quantization/quantized_kvcache) +- [TorchAO](/docs/features/quantization/TorchAO) diff --git a/docs/03-features/02-lora.md b/docs/03-features/02-lora.md index d5f677e..d5e9abf 100644 --- a/docs/03-features/02-lora.md +++ b/docs/03-features/02-lora.md @@ -2,7 +2,7 @@ title: LoRA 适配器 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 本文档向您展示如何在基本模型上将 [LoRA 适配器](https://arxiv.org/abs/2106.09685)与 vLLM 结合使用。 diff --git a/docs/03-features/03-tool_calling.md b/docs/03-features/03-tool_calling.md index f7d78f2..5de2f55 100644 --- a/docs/03-features/03-tool_calling.md +++ b/docs/03-features/03-tool_calling.md @@ -2,7 +2,7 @@ title: 工具调用 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 目前支持命名函数调用,并在聊天补全 API 的 `tool_choice` 字段中支持 `auto` 和 `none` 选项。`required` 选项**尚未支持**,但已在[开发计划](https://github.com/vllm-project/vllm/issues/13002#)中。 diff --git a/docs/03-features/04-reasoning_outputs.md 
b/docs/03-features/04-reasoning_outputs.md index 3f55c1b..a614547 100644 --- a/docs/03-features/04-reasoning_outputs.md +++ b/docs/03-features/04-reasoning_outputs.md @@ -2,7 +2,7 @@ title: 推理输出 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 支持推理模型,例如 [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1),这些模型旨在生成包含推理步骤和最终结论的输出。 diff --git a/docs/03-features/05-structured_outputs.md b/docs/03-features/05-structured_outputs.md index 0cffada..6cec75e 100644 --- a/docs/03-features/05-structured_outputs.md +++ b/docs/03-features/05-structured_outputs.md @@ -2,7 +2,7 @@ title: 结构化输出 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 支持使用 [outlines](https://github.com/dottxt-ai/outlines)、[lm-format-enforcer](https://github.com/noamgat/lm-format-enforcer) 或 [xgrammar](https://github.com/mlc-ai/xgrammar) 作为引导解码的后端生成结构化输出。本文档展示了一些可用于生成结构化输出的不同选项的示例。 diff --git a/docs/03-features/06-automatic_prefix_caching.md b/docs/03-features/06-automatic_prefix_caching.md index 23154e8..0394557 100644 --- a/docs/03-features/06-automatic_prefix_caching.md +++ b/docs/03-features/06-automatic_prefix_caching.md @@ -2,7 +2,7 @@ title: 自动前缀缓存 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 
入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) # 介绍 diff --git a/docs/03-features/07-disagg_prefill.md b/docs/03-features/07-disagg_prefill.md index edc6668..9c54916 100644 --- a/docs/03-features/07-disagg_prefill.md +++ b/docs/03-features/07-disagg_prefill.md @@ -2,7 +2,7 @@ title: 分离式预填充(实验性功能) --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 本文介绍 vLLM 中的分离式预填充功能。 diff --git a/docs/03-features/08-spec_decode.md b/docs/03-features/08-spec_decode.md index 804a307..8537330 100644 --- a/docs/03-features/08-spec_decode.md +++ b/docs/03-features/08-spec_decode.md @@ -2,7 +2,7 @@ -title: 分离式预填充(实验性功能) +title: 推测解码 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) > **警告** > diff --git a/docs/03-features/09-compatibility_matrix.md b/docs/03-features/09-compatibility_matrix.md index 3a0c07f..e494e85 100644 --- a/docs/03-features/09-compatibility_matrix.md +++ b/docs/03-features/09-compatibility_matrix.md @@ -2,7 +2,7 @@ title: 兼容矩阵 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 下表展示了互斥特性和对某些硬件的支持 diff --git
a/docs/04-training/01-trl.md b/docs/04-training/01-trl.md index e031456..7496b81 100644 --- a/docs/04-training/01-trl.md +++ b/docs/04-training/01-trl.md @@ -2,7 +2,7 @@ title: Transformers 强化学习 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) Transformers 强化学习 (TRL) 是一个全栈库,提供了一套工具,用于通过监督微调 (SFT)、组相对策略优化 (GRPO)、直接偏好优化 (DPO)、奖励建模等方法训练 Transformer 语言模型。该库与 🤗 Transformers 集成。 diff --git a/docs/04-training/02-rlhf.md b/docs/04-training/02-rlhf.md index c717f71..6c50a7c 100644 --- a/docs/04-training/02-rlhf.md +++ b/docs/04-training/02-rlhf.md @@ -2,7 +2,7 @@ title: RLHF 基于人类反馈的强化学习 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 基于人类反馈的强化学习 (Reinforcement Learning from Human Feedback, RLHF) 是一种利用人类生成的偏好数据微调语言模型的技术,以使模型输出与期望行为保持一致。 diff --git a/docs/05-inference-and-serving/01-offline_inference.md b/docs/05-inference-and-serving/01-offline_inference.md index 2483c64..2efa4c3 100644 --- a/docs/05-inference-and-serving/01-offline_inference.md +++ b/docs/05-inference-and-serving/01-offline_inference.md @@ -2,7 +2,7 @@ title: 离线推理 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 您可以在自己的代码中运行 vLLM 来处理一组提示。 diff --git 
a/docs/05-inference-and-serving/02-openai_compatible_server.md b/docs/05-inference-and-serving/02-openai_compatible_server.md index 98ab844..8d92b01 100644 --- a/docs/05-inference-and-serving/02-openai_compatible_server.md +++ b/docs/05-inference-and-serving/02-openai_compatible_server.md @@ -2,7 +2,7 @@ title: OpenAI 兼容服务器 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 提供实现了 OpenAI  [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat) 等接口的 HTTP 服务器。 diff --git a/docs/05-inference-and-serving/03-multimodal_inputs.md b/docs/05-inference-and-serving/03-multimodal_inputs.md index 3b90df5..a99ea47 100644 --- a/docs/05-inference-and-serving/03-multimodal_inputs.md +++ b/docs/05-inference-and-serving/03-multimodal_inputs.md @@ -2,7 +2,7 @@ title: 多模态输入 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 本页教你如何在 vLLM 中向[多模态模型](https://docs.vllm.ai/en/latest/models/supported_models.html#supported-mm-models)传递多模态输入。 diff --git a/docs/05-inference-and-serving/04-distributed_serving_new.md b/docs/05-inference-and-serving/04-distributed_serving_new.md index a306a1e..f7ea0cf 100644 --- a/docs/05-inference-and-serving/04-distributed_serving_new.md +++ b/docs/05-inference-and-serving/04-distributed_serving_new.md @@ -2,7 +2,7 @@ title: 分布式推理与服务 --- -[\*在线运行 vLLM 
入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) ## 如何决定分布式推理策略? diff --git a/docs/05-inference-and-serving/05-metrics.md b/docs/05-inference-and-serving/05-metrics.md index 713bba5..932e9d9 100644 --- a/docs/05-inference-and-serving/05-metrics.md +++ b/docs/05-inference-and-serving/05-metrics.md @@ -2,7 +2,7 @@ title: 生产指标 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 公布了许多可用于监控系统运行状况的指标。这些指标通过 vLLM OpenAI 兼容 API 服务器上的 `/metrics` 端点公开。 diff --git a/docs/05-inference-and-serving/06-engine_args.md b/docs/05-inference-and-serving/06-engine_args.md index 1ea3c0f..469a662 100644 --- a/docs/05-inference-and-serving/06-engine_args.md +++ b/docs/05-inference-and-serving/06-engine_args.md @@ -2,7 +2,7 @@ title: 引擎参数 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 可以在下述找到对每一个 vLLM 引擎参数的说明: diff --git a/docs/05-inference-and-serving/07-env_vars.md b/docs/05-inference-and-serving/07-env_vars.md index 4ef5ba8..4eb10de 100644 --- a/docs/05-inference-and-serving/07-env_vars.md +++ b/docs/05-inference-and-serving/07-env_vars.md @@ -2,7 +2,7 @@ title: 环境 --- -[\*在线运行 vLLM 
入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 使用以下环境变量来配置系统: diff --git a/docs/05-inference-and-serving/08-usage_stats.md b/docs/05-inference-and-serving/08-usage_stats.md index e168e7e..df77bf2 100644 --- a/docs/05-inference-and-serving/08-usage_stats.md +++ b/docs/05-inference-and-serving/08-usage_stats.md @@ -2,7 +2,7 @@ title: 使用统计数据收集 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 默认情况下,vLLM 会收集匿名使用数据,以帮助工程团队更好地了解哪些硬件和模型配置被广泛使用。这些数据使他们能够优先考虑对最常见的工作负载的努力。收集的数据是透明的,不包含任何敏感信息,并将公开发布,以便社区受益。 diff --git a/docs/05-inference-and-serving/09-integrations/01-langchain.md b/docs/05-inference-and-serving/09-integrations/01-langchain.md index c4ac2fb..2e391ee 100644 --- a/docs/05-inference-and-serving/09-integrations/01-langchain.md +++ b/docs/05-inference-and-serving/09-integrations/01-langchain.md @@ -2,7 +2,7 @@ title: LangChain --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 也可通过 [LangChain](https://github.com/langchain-ai/langchain) 获取。 diff --git a/docs/05-inference-and-serving/09-integrations/02-llamaindex.md b/docs/05-inference-and-serving/09-integrations/02-llamaindex.md index cce454c..0678a38 100644 --- 
a/docs/05-inference-and-serving/09-integrations/02-llamaindex.md +++ b/docs/05-inference-and-serving/09-integrations/02-llamaindex.md @@ -2,7 +2,7 @@ title: LlamaIndex --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 也可通过 [LlamaIndex](https://github.com/run-llama/llama_index) 获取。 diff --git a/docs/05-inference-and-serving/09-integrations/README.md b/docs/05-inference-and-serving/09-integrations/README.md index e4f7a0d..604156f 100644 --- a/docs/05-inference-and-serving/09-integrations/README.md +++ b/docs/05-inference-and-serving/09-integrations/README.md @@ -4,5 +4,5 @@ title: 外部集成 -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) -- [LangChain](https://docs.vllm.ai/en/latest/serving/integrations/langchain.html) -- [LlamaIndex](https://docs.vllm.ai/en/latest/serving/integrations/llamaindex.html) +- [LangChain](/docs/inference-and-serving/integrations/langchain) +- [LlamaIndex](/docs/inference-and-serving/integrations/llamaindex) diff --git a/docs/06-deployment/01-docker.md b/docs/06-deployment/01-docker.md index 729660a..f3be1a9 100644 --- a/docs/06-deployment/01-docker.md +++ b/docs/06-deployment/01-docker.md @@ -2,7 +2,7 @@ title: 使用 Docker 进行部署 --- -[*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) ## 使用 vLLM 官方 Docker 镜像 diff --git a/docs/06-deployment/02-k8s.md
b/docs/06-deployment/02-k8s.md index f2ff8b8..bd4ee34 100644 --- a/docs/06-deployment/02-k8s.md +++ b/docs/06-deployment/02-k8s.md @@ -2,7 +2,7 @@ title: 使用 Kubernetes --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 在 Kubernetes 上部署 vLLM 是一种可扩展且高效的方式来提供机器学习模型服务。本指南将引导您使用原生 Kubernetes 部署 vLLM。 diff --git a/docs/06-deployment/03-nginx.md b/docs/06-deployment/03-nginx.md index a2618e8..2e407ea 100644 --- a/docs/06-deployment/03-nginx.md +++ b/docs/06-deployment/03-nginx.md @@ -2,25 +2,25 @@ title: 使用 Nginx --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 本文档介绍如何启动多个 vLLM 服务器容器,并使用 Nginx 作为负载均衡器在这些服务器之间进行流量分配。 目录 -1. [构建 Nginx 容器](https://docs.vllm.ai/en/latest/deployment/nginx.html#nginxloadbalancer-nginx-build) +1. [构建 Nginx 容器](/docs/deployment/nginx#构建-nginx-容器) -2. [创建简单的 Nginx 配置文件](https://docs.vllm.ai/en/latest/deployment/nginx.html#nginxloadbalancer-nginx-conf) +2. [创建简单的 Nginx 配置文件](/docs/deployment/nginx#创建简单的-nginx-配置文件) -3. [构建 vLLM 容器](https://docs.vllm.ai/en/latest/deployment/nginx.html#nginxloadbalancer-nginx-vllm-container) +3. [构建 vLLM 容器](/docs/deployment/nginx#构建-vllm-容器) -4. [创建 Docker 网络](https://docs.vllm.ai/en/latest/deployment/nginx.html#nginxloadbalancer-nginx-docker-network) +4. [创建 Docker 网络](/docs/deployment/nginx#创建-docker-网络) -5. [启动 vLLM 容器](https://docs.vllm.ai/en/latest/deployment/nginx.html#nginxloadbalancer-nginx-launch-container) +5. 
[启动 vLLM 容器](/docs/deployment/nginx#启动-vllm-容器) -6. [启动 Nginx](https://docs.vllm.ai/en/latest/deployment/nginx.html#nginxloadbalancer-nginx-launch-nginx) +6. [启动 Nginx](/docs/deployment/nginx#启动-nginx) -7. [验证 vLLM 服务器是否准备就绪](https://docs.vllm.ai/en/latest/deployment/nginx.html#nginxloadbalancer-nginx-verify-nginx) +7. [验证 vLLM 服务器是否准备就绪](/docs/deployment/nginx#验证-vllm-服务器是否准备就绪) ## 构建 Nginx 容器 diff --git a/docs/06-deployment/04-framworks/01-bentoml.md b/docs/06-deployment/04-framworks/01-bentoml.md index a6a46bb..39d2872 100644 --- a/docs/06-deployment/04-framworks/01-bentoml.md +++ b/docs/06-deployment/04-framworks/01-bentoml.md @@ -2,7 +2,7 @@ title: BentoML --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) # BentoML diff --git a/docs/06-deployment/04-framworks/02-cerebrium.md b/docs/06-deployment/04-framworks/02-cerebrium.md index 764c41b..a1c51b4 100644 --- a/docs/06-deployment/04-framworks/02-cerebrium.md +++ b/docs/06-deployment/04-framworks/02-cerebrium.md @@ -2,7 +2,7 @@ title: Cerebrium --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) ![图片](/img/docs/v1-deployment/02-cerebrium_1.png) diff --git a/docs/06-deployment/04-framworks/03-dstack.md b/docs/06-deployment/04-framworks/03-dstack.md index f9c8c5f..f309dde 100644 --- a/docs/06-deployment/04-framworks/03-dstack.md +++ b/docs/06-deployment/04-framworks/03-dstack.md @@ -2,7 +2,7 @@ title: dstack --- -[\*在线运行 vLLM 
入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) ![图片](/img/docs/v1-deployment/03-dstack_1.png) diff --git a/docs/06-deployment/04-framworks/04-helm.md b/docs/06-deployment/04-framworks/04-helm.md index e1d950d..02f4ee4 100644 --- a/docs/06-deployment/04-framworks/04-helm.md +++ b/docs/06-deployment/04-framworks/04-helm.md @@ -2,7 +2,7 @@ title: Helm --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 用于在 Kubernetes 上部署 vLLM 的 Helm Chart。 diff --git a/docs/06-deployment/04-framworks/05-lws.md b/docs/06-deployment/04-framworks/05-lws.md index 3b81362..38a1e02 100644 --- a/docs/06-deployment/04-framworks/05-lws.md +++ b/docs/06-deployment/04-framworks/05-lws.md @@ -2,7 +2,7 @@ title: LWS --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) # LWS diff --git a/docs/06-deployment/04-framworks/06-modal.md b/docs/06-deployment/04-framworks/06-modal.md index ca9408c..5cf49de 100644 --- a/docs/06-deployment/04-framworks/06-modal.md +++ b/docs/06-deployment/04-framworks/06-modal.md @@ -2,7 +2,7 @@ title: Modal --- -[\*在线运行 vLLM 
入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) # Modal diff --git a/docs/06-deployment/04-framworks/07-skypilot.md b/docs/06-deployment/04-framworks/07-skypilot.md index 13ea389..8a429c0 100644 --- a/docs/06-deployment/04-framworks/07-skypilot.md +++ b/docs/06-deployment/04-framworks/07-skypilot.md @@ -2,7 +2,7 @@ -title: SkePilot +title: SkyPilot --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) ![图片](/img/docs/v1-deployment/07-skypilot_1.png) diff --git a/docs/06-deployment/04-framworks/08-triton.md b/docs/06-deployment/04-framworks/08-triton.md index 21b6520..69a8d92 100644 --- a/docs/06-deployment/04-framworks/08-triton.md +++ b/docs/06-deployment/04-framworks/08-triton.md @@ -2,7 +2,7 @@ title: NVIDIA Triton --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) # NVIDIA Triton diff --git a/docs/06-deployment/04-framworks/README.md b/docs/06-deployment/04-framworks/README.md index 67f5de2..144a66f 100644 --- a/docs/06-deployment/04-framworks/README.md +++ b/docs/06-deployment/04-framworks/README.md @@ -2,13 +2,13 @@ title: 使用其他框架 --- -[\*在线运行 vLLM
入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) -- [BentoML](https://docs.vllm.ai/en/latest/deployment/frameworks/bentoml.html) -- [Cerebrium](https://docs.vllm.ai/en/latest/deployment/frameworks/cerebrium.html) -- [dstack](https://docs.vllm.ai/en/latest/deployment/frameworks/dstack.html) -- [Helm](https://docs.vllm.ai/en/latest/deployment/frameworks/helm.html) -- [LWS](https://docs.vllm.ai/en/latest/deployment/frameworks/lws.html) -- [Modal](https://docs.vllm.ai/en/latest/deployment/frameworks/modal.html) -- [SkyPilot](https://docs.vllm.ai/en/latest/deployment/frameworks/skypilot.html) -- [NVIDIA Triton](https://docs.vllm.ai/en/latest/deployment/frameworks/triton.html) +- [BentoML](/docs/deployment/frameworks/bentoml) +- [Cerebrium](/docs/deployment/frameworks/cerebrium) +- [dstack](/docs/deployment/frameworks/dstack) +- [Helm](/docs/deployment/frameworks/helm) +- [LWS](/docs/deployment/frameworks/lws) +- [Modal](/docs/deployment/frameworks/modal) +- [SkyPilot](/docs/deployment/frameworks/skypilot) +- [NVIDIA Triton](/docs/deployment/frameworks/triton) diff --git a/docs/06-deployment/05-integrations/01-kserve.md b/docs/06-deployment/05-integrations/01-kserve.md index 8c8010d..88f14b0 100644 --- a/docs/06-deployment/05-integrations/01-kserve.md +++ b/docs/06-deployment/05-integrations/01-kserve.md @@ -2,7 +2,7 @@ title: KServe --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 可以与 
[KServe](https://github.com/kserve/kserve) 一起部署在 Kubernetes 上,实现高度可扩展的分布式模型服务。 diff --git a/docs/06-deployment/05-integrations/02-kubeai.md b/docs/06-deployment/05-integrations/02-kubeai.md index 42297a0..0c63096 100644 --- a/docs/06-deployment/05-integrations/02-kubeai.md +++ b/docs/06-deployment/05-integrations/02-kubeai.md @@ -2,7 +2,7 @@ title: KubeAI --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) [KubeAI](https://github.com/substratusai/kubeai) 是一个 Kubernetes 操作符,使您能够在 Kubernetes 上部署和管理 AI 模型。它提供了一种简单且可扩展的方式在生产环境中部署 vLLM。诸如从零扩展、基于负载的自动扩展、模型缓存等功能,开箱即用,无需外部依赖。 diff --git a/docs/06-deployment/05-integrations/03-llamastack.md b/docs/06-deployment/05-integrations/03-llamastack.md index 676dadf..251091b 100644 --- a/docs/06-deployment/05-integrations/03-llamastack.md +++ b/docs/06-deployment/05-integrations/03-llamastack.md @@ -2,7 +2,7 @@ title: Llama Stack --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 也可通过 [Llama Stack](https://github.com/meta-llama/llama-stack) 获取。 diff --git a/docs/06-deployment/05-integrations/04-llmaz.md b/docs/06-deployment/05-integrations/04-llmaz.md index f691ef4..8acf7fa 100644 --- a/docs/06-deployment/05-integrations/04-llmaz.md +++ b/docs/06-deployment/05-integrations/04-llmaz.md @@ -2,7 +2,7 @@ -title: llamz +title: llmaz --- -[\*在线运行 vLLM
入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) [llmaz](https://github.com/InftyAI/llmaz) 是一个易于使用且先进的 Kubernetes 大语言模型推理平台,专为生产环境设计。它使用 vLLM 作为默认的模型服务后端。 diff --git a/docs/06-deployment/05-integrations/05-production-stack.md b/docs/06-deployment/05-integrations/05-production-stack.md index 70def37..d83001b 100644 --- a/docs/06-deployment/05-integrations/05-production-stack.md +++ b/docs/06-deployment/05-integrations/05-production-stack.md @@ -2,7 +2,7 @@ title: 生产环境技术栈 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) 在 Kubernetes 上部署 vLLM 是一种可扩展且高效的服务机器学习模型的方式。本指南将引导您使用 [vLLM 生产环境技术栈](https://github.com/vllm-project/production-stack) 部署 vLLM。该技术栈源于伯克利-芝加哥大学的合作,是 [vLLM 项目](https://github.com/vllm-project)下正式发布的生产优化代码库,专为 LLM 部署设计,具有以下特点: diff --git a/docs/06-deployment/05-integrations/README.md b/docs/06-deployment/05-integrations/README.md index 8ad5079..be8264e 100644 --- a/docs/06-deployment/05-integrations/README.md +++ b/docs/06-deployment/05-integrations/README.md @@ -2,10 +2,10 @@ title: 外部集成 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) -- KServe -- KubeAI -- Llama Stack -- llmaz -- 生产环境技术栈 (Production stack) +- 
[KServe](/docs/deployment/integrations/kserve) +- [KubeAI](/docs/deployment/integrations/kubeai) +- [Llama Stack](/docs/deployment/integrations/llamastack) +- [llmaz](/docs/deployment/integrations/llmaz) +- [生产环境技术栈 (Production stack)](/docs/deployment/integrations/production-stack) diff --git a/docs/07-performance/01-optimization.md b/docs/07-performance/01-optimization.md index 2c5f780..37539c5 100644 --- a/docs/07-performance/01-optimization.md +++ b/docs/07-performance/01-optimization.md @@ -2,7 +2,7 @@ title: 优化与调优 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) ## 抢占 (Preemption) diff --git a/docs/07-performance/02-benchmarks.md b/docs/07-performance/02-benchmarks.md index e95fbd6..4738577 100644 --- a/docs/07-performance/02-benchmarks.md +++ b/docs/07-performance/02-benchmarks.md @@ -2,7 +2,7 @@ title: 基准套件 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) vLLM 包含两组基准: diff --git a/docs/08-design/01-arch_overview.md b/docs/08-design/01-arch_overview.md index d6d0032..280566a 100644 --- a/docs/08-design/01-arch_overview.md +++ b/docs/08-design/01-arch_overview.md @@ -2,7 +2,7 @@ title: 架构概述 --- -[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap) +[\*在线运行 vLLM 
入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 本文档提供了 vLLM 架构的概述。
diff --git a/docs/08-design/02-huggingface_integration.md b/docs/08-design/02-huggingface_integration.md
index f2a1094..543d65c 100644
--- a/docs/08-design/02-huggingface_integration.md
+++ b/docs/08-design/02-huggingface_integration.md
@@ -2,7 +2,7 @@
 title: 与 HuggingFace 集成
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 本文档描述了 vLLM 如何与 HuggingFace 库集成。我们将逐步解释在运行 `vllm serve` 时,后台会发生什么。
diff --git a/docs/08-design/03-plugin_system.md b/docs/08-design/03-plugin_system.md
index 964a1b0..f4917fd 100644
--- a/docs/08-design/03-plugin_system.md
+++ b/docs/08-design/03-plugin_system.md
@@ -2,7 +2,7 @@
 title: vLLM 的插件系统
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 社区经常请求能够通过自定义功能扩展 vLLM。为了便于实现这一点,vLLM 包含了一个插件系统,允许用户在不修改 vLLM 代码库的情况下添加自定义功能。本文档解释了 vLLM 中插件的工作原理以及如何为 vLLM 创建插件。
diff --git a/docs/08-design/04-paged_attention.md b/docs/08-design/04-paged_attention.md
index d58a513..b3d50df 100644
--- a/docs/08-design/04-paged_attention.md
+++ b/docs/08-design/04-paged_attention.md
@@ -2,7 +2,7 @@
 title: vLLM  分页注意力
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 - 目前,vLLM 使用自己的多头查询注意力内核实现(`csrc/attention/attention_kernels.cu`)。该内核旨在兼容 vLLM 的分页键值缓存,其中键和值缓存存储在不同的块中(注意,这里的块概念与 GPU 线程块不同。因此,在后续文档中,将把 vLLM 分页注意力块称为「块」,而把 GPU 线程块称为「线程块」)。
diff --git a/docs/08-design/05-mm_processing.md b/docs/08-design/05-mm_processing.md
index e9e45bc..206da20 100644
--- a/docs/08-design/05-mm_processing.md
+++ b/docs/08-design/05-mm_processing.md
@@ -2,7 +2,7 @@
 title: 多模态数据处理
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 为了实现 vLLM 中的各种优化,例如[分块预填充](https://docs.vllm.ai/en/latest/performance/optimization.html#chunked-prefill)和[前缀缓存](https://docs.vllm.ai/en/latest/features/automatic_prefix_caching.html#automatic-prefix-caching),我们使用 `BaseMultiModalProcessor` 来提供占位符特征 token(例如 ``)与多模态输入(例如原始输入图像)之间的对应关系,基于 HF 处理器的输出。
diff --git a/docs/08-design/06-automatic_prefix_caching.md b/docs/08-design/06-automatic_prefix_caching.md
index 38e6e12..1d3068a 100644
--- a/docs/08-design/06-automatic_prefix_caching.md
+++ b/docs/08-design/06-automatic_prefix_caching.md
@@ -2,7 +2,7 @@
 title: 自动前缀缓存
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 [PagedAttention](https://blog.vllm.ai/2023/06/20/vllm.html) 的核心思想是将每个请求的 KV 缓存划分为多个 KV 块。每个块包含固定数量的注意力键和值。PagedAttention 算法允许这些块存储在不连续的物理内存中,从而通过按需分配内存来消除内存碎片。
diff --git a/docs/08-design/07-multiprocessing.md b/docs/08-design/07-multiprocessing.md
index f8d98a8..6f6f630 100644
--- a/docs/08-design/07-multiprocessing.md
+++ b/docs/08-design/07-multiprocessing.md
@@ -2,7 +2,7 @@
 title: Python 多进程
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 ## 调试
diff --git a/docs/09-design-v1/01-torch_compile.md b/docs/09-design-v1/01-torch_compile.md
index 574b3e4..22d9cd8 100644
--- a/docs/09-design-v1/01-torch_compile.md
+++ b/docs/09-design-v1/01-torch_compile.md
@@ -2,7 +2,7 @@
 title: vLLM 的 `torch.compile` 集成
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 在 vLLM 的 V1 架构中,`torch.compile` 默认启用且是框架的关键组成部分。本文档通过一个简单示例展示如何理解 `torch.compile` 的使用方式。
diff --git a/docs/09-design-v1/02-prefix_caching.md b/docs/09-design-v1/02-prefix_caching.md
index ebd923e..28ffe6c 100644
--- a/docs/09-design-v1/02-prefix_caching.md
+++ b/docs/09-design-v1/02-prefix_caching.md
@@ -2,7 +2,7 @@
 title: 自动前缀缓存
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 前缀缓存 KV 缓存块是 LLM 推理中一种流行的优化技术,用于避免冗余的提示计算。核心思想很简单——我们缓存已处理请求的 KV 缓存块,当新请求到来时如果前缀与之前请求相同就重用这些块。由于前缀缓存几乎是无成本的且不会改变模型输出,它已被许多公共端点(如 OpenAI、Anthropic 等)和大多数开源 LLM 推理框架(如 SGLang)广泛采用。
diff --git a/docs/09-design-v1/03-metrics.md b/docs/09-design-v1/03-metrics.md
index c5140a4..16f116a 100644
--- a/docs/09-design-v1/03-metrics.md
+++ b/docs/09-design-v1/03-metrics.md
@@ -2,7 +2,7 @@
 title: 指标
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 确保 v1 LLM 引擎公开的指标是 v0 可用指标的超集。
diff --git a/docs/10-contributing/01-overview.md b/docs/10-contributing/01-overview.md
index 48492b3..f1d77c0 100644
--- a/docs/10-contributing/01-overview.md
+++ b/docs/10-contributing/01-overview.md
@@ -2,7 +2,7 @@
 title: 为 vLLM 做贡献
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 感谢您有兴趣为 vLLM 做贡献!我们的社区向所有人开放,欢迎各种规模的贡献,无论大小。您可以通过以下方式为项目做出贡献:
diff --git a/docs/10-contributing/02-profiling_index.md b/docs/10-contributing/02-profiling_index.md
index 45099df..b0e8563 100644
--- a/docs/10-contributing/02-profiling_index.md
+++ b/docs/10-contributing/02-profiling_index.md
@@ -2,7 +2,7 @@
 title: vLLM 性能分析
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 > **警告**
 >
diff --git a/docs/10-contributing/03-dockerfile.md b/docs/10-contributing/03-dockerfile.md
index 7edc92c..72edbcc 100644
--- a/docs/10-contributing/03-dockerfile.md
+++ b/docs/10-contributing/03-dockerfile.md
@@ -2,7 +2,7 @@
 title: Dockerfile
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 请参阅[此处](https://github.com/vllm-project/vllm/blob/main/Dockerfile)了解主要 Dockerfile,构建用于使用 vLLM 运行 OpenAI 兼容服务器的镜像。有关使用 Docker 进行部署的更多信息可以在[此处](https://docs.vllm.ai/en/stable/serving/deploying_with_docker.html)找到。
diff --git a/docs/10-contributing/04-model/01-basic.md b/docs/10-contributing/04-model/01-basic.md
index 1eef036..acf5685 100644
--- a/docs/10-contributing/04-model/01-basic.md
+++ b/docs/10-contributing/04-model/01-basic.md
@@ -2,7 +2,7 @@
 title: 实现基础模型
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 本指南将带您逐步实现一个基础的 vLLM 模型。
diff --git a/docs/10-contributing/04-model/02-registration.md b/docs/10-contributing/04-model/02-registration.md
index c3c836f..0c259f5 100644
--- a/docs/10-contributing/04-model/02-registration.md
+++ b/docs/10-contributing/04-model/02-registration.md
@@ -2,7 +2,7 @@
 title: 将模型注册到 vLLM
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 vLLM 依赖模型注册表来确定如何运行每个模型。预注册的架构列表可以在[此处](https://docs.vllm.ai/en/latest/models/supported_models.html#supported-models)找到。
diff --git a/docs/10-contributing/04-model/03-tests.md b/docs/10-contributing/04-model/03-tests.md
index 537abe8..7ef0b85 100644
--- a/docs/10-contributing/04-model/03-tests.md
+++ b/docs/10-contributing/04-model/03-tests.md
@@ -2,7 +2,7 @@
 title: 编写单元测试
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 本页解释了如何编写单元测试以验证您的模型实现。
diff --git a/docs/10-contributing/04-model/04-multimodal.md b/docs/10-contributing/04-model/04-multimodal.md
index 5b6a1a8..72db1f8 100644
--- a/docs/10-contributing/04-model/04-multimodal.md
+++ b/docs/10-contributing/04-model/04-multimodal.md
@@ -2,7 +2,7 @@
 title: 多模态支持
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 本文档将引导您扩展基础模型,使其能够接受[多模态输入](https://docs.vllm.ai/en/latest/serving/multimodal_inputs.html#multimodal-inputs)。
diff --git a/docs/10-contributing/04-model/README.md b/docs/10-contributing/04-model/README.md
index 5cad6de..6d3bfa3 100644
--- a/docs/10-contributing/04-model/README.md
+++ b/docs/10-contributing/04-model/README.md
@@ -2,16 +2,16 @@
 title: 添加新模型
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 本节提供了更多关于如何将 [PyTorch](https://pytorch.org/) 模型集成到 vLLM 中的信息。
 
 #### 目录
 
-- [实现基础模型](https://docs.vllm.ai/en/latest/contributing/model/basic.html)
-- [将模型注册到 vLLM](https://docs.vllm.ai/en/latest/contributing/model/registration.html)
-- [编写单元测试](https://docs.vllm.ai/en/latest/contributing/model/tests.html)
-- [多模态支持](https://docs.vllm.ai/en/latest/contributing/model/multimodal.html)
+- [实现基础模型](/docs/contributing/model/basic)
+- [将模型注册到 vLLM](/docs/contributing/model/registration)
+- [编写单元测试](/docs/contributing/model/tests)
+- [多模态支持](/docs/contributing/model/multimodal)
 
 > **注意**
 >
diff --git a/docs/11-api/01-offline_interence/01-llm.md b/docs/11-api/01-offline_interence/01-llm.md
index 121e629..6044732 100644
--- a/docs/11-api/01-offline_interence/01-llm.md
+++ b/docs/11-api/01-offline_interence/01-llm.md
@@ -2,7 +2,7 @@
 title: LLM Class
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 **class vllm.LLM(model:[str](https://docs.python.org/3/library/stdtypes.html#str), tokenizer:[str](https://docs.python.org/3/library/stdtypes.html#str)_|_[None](https://docs.python.org/3/library/constants.html#None) _= None, tokenizer_mode:[str](https://docs.python.org/3/library/stdtypes.html#str) = 'auto', skip_tokenizer_init:[bool](https://docs.python.org/3/library/functions.html#bool) = False, trust_remote_code: [bool](https://docs.python.org/3/library/functions.html#bool) = False, allowed_local_media_path:[str](https://docs.python.org/3/library/stdtypes.html#str) ='', tensor_parallel_size: [int](https://docs.python.org/3/library/functions.html#int) = 1, dtype: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'auto', quantization:[str](https://docs.python.org/3/library/stdtypes.html#str) \ [None](https://docs.python.org/3/library/constants.html#None) = None, revision: [str](https://docs.python.org/3/library/stdtypes.html#str) \ [None](https://docs.python.org/3/library/constants.html#None) = None, tokenizer_revision : [str](https://docs.python.org/3/library/stdtypes.html#str) \ [None](https://docs.python.org/3/library/constants.html#None) = None, seed:[int](https://docs.python.org/3/library/functions.html#int) \ [None](https://docs.python.org/3/library/constants.html#None) = None, gpu_memory_utilization: [float](https://docs.python.org/3/library/functions.html#float) = 0.9, swap_space: [float](https://docs.python.org/3/library/functions.html#float) = 4, cpu_offload_gb:[float](https://docs.python.org/3/library/functions.html#float) = 0, enforce_eager: [bool](https://docs.python.org/3/library/functions.html#bool) \ [None](https://docs.python.org/3/library/constants.html#None) = None, max_seq_len_to_capture: [int](https://docs.python.org/3/library/functions.html#int) = 8192, disable_custom_all_reduce : [bool](https://docs.python.org/3/library/functions.html#bool) = False,disable_async_output_proc: [bool](https://docs.python.org/3/library/functions.html#bool) = False, hf_overrides: [dict](https://docs.python.org/3/library/stdtypes.html#dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)] | [Callable](https://docs.python.org/3/library/typing.html#typing.Callable)[[transformers.PretrainedConfig], transformers.PretrainedConfig] | None = None, mm_processor_kwargs: [dict](https://docs.python.org/3/library/stdtypes.html#dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)] | [None](https://docs.python.org/3/library/constants.html#None) = None, task: [Literal](https://docs.python.org/3/library/typing.html#typing.Literal)['auto', 'generate', 'embedding', 'embed', 'classify', 'score', 'reward', 'transcription'] = 'auto', override_pooler_config: PoolerConfig | [None](https://docs.python.org/3/library/constants.html#None) = None, compilation_config: [int](https://docs.python.org/3/library/functions.html#int) | [dict](https://docs.python.org/3/library/stdtypes.html#dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)] | [None](https://docs.python.org/3/library/constants.html#None) = None, ** kwargs)**
diff --git a/docs/11-api/01-offline_interence/02-llm_inputs.md b/docs/11-api/01-offline_interence/02-llm_inputs.md
index 2ce85d1..1341808 100644
--- a/docs/11-api/01-offline_interence/02-llm_inputs.md
+++ b/docs/11-api/01-offline_interence/02-llm_inputs.md
@@ -2,7 +2,7 @@
 title: LLM Inputs
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 **vllm.inputs.PromptType**
diff --git a/docs/11-api/01-offline_interence/README.md b/docs/11-api/01-offline_interence/README.md
index bdc3d18..60ad438 100644
--- a/docs/11-api/01-offline_interence/README.md
+++ b/docs/11-api/01-offline_interence/README.md
@@ -2,9 +2,9 @@
 title: 离线推理
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 目录
 
-- [LLM Class](https://docs.vllm.ai/en/latest/api/offline_inference/llm.html)
-- [LLM Inputs](https://docs.vllm.ai/en/latest/api/offline_inference/llm_inputs.html)
+- [LLM Class](/docs/api/offline_inference/llm)
+- [LLM Inputs](/docs/api/offline_inference/llm_inputs)
diff --git a/docs/11-api/02-engine/01-llm_engine.md b/docs/11-api/02-engine/01-llm_engine.md
index 2dd6c7f..5540b4a 100644
--- a/docs/11-api/02-engine/01-llm_engine.md
+++ b/docs/11-api/02-engine/01-llm_engine.md
@@ -2,7 +2,7 @@
 title: LLMEngine
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 **class_ vllm.LLMEngine(_vllm\_config: VllmConfig_, _executor\_class: [Type](https://docs.python.org/3/library/typing.html#typing.Type "(in Python v3.13)")\[ExecutorBase\]_, _log\_stats: [bool](https://docs.python.org/3/library/functions.html#bool "(in Python v3.13)")_, _usage\_context: UsageContext \= UsageContext.ENGINE\_CONTEXT_, _stat\_loggers: [Dict](https://docs.python.org/3/library/typing.html#typing.Dict "(in Python v3.13)")\[[str](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.13)"), StatLoggerBase\] | [None](https://docs.python.org/3/library/constants.html#None "(in Python v3.13)") \= None_, _input\_registry: InputRegistry \= INPUT\_REGISTRY_, _mm\_registry: [MultiModalRegistry](https://docs.vllm.ai/en/v0.8.4_a/api/multimodal/registry.html#vllm.multimodal.registry.MultiModalRegistry "vllm.multimodal.registry.MultiModalRegistry") \= MULTIMODAL\_REGISTRY_, _use\_cached\_outputs: [bool](https://docs.python.org/3/library/functions.html#bool "(in Python v3.13)") \= False_)**
diff --git a/docs/11-api/02-engine/02-async_llm_engine.md b/docs/11-api/02-engine/02-async_llm_engine.md
index e5e7174..2569a7a 100644
--- a/docs/11-api/02-engine/02-async_llm_engine.md
+++ b/docs/11-api/02-engine/02-async_llm_engine.md
@@ -2,7 +2,7 @@
 title: AsyncLLMEngine
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 **class_ vllm.AsyncLLMEngine(_\*args_, _log\_requests: [bool](https://docs.python.org/3/library/functions.html#bool "(in Python v3.13)") \= True_, _start\_engine\_loop: [bool](https://docs.python.org/3/library/functions.html#bool "(in Python v3.13)") \= True_, _\*\*kwargs_)**
diff --git a/docs/11-api/02-engine/README.md b/docs/11-api/02-engine/README.md
index d4ef42b..25959bf 100644
--- a/docs/11-api/02-engine/README.md
+++ b/docs/11-api/02-engine/README.md
@@ -2,14 +2,14 @@
 title: vLLM 引擎
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 引擎
 
-- [LLMEngine](https://docs.vllm.ai/en/latest/api/engine/llm_engine.html)
+- [LLMEngine](/docs/api/engine/llm_engine)
 - `LLMEngine`
-- [AsyncLLMEngine](https://docs.vllm.ai/en/latest/api/engine/async_llm_engine.html)
+- [AsyncLLMEngine](/docs/api/engine/async_llm_engine)
 - `AsyncLLMEngine`
diff --git a/docs/11-api/03-inference_params.md b/docs/11-api/03-inference_params.md
index 2d4084a..595a788 100644
--- a/docs/11-api/03-inference_params.md
+++ b/docs/11-api/03-inference_params.md
@@ -2,7 +2,7 @@
 title: 推理参数
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 vLLM API 的推理参数。
diff --git a/docs/11-api/04-multimodal/01-inputs.md b/docs/11-api/04-multimodal/01-inputs.md
index 2dd745b..768c00b 100644
--- a/docs/11-api/04-multimodal/01-inputs.md
+++ b/docs/11-api/04-multimodal/01-inputs.md
@@ -2,7 +2,7 @@
 title: 输入定义
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 ## 面向用户的输入
diff --git a/docs/11-api/04-multimodal/02-parse.md b/docs/11-api/04-multimodal/02-parse.md
index 835bb2b..119674e 100644
--- a/docs/11-api/04-multimodal/02-parse.md
+++ b/docs/11-api/04-multimodal/02-parse.md
@@ -2,7 +2,7 @@
 title: 数据解析
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 ## 模块内容
diff --git a/docs/11-api/04-multimodal/03-processing.md b/docs/11-api/04-multimodal/03-processing.md
index d20617d..3a1e5ab 100644
--- a/docs/11-api/04-multimodal/03-processing.md
+++ b/docs/11-api/04-multimodal/03-processing.md
@@ -2,7 +2,7 @@
 title: 数据处理
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 ## 模块内容
diff --git a/docs/11-api/04-multimodal/04-profiling.md b/docs/11-api/04-multimodal/04-profiling.md
index 1edadfe..55d9905 100644
--- a/docs/11-api/04-multimodal/04-profiling.md
+++ b/docs/11-api/04-multimodal/04-profiling.md
@@ -2,7 +2,7 @@
 title: 内存分析
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 ## 模块内容
diff --git a/docs/11-api/04-multimodal/05-registry.md b/docs/11-api/04-multimodal/05-registry.md
index 87a8a14..49c204c 100644
--- a/docs/11-api/04-multimodal/05-registry.md
+++ b/docs/11-api/04-multimodal/05-registry.md
@@ -2,7 +2,7 @@
 title: 注册表 (Registry)
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 ## 模块内容
diff --git a/docs/11-api/04-multimodal/README.md b/docs/11-api/04-multimodal/README.md
index 0b6b18e..7845d33 100644
--- a/docs/11-api/04-multimodal/README.md
+++ b/docs/11-api/04-multimodal/README.md
@@ -2,13 +2,13 @@
 title: 多模态支持
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 vLLM 通过 `vllm.multimodal` 包提供对多模态模型的实验性支持。
 
-多模态输入可以与文本和 token 提示一起传递给[支持的模型](https://vllm.hyper.ai/docs/models/supported_models),通过 `vllm.inputs.PromptType` 中的 `multi_modal_data` 字段传递。
+多模态输入可以与文本和 token 提示一起传递给[支持的模型](/docs/models/supported_models),通过 `vllm.inputs.PromptType` 中的 `multi_modal_data` 字段传递。
 
-想要添加自己的多模态模型?请按照[此处](https://vllm.hyper.ai/docs/contributing/model/multimodal)列出的说明操作。
+想要添加自己的多模态模型?请按照[此处](/docs/contributing/model/multimodal)列出的说明操作。
 
 ## 模块内容
@@ -18,12 +18,12 @@ vLLM 通过 `vllm.multimodal` 包提供对多模态模型的实验性支持。
 
 全局的 `MultiModalRegistry` 被模型运行器用于根据目标模型分派数据处理。
 
-> **另请参阅** >[多模态数据处理](https://vllm.hyper.ai/docs/design/mm_processing)
+> **另请参阅** >[多模态数据处理](/docs/design/mm_processing)
 
 ## 子模块
 
-- [输入定义](https://vllm.hyper.ai/docs/api/multimodal/inputs)
-- [数据解析](https://vllm.hyper.ai/docs/api/multimodal/parse)
-- [数据处理](https://vllm.hyper.ai/docs/api/multimodal/processing)
-- [内存分析](https://vllm.hyper.ai/docs/api/multimodal/profiling)
-- [注册表](https://vllm.hyper.ai/docs/api/multimodal/registry)
+- [输入定义](/docs/api/multimodal/inputs)
+- [数据解析](/docs/api/multimodal/parse)
+- [数据处理](/docs/api/multimodal/processing)
+- [内存分析](/docs/api/multimodal/profiling)
+- [注册表](/docs/api/multimodal/registry)
diff --git a/docs/11-api/05-model/01-interfaces_base.md b/docs/11-api/05-model/01-interfaces_base.md
index 47906f3..03abbb2 100644
--- a/docs/11-api/05-model/01-interfaces_base.md
+++ b/docs/11-api/05-model/01-interfaces_base.md
@@ -2,7 +2,7 @@
 title: 基本模型接口
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 ## 模块内容
diff --git a/docs/11-api/05-model/02-interfaces.md b/docs/11-api/05-model/02-interfaces.md
index 30ff725..36630b9 100644
--- a/docs/11-api/05-model/02-interfaces.md
+++ b/docs/11-api/05-model/02-interfaces.md
@@ -2,7 +2,7 @@
 title: 可选接口
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 ## 模块内容
diff --git a/docs/11-api/05-model/03-adapters.md b/docs/11-api/05-model/03-adapters.md
index 75aa5f4..904c811 100644
--- a/docs/11-api/05-model/03-adapters.md
+++ b/docs/11-api/05-model/03-adapters.md
@@ -2,7 +2,7 @@
 title: 模型适配器
 ---
 
-[\*在线运行 vLLM 入门教程:零基础分步指南](https://openbayes.com/console/public/tutorials/rXxb5fZFr29?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
+[\*在线运行 vLLM 入门教程:零基础分步指南](https://app.hyper.ai/console/public/tutorials/rUwYsyhAIt3?utm_source=vLLM-CNdoc&utm_medium=vLLM-CNdoc-V1&utm_campaign=vLLM-CNdoc-V1-25ap)
 
 ## 模块内容