From 70aa405f3cd58aeaa78b8dcb1ab8c5356b54a405 Mon Sep 17 00:00:00 2001 From: revital Date: Sun, 7 Dec 2025 09:06:58 +0200 Subject: [PATCH 1/5] Add Multi-GPU inference note in deployment apps --- docs/webapp/applications/apps_embed_model_deployment.md | 6 ++++++ docs/webapp/applications/apps_llama_deployment.md | 8 +++++++- docs/webapp/applications/apps_model_deployment.md | 6 ++++++ docs/webapp/applications/apps_sglang.md | 8 +++++++- 4 files changed, 26 insertions(+), 2 deletions(-) diff --git a/docs/webapp/applications/apps_embed_model_deployment.md b/docs/webapp/applications/apps_embed_model_deployment.md index 18fa43fc..f8865b1a 100644 --- a/docs/webapp/applications/apps_embed_model_deployment.md +++ b/docs/webapp/applications/apps_embed_model_deployment.md @@ -93,6 +93,12 @@ values from the file, which can be modified before launching the app instance * **Service Project** - ClearML Project where your Embedding Model Deployment app instance will be stored * **Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the Embedding Model Deployment app instance task will be enqueued (make sure an agent is assigned to it) + + :::tip Multi-GPU inference + To run multi-GPU inference, the queue must be associated with a multi-GPU pod template. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) + for an example configuration of a queue that allocates multiple GPUs and shared memory. + ::: + * **AI Gateway Route** - Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created. * **Model Configuration** * Model - A ClearML Model ID or a Hugging Face model name (e.g. `openai-community/gpt2`) diff --git a/docs/webapp/applications/apps_llama_deployment.md b/docs/webapp/applications/apps_llama_deployment.md index 8f260477..f47d4e17 100644 --- a/docs/webapp/applications/apps_llama_deployment.md +++ b/docs/webapp/applications/apps_llama_deployment.md @@ -89,7 +89,13 @@ values from the file, which can be modified before launching the app instance project-level permissions (i.e. users with read access can use the app instance). * **Queue**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the llama.cpp Model Deployment app instance task will be enqueued (make sure an agent is assigned to it) -**AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created. + + :::tip Multi-GPU inference + To run multi-GPU inference, the queue must be associated with a multi-GPU pod template. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) + for an example configuration of a queue that allocates multiple GPUs and shared memory. + ::: + +* **AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created. * **Model Configuration**: Configure the behavior and performance of the model serving engine. * CLI: Llama.cpp CLI arguments. If set, these arguments will be passed to Llama.cpp and all following entries will be ignored, except for the `Model` field. 
diff --git a/docs/webapp/applications/apps_model_deployment.md b/docs/webapp/applications/apps_model_deployment.md index d5f61e4f..cfa0212b 100644 --- a/docs/webapp/applications/apps_model_deployment.md +++ b/docs/webapp/applications/apps_model_deployment.md @@ -92,6 +92,12 @@ values from the file, which can be modified before launching the app instance project-level permissions (i.e. users with read access can use the app). * **Queue**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the vLLM Model Deployment app instance task will be enqueued (make sure an agent is assigned to that queue) + + :::tip Multi-GPU inference + To run multi-GPU inference, the queue must be associated with a multi-GPU pod template. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) + for an example configuration of a queue that allocates multiple GPUs and shared memory. + ::: + * **AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created. * **Model Configuration**: Configure the behavior and performance of the model engine. * Trust Remote Code: Select to set Hugging Face [`trust_remote_code`](https://huggingface.co/docs/text-generation-inference/main/en/reference/launcher#trustremotecode) diff --git a/docs/webapp/applications/apps_sglang.md b/docs/webapp/applications/apps_sglang.md index 7b7d7e04..9b20d29b 100644 --- a/docs/webapp/applications/apps_sglang.md +++ b/docs/webapp/applications/apps_sglang.md @@ -90,7 +90,13 @@ values from the file, which can be modified before launching the app instance * **Service Project - Access Control** - The ClearML project where the app instance is created. Access is determined by project-level permissions (i.e. users with read access can use the app). * **Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the SGLang Model Deployment app -instance task will be enqueued (make sure an agent is assigned to that queue) +instance task will be enqueued. Make sure an agent is assigned to that queue. + + :::tip Multi-GPU inference + To run multi-GPU inference, the queue must be associated with a multi-GPU pod template. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) + for an example configuration of a queue that allocates multiple GPUs and shared memory. + ::: + * **AI Gateway Route** - Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created. * **Model** - A ClearML Model ID or a HuggingFace model name (e.g. `openai-community/gpt2`) * **Model Configuration**: Configure the behavior and performance of the language model engine. 
This allows you to
From a86a67e14c06cda7e3ca09eb25bd11f4c026e081 Mon Sep 17 00:00:00 2001
From: revital
Date: Tue, 9 Dec 2025 14:42:19 +0200
Subject: [PATCH 3/5] Add Multi-GPU inference note in deployment apps

---
 docs/webapp/applications/apps_embed_model_deployment.md | 4 ++--
 docs/webapp/applications/apps_llama_deployment.md | 4 ++--
 docs/webapp/applications/apps_model_deployment.md | 4 ++--
 docs/webapp/applications/apps_sglang.md | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/webapp/applications/apps_embed_model_deployment.md b/docs/webapp/applications/apps_embed_model_deployment.md
index f8865b1a..2ed6a51e 100644
--- a/docs/webapp/applications/apps_embed_model_deployment.md
+++ b/docs/webapp/applications/apps_embed_model_deployment.md
@@ -95,8 +95,8 @@ values from the file, which can be modified before launching the app instance
 Deployment app instance task will be enqueued (make sure an agent is assigned to it)
 
   :::tip Multi-GPU inference
-  To run multi-GPU inference, the queue must be associated with a multi-GPU pod template. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
-  for an example configuration of a queue that allocates multiple GPUs and shared memory.
+  To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs.
+  See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) for an example.
   :::
 
 * **AI Gateway Route** - Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
diff --git a/docs/webapp/applications/apps_llama_deployment.md b/docs/webapp/applications/apps_llama_deployment.md
index f47d4e17..4c5cad74 100644
--- a/docs/webapp/applications/apps_llama_deployment.md
+++ b/docs/webapp/applications/apps_llama_deployment.md
@@ -91,8 +91,8 @@ values from the file, which can be modified before launching the app instance
 llama.cpp Model Deployment app instance task will be enqueued (make sure an agent is assigned to it)
 
   :::tip Multi-GPU inference
-  To run multi-GPU inference, the queue must be associated with a multi-GPU pod template. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
-  for an example configuration of a queue that allocates multiple GPUs and shared memory.
+  To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs.
+  See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) for an example.
   :::
 
 * **AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
diff --git a/docs/webapp/applications/apps_model_deployment.md b/docs/webapp/applications/apps_model_deployment.md
index cfa0212b..c43a5153 100644
--- a/docs/webapp/applications/apps_model_deployment.md
+++ b/docs/webapp/applications/apps_model_deployment.md
@@ -94,8 +94,8 @@ values from the file, which can be modified before launching the app instance
 instance task will be enqueued (make sure an agent is assigned to that queue)
 
   :::tip Multi-GPU inference
-  To run multi-GPU inference, the queue must be associated with a multi-GPU pod template. 
See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) - for an example configuration of a queue that allocates multiple GPUs and shared memory. + To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. + See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) for an example. ::: * **AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created. diff --git a/docs/webapp/applications/apps_sglang.md b/docs/webapp/applications/apps_sglang.md index 9b20d29b..41139ce6 100644 --- a/docs/webapp/applications/apps_sglang.md +++ b/docs/webapp/applications/apps_sglang.md @@ -93,8 +93,8 @@ values from the file, which can be modified before launching the app instance instance task will be enqueued. Make sure an agent is assigned to that queue. :::tip Multi-GPU inference - To run multi-GPU inference, the queue must be associated with a multi-GPU pod template. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) - for an example configuration of a queue that allocates multiple GPUs and shared memory. + To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. + See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) for an example. ::: * **AI Gateway Route** - Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created. From 7b3bb339b93d34300bdf588c439a44cb931018c6 Mon Sep 17 00:00:00 2001 From: revital Date: Tue, 9 Dec 2025 14:46:39 +0200 Subject: [PATCH 4/5] Add Multi-GPU inference note in deployment apps --- docs/webapp/applications/apps_embed_model_deployment.md | 2 +- docs/webapp/applications/apps_llama_deployment.md | 2 +- docs/webapp/applications/apps_model_deployment.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/webapp/applications/apps_embed_model_deployment.md b/docs/webapp/applications/apps_embed_model_deployment.md index 60cb95d9..4a474424 100644 --- a/docs/webapp/applications/apps_embed_model_deployment.md +++ b/docs/webapp/applications/apps_embed_model_deployment.md @@ -92,7 +92,7 @@ values from the file, which can be modified before launching the app instance * **Instance name** - Name for the Embedding Model Deployment instance. This will appear in the instance list * **Service Project** - ClearML Project where your Embedding Model Deployment app instance will be stored * **Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the Embedding Model -Deployment app instance task will be enqueued (make sure an agent is assigned to it) +Deployment app instance task will be enqueued. Make sure an agent is assigned to that queue. :::tip Multi-GPU inference To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. 
See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) diff --git a/docs/webapp/applications/apps_llama_deployment.md b/docs/webapp/applications/apps_llama_deployment.md index 2c89e91d..da05951b 100644 --- a/docs/webapp/applications/apps_llama_deployment.md +++ b/docs/webapp/applications/apps_llama_deployment.md @@ -88,7 +88,7 @@ values from the file, which can be modified before launching the app instance * **Service Project (Access Control)**: The ClearML project where the app instance is created. Access is determined by project-level permissions (i.e. users with read access can use the app instance). * **Queue**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the - llama.cpp Model Deployment app instance task will be enqueued (make sure an agent is assigned to it) + llama.cpp Model Deployment app instance task will be enqueued. Make sure an agent is assigned to that queue. :::tip Multi-GPU inference To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) diff --git a/docs/webapp/applications/apps_model_deployment.md b/docs/webapp/applications/apps_model_deployment.md index 7178d8ad..ca3e6aa6 100644 --- a/docs/webapp/applications/apps_model_deployment.md +++ b/docs/webapp/applications/apps_model_deployment.md @@ -91,7 +91,7 @@ values from the file, which can be modified before launching the app instance * **Service Project (Access Control)**: The ClearML project where the app instance is created. Access is determined by project-level permissions (i.e. users with read access can use the app). * **Queue**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the vLLM Model Deployment app -instance task will be enqueued (make sure an agent is assigned to that queue) +instance task will be enqueued. Make sure an agent is assigned to that queue. :::tip Multi-GPU inference To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) From f8d51642717e316016e766b5544507e17be9c2d1 Mon Sep 17 00:00:00 2001 From: revital Date: Wed, 10 Dec 2025 07:24:36 +0200 Subject: [PATCH 5/5] Edit --- docs/webapp/applications/apps_embed_model_deployment.md | 2 +- docs/webapp/applications/apps_llama_deployment.md | 2 +- docs/webapp/applications/apps_model_deployment.md | 2 +- docs/webapp/applications/apps_sglang.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/webapp/applications/apps_embed_model_deployment.md b/docs/webapp/applications/apps_embed_model_deployment.md index 4a474424..d4df943e 100644 --- a/docs/webapp/applications/apps_embed_model_deployment.md +++ b/docs/webapp/applications/apps_embed_model_deployment.md @@ -95,7 +95,7 @@ values from the file, which can be modified before launching the app instance Deployment app instance task will be enqueued. Make sure an agent is assigned to that queue. :::tip Multi-GPU inference - To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. 
See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) + To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) defines multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) for an example configuration of a queue that allocates multiple GPUs and shared memory. ::: diff --git a/docs/webapp/applications/apps_llama_deployment.md b/docs/webapp/applications/apps_llama_deployment.md index da05951b..bf001837 100644 --- a/docs/webapp/applications/apps_llama_deployment.md +++ b/docs/webapp/applications/apps_llama_deployment.md @@ -91,7 +91,7 @@ values from the file, which can be modified before launching the app instance llama.cpp Model Deployment app instance task will be enqueued. Make sure an agent is assigned to that queue. :::tip Multi-GPU inference - To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) + To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) defines multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) for an example configuration of a queue that allocates multiple GPUs and shared memory. ::: diff --git a/docs/webapp/applications/apps_model_deployment.md b/docs/webapp/applications/apps_model_deployment.md index ca3e6aa6..7bd66c81 100644 --- a/docs/webapp/applications/apps_model_deployment.md +++ b/docs/webapp/applications/apps_model_deployment.md @@ -94,7 +94,7 @@ values from the file, which can be modified before launching the app instance instance task will be enqueued. Make sure an agent is assigned to that queue. :::tip Multi-GPU inference - To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) + To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) defines multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) for an example configuration of a queue that allocates multiple GPUs and shared memory. ::: diff --git a/docs/webapp/applications/apps_sglang.md b/docs/webapp/applications/apps_sglang.md index 75182bf3..89600b3f 100644 --- a/docs/webapp/applications/apps_sglang.md +++ b/docs/webapp/applications/apps_sglang.md @@ -93,7 +93,7 @@ values from the file, which can be modified before launching the app instance instance task will be enqueued. Make sure an agent is assigned to that queue. :::tip Multi-GPU inference - To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) + To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) defines multiple GPUs. 
See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory) for an example configuration of a queue that allocates multiple GPUs and shared memory. :::
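
The tip added by this series assumes the target queue resolves to a pod spec with more than one GPU and enough shared memory for inter-GPU communication. As a rough illustration of what the linked "GPU Queues with Shared Memory" example configures, a queue defined through the ClearML Agent Helm chart's `templateOverrides` might look like the following sketch (the queue name, GPU count, and `/dev/shm` size here are illustrative placeholders, not values taken from the linked page):

```yaml
# Hypothetical Helm values for the clearml-agent chart, sketching the kind of
# queue definition the tip refers to. "multi-gpu-queue", the GPU count, and the
# shared-memory size are made up for this sketch; the linked docs page contains
# the canonical example.
agentk8sglue:
  createQueues: true
  queues:
    multi-gpu-queue:
      templateOverrides:
        resources:
          limits:
            nvidia.com/gpu: 2        # pod spec requests two GPUs
        volumes:
          - name: dshm
            emptyDir:
              medium: Memory         # RAM-backed volume for shared memory
              sizeLimit: 16Gi        # sized for inter-GPU communication buffers
        volumeMounts:
          - name: dshm
            mountPath: /dev/shm      # serving engines use /dev/shm for inter-GPU IPC
```

Enqueuing a deployment app instance to such a queue makes all GPUs requested in the pod spec visible to the serving engine (vLLM, SGLang, llama.cpp, or the embedding server); the engine's own parallelism settings still determine how they are used.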