Docs: improve link to docs (#3860)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
simveit and zhaochenyang20 authored Feb 26, 2025
1 parent c9fc4a9 commit 44a2c4b
Showing 3 changed files with 18 additions and 13 deletions.
6 changes: 3 additions & 3 deletions docs/backend/openai_api_vision.ipynb
@@ -26,7 +26,7 @@
"\n",
"Launch the server in your terminal and wait for it to initialize.\n",
"\n",
"**Remember to add `--chat-template llama_3_vision` to specify the vision chat template, otherwise the server only supports text, and performance degradation may occur.**\n",
"**Remember to add** `--chat-template llama_3_vision` **to specify the vision chat template, otherwise the server only supports text, and performance degradation may occur.**\n",
"\n",
"We need to specify `--chat-template` for vision language models because the chat template provided in Hugging Face tokenizer only supports text."
]
@@ -46,7 +46,7 @@
"\n",
"from sglang.utils import wait_for_server, print_highlight, terminate_process\n",
"\n",
"embedding_process, port = launch_server_cmd(\n",
"vision_process, port = launch_server_cmd(\n",
" \"\"\"\n",
"python3 -m sglang.launch_server --model-path meta-llama/Llama-3.2-11B-Vision-Instruct \\\n",
" --chat-template=llama_3_vision\n",
@@ -245,7 +245,7 @@
"metadata": {},
"outputs": [],
"source": [
"terminate_process(embedding_process)"
"terminate_process(vision_process)"
]
},
{
2 changes: 1 addition & 1 deletion docs/backend/server_arguments.md
@@ -52,7 +52,7 @@ Please consult the documentation below to learn more about the parameters you ma
* `chat_template`: The chat template to use. Deviating from the default might lead to unexpected responses. For multi-modal chat templates, refer to [here](https://docs.sglang.ai/backend/openai_api_vision.html#Chat-Template).
* `is_embedding`: Set to true to perform [embedding](https://docs.sglang.ai/backend/openai_api_embeddings.html) / [encode](https://docs.sglang.ai/backend/native_api.html#Encode-(embedding-model)) and [reward](https://docs.sglang.ai/backend/native_api.html#Classify-(reward-model)) tasks.
* `revision`: Adjust if a specific version of the model should be used.
-* `skip_tokenizer_init`: Set to true to provide the tokens to the engine and get the output tokens directly, typically used in RLHF.
+* `skip_tokenizer_init`: Set to true to provide the tokens to the engine and get the output tokens directly, typically used in RLHF. Please see this [example for reference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/input_ids.py).
* `json_model_override_args`: Override model config with the provided JSON.
* `delete_ckpt_after_loading`: Delete the model checkpoint after loading the model.
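
As an aside on the `skip_tokenizer_init` flow linked above, here is a minimal sketch of exchanging token ids with the offline Engine API (the model name is a placeholder, and the exact output fields may vary by version):

```python
# Minimal sketch: tokenize outside the engine and pass token ids directly,
# as is typical in RLHF loops. Assumes sglang and transformers are installed.
import sglang as sgl
from transformers import AutoTokenizer

model = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model)

# skip_tokenizer_init tells the engine not to load a tokenizer of its own.
llm = sgl.Engine(model_path=model, skip_tokenizer_init=True)

input_ids = tokenizer("The capital of France is").input_ids
out = llm.generate(input_ids=input_ids, sampling_params={"max_new_tokens": 16})

# With skip_tokenizer_init the engine returns token ids rather than text;
# inspect the returned dict for the exact field names in your version.
print(out)
llm.shutdown()
```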

23 changes: 14 additions & 9 deletions docs/router/router.md
@@ -27,11 +27,13 @@ The router supports two working modes:
This will be a drop-in replacement for the existing `--dp-size` argument of SGLang Runtime. Under the hood, it uses multiple processes to launch multiple workers, waits for them to be ready, then connects the router to all of them.

```bash
-$ python -m sglang_router.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --dp-size 1
+python -m sglang_router.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --dp-size 4
```

After the server is ready, you can send requests to the router in the same way as you would send them to a single worker.

+Please adjust the batch size accordingly to achieve maximum throughput.

```python
import requests

Expand All @@ -47,7 +49,7 @@ print(response.json())
This is useful for multi-node DP. First, launch workers on multiple nodes, then launch a router on the main node, and connect the router to all workers.

```bash
-$ python -m sglang_router.launch_router --worker-urls http://worker_url_1 http://worker_url_2
+python -m sglang_router.launch_router --worker-urls http://worker_url_1 http://worker_url_2
```
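
For the multi-node case, a hypothetical two-node layout could look as follows (`node1`, `node2`, and the ports are placeholders for your actual hosts):

```bash
# On worker node 1 (placeholder hostname node1)
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --host 0.0.0.0 --port 30001

# On worker node 2 (placeholder hostname node2)
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --host 0.0.0.0 --port 30002

# On the main node, connect the router to both workers
python -m sglang_router.launch_router \
    --worker-urls http://node1:30001 http://node2:30002
```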

## Dynamic Scaling APIs
@@ -59,30 +61,33 @@ We offer `/add_worker` and `/remove_worker` APIs to dynamically add or remove wo
Usage:

```bash
-$ curl -X POST http://localhost:30000/add_worker?url=http://worker_url_1
+curl -X POST http://localhost:30000/add_worker?url=http://worker_url_1
```

Example:

```bash
-$ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30001
-$ curl -X POST http://localhost:30000/add_worker?url=http://127.0.0.1:30001
-Successfully added worker: http://127.0.0.1:30001
+python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30001
+
+curl -X POST http://localhost:30000/add_worker?url=http://127.0.0.1:30001
+
+# Successfully added worker: http://127.0.0.1:30001
```

- `/remove_worker`

Usage:

```bash
-$ curl -X POST http://localhost:30000/remove_worker?url=http://worker_url_1
+curl -X POST http://localhost:30000/remove_worker?url=http://worker_url_1
```

Example:

```bash
-$ curl -X POST http://localhost:30000/remove_worker?url=http://127.0.0.1:30001
-Successfully removed worker: http://127.0.0.1:30001
+curl -X POST http://localhost:30000/remove_worker?url=http://127.0.0.1:30001
+
+# Successfully removed worker: http://127.0.0.1:30001
```
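
The same APIs can also be driven programmatically; here is a minimal sketch with `requests`, assuming the router listens on `localhost:30000` and a spare worker runs on port `30001`:

```python
# Minimal sketch: scale a worker into and out of the router's pool.
import requests

router = "http://localhost:30000"
worker = "http://127.0.0.1:30001"

# Register the worker with the router.
resp = requests.post(f"{router}/add_worker", params={"url": worker})
print(resp.text)  # e.g. "Successfully added worker: http://127.0.0.1:30001"

# ... serve traffic through the router ...

# Drain the worker from the pool before shutting it down.
resp = requests.post(f"{router}/remove_worker", params={"url": worker})
print(resp.text)
```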

Note:
