GGUF pull docs #3616
Conversation
docs/parameters.md
Outdated
| `--target_device` | `string` | Device name to be used to execute inference operations. Accepted values are: `"CPU"/"GPU"/"MULTI"/"HETERO"` |
| `--task` | `string` | Task type the model will support (`text_generation`, `embeddings`, `rerank`, `image_generation`). |
| `--overwrite_models` | `NA` | If set, an existing model with the same name will be overwritten. If not set, the server will use existing model files if available. |
| `--gguf_filename` | `string` | Filename of the wanted quantization type from Hugging Face repository. |
Filename of the desired quantization type from the Hugging Face GGUF model repository.
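For context, this is the kind of pull command the parameter appears in elsewhere in this PR (repository and filename are taken from the docs/pull_hf_models.md example; the line wrapping is only for readability):

```text
ovms --pull \
  --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" \
  --model_repository_path /models \
  --model_name unsloth/Llama-3.2-1B-Instruct-GGUF \
  --task text_generation \
  --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
```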
docs/pull_hf_models.md
Outdated
:::
::::

*Note:* GGUF format models is only supported with `--task text_generation`. For list of supported models check [blog](https://blog.openvino.ai/blog-posts/openvino-genai-supports-gguf-models).
GGUF format model is only supported with `--task text_generation`.
Minor comments.
demos/gguf/README.md
Outdated
> **NOTE**: This is experimental feature and issues in accuracy of models may be observed.
> **NOTE:** Model downloading feature is described in depth in separate documentation page: [Pulling HuggingFaces Models](../../docs/pull_hf_models.md).
This note could be added as a reference at the bottom. Here we should include info about the supported models, and note that this is limited to text generation.
Also add a limitation: only a single file is supported.
demos/gguf/README.md
Outdated
This demo shows how to deploy model with the OpenVINO Model Server.

Currently supported models are DeepSeek-R1-Distill-Qwen (1.5B, 7B), Qwen2.5 Instruct (1.5B, 3B, 7B) & llama-3.2 Instruct (1B, 3B, 8B).
This is not a complete list of models. We should include this: https://blog.openvino.ai/blog-posts/openvino-genai-supports-gguf-models
demos/gguf/README.md
Outdated
If the model already exists locally, it will skip the downloading and immediately start the serving.

> **NOTE:** Optionally, to only download the model and omit the serving part, use `--pull` parameter.
Suggested change:
- > **NOTE:** Optionally, to only download the model and omit the serving part, use `--pull` parameter.
+ > **NOTE:** Optionally, to only download the model and omit the serving part, use `--pull` parameter and remove --rest_port.
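A sketch of the download-only invocation this suggestion describes, using the model from this demo (the bare-metal `ovms` binary and the placeholder repository path are assumptions; note there is no `--rest_port`, so the server does not start):

```text
ovms --pull \
  --source_model "Qwen/Qwen2.5-3B-Instruct-GGUF" \
  --model_repository_path <model_repository_path> \
  --model_name Qwen/Qwen2.5-3B-Instruct \
  --task text_generation \
  --gguf_filename qwen2.5-3b-instruct-q4_k_m.gguf
```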
demos/gguf/README.md
Outdated
> **NOTE:** Optionally, to only download the model and omit the serving part, use `--pull` parameter.
Start with deploying the model:
Suggested change:
- Start with deploying the model:
+ Deploy the model:
demos/gguf/README.md
Outdated
--task text_generation ^
--source_model "Qwen/Qwen2.5-3B-Instruct-GGUF" ^
--gguf_filename qwen2.5-3b-instruct-q4_k_m.gguf ^
--model_name LLM
Suggested change:
- --model_name LLM
+ --model_name Qwen/Qwen2.5-3B-Instruct
demos/gguf/README.md
Outdated
```text
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" \
-d '{"model": "LLM", \
```
| -d '{"model": "LLM", \ | |
| -d '{"model": "Qwen/Qwen2.5-3B-Instruct", \ |
docs/pull_hf_models.md
Outdated
**Required:** Docker Engine installed

```text
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
```
Suggested change:
- docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
+ docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
docs/pull_hf_models.md
Outdated
```text
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
```
Suggested change:
- docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
+ docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
demos/gguf/README.md
Outdated
    }
  ],
  "created": 1756986130,
  "model": "LLM",
| "model": "LLM", | |
| "model": "Qwen/Qwen2.5-3B-Instruct", |
demos/gguf/README.md
Outdated
--task text_generation \
--source_model "Qwen/Qwen2.5-3B-Instruct-GGUF" \
--gguf_filename qwen2.5-3b-instruct-q4_k_m.gguf \
--model_name LLM
Suggested change:
- --model_name LLM
+ --model_name Qwen/Qwen2.5-3B-Instruct
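With that change applied, the full serving command would look roughly like this; the `--rest_port 8000` value is an assumption inferred from the curl examples targeting localhost:8000:

```text
ovms --rest_port 8000 \
  --task text_generation \
  --source_model "Qwen/Qwen2.5-3B-Instruct-GGUF" \
  --gguf_filename qwen2.5-3b-instruct-q4_k_m.gguf \
  --model_name Qwen/Qwen2.5-3B-Instruct
```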
Update the introduction.
demos/gguf/README.md
Outdated
```text
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" \
-d '{"model": "Qwen/Qwen2.5-3B-Instruct", \
"max_tokens":300, \
```
`max_tokens` is not needed for such a question.
demos/gguf/README.md
Outdated
Then send a request to the model:

```text
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" \
```
Use `curl -s` to drop the transfer statistics.
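For example (the message body here is illustrative, not taken from the demo):

```text
curl -s http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-3B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```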
```text
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
```
Add a note on how to handle split files.
Pull Request Overview
This PR adds documentation for GGUF (GGML Universal File Format) model support in OpenVINO Model Server, including new command-line parameters and usage examples.
- Adds `--gguf_filename` parameter to specify quantization files from Hugging Face GGUF repositories
- Updates Docker image references from `latest` to `weekly` tag
- Creates comprehensive GGUF demo documentation with deployment examples
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docs/pull_hf_models.md | Updates command examples with new --gguf_filename parameter and adds GGUF-specific usage examples |
| docs/parameters.md | Documents the new --gguf_filename parameter in the configuration options table |
| demos/gguf/README.md | New comprehensive demo showing GGUF model deployment with Docker and bare metal examples |
| demos/README.md | Adds GGUF demo to the main demos index and table |
```text
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
```
Copilot AI · Oct 2, 2025
[nitpick] The command line is very long and difficult to read. Consider adding line breaks with backslashes for better readability, similar to the examples shown later in the file.
Suggested change:
- docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
+ docker run $(id -u):$(id -g) --rm \
+   -v <model_repository_path>:/models:rw \
+   openvino/model_server:weekly \
+   --pull \
+   --source_model <model_name_in_HF> \
+   --model_repository_path /models \
+   --model_name <external_model_name> \
+   --target_device <DEVICE> \
+   [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] \
+   --task <task> \
+   [TASK_SPECIFIC_PARAMETERS]
```text
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
```
Copilot AI · Oct 2, 2025
[nitpick] Similar to the Docker command, this command line is very long and difficult to read. Consider adding line breaks with backslashes for better readability.
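A possible wrapped form, mirroring the Docker suggestion above (same flags, purely cosmetic):

```text
ovms --pull \
  --source_model <model_name_in_HF> \
  --model_repository_path <model_repository_path> \
  --model_name <external_model_name> \
  --target_device <DEVICE> \
  [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] \
  --task <task> \
  [TASK_SPECIFIC_PARAMETERS]
```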
**Required:** Docker Engine installed

```text
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
```
Copilot AI · Oct 2, 2025
[nitpick] This command line is extremely long and difficult to read. It should be broken into multiple lines with backslashes for better readability, consistent with other examples in the documentation.
Suggested change:
- docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
+ docker run $(id -u):$(id -g) --rm \
+   -v <model_repository_path>:/models:rw \
+   openvino/model_server:weekly \
+   --pull \
+   --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" \
+   --model_repository_path /models \
+   --model_name unsloth/Llama-3.2-1B-Instruct-GGUF \
+   --task text_generation \
+   --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
:sync: baremetal
**Required:** OpenVINO Model Server package - see [deployment instructions](./deploying_server_baremetal.md) for details.
```text
ovms --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
```
Copilot AI · Oct 2, 2025
[nitpick] This command line is very long and difficult to read. It should be broken into multiple lines with backslashes for better readability.
Suggested change:
- ovms --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
+ ovms --pull \
+   --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" \
+   --model_repository_path /models \
+   --model_name unsloth/Llama-3.2-1B-Instruct-GGUF \
+   --task text_generation \
+   --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
Ticket: CVS-171412