Conversation

@atobiszei atobiszei commented Sep 1, 2025

Ticket: CVS-171412

@atobiszei atobiszei added this to the 2025.3 milestone Sep 1, 2025
@atobiszei atobiszei added the WIP Do not merge until resolved label Sep 1, 2025
@dtrawins dtrawins removed the WIP Do not merge until resolved label Sep 3, 2025
@atobiszei atobiszei added the WIP Do not merge until resolved label Sep 3, 2025
@atobiszei atobiszei force-pushed the atobisze_GGUF_docs branch 3 times, most recently from 82d4a22 to 4b30914 Compare September 4, 2025 12:28
@atobiszei atobiszei removed the WIP Do not merge until resolved label Sep 5, 2025
| `--target_device` | `string` | Device name to be used to execute inference operations. Accepted values are: `"CPU"/"GPU"/"MULTI"/"HETERO"` |
| `--task` | `string` | Task type the model will support (`text_generation`, `embeddings`, `rerank`, `image_generation`). |
| `--overwrite_models` | `NA` | If set, an existing model with the same name will be overwritten. If not set, the server will use existing model files if available. |
| `--gguf_filename` | `string` | Filename of the wanted quantization type from Hugging Face repository. |

Filename of the wanted quantization type from Hugging Face GGUF model repository.

:::
::::

*Note:* GGUF format models is only supported with `--task text_generation`. For list of supported models check [blog](https://blog.openvino.ai/blog-posts/openvino-genai-supports-gguf-models).

GGUF format model is only supported with
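For context, a minimal sketch of how the documented parameters combine under the text_generation restriction; the model and quantization file names are borrowed from later examples in this PR, not prescribed here:

```text
ovms --pull \
  --source_model "Qwen/Qwen2.5-3B-Instruct-GGUF" \
  --model_repository_path /models \
  --model_name Qwen/Qwen2.5-3B-Instruct \
  --target_device CPU \
  --task text_generation \
  --gguf_filename qwen2.5-3b-instruct-q4_k_m.gguf
```

Per the note above, `--task` must be `text_generation` for GGUF models; the other task values do not apply.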

@rasapala rasapala left a comment
Minor comments.


> **NOTE**: This is an experimental feature and accuracy issues may be observed for some models.
> **NOTE:** The model downloading feature is described in depth on a separate documentation page: [Pulling HuggingFace Models](../../docs/pull_hf_models.md).

this note could be added as a reference at the bottom. Here we should include info about supported models and note that this is limited to text generation.


also add a limitation that only a single file is supported.

@atobiszei atobiszei requested a review from dtrawins September 11, 2025 11:00

This demo shows how to deploy a model with the OpenVINO Model Server.

Currently supported models are DeepSeek-R1-Distill-Qwen (1.5B, 7B), Qwen2.5 Instruct (1.5B, 3B, 7B), and Llama-3.2 Instruct (1B, 3B, 8B).

this is not a complete list of models. We should include this https://blog.openvino.ai/blog-posts/openvino-genai-supports-gguf-models


If the model already exists locally, the server will skip the download and immediately start serving.

> **NOTE:** Optionally, to only download the model and omit the serving part, use `--pull` parameter.

Suggested change
> **NOTE:** Optionally, to only download the model and omit the serving part, use `--pull` parameter.
> **NOTE:** Optionally, to only download the model and omit the serving part, use `--pull` parameter and remove --rest_port.
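For illustration, a hedged sketch of the pull-only invocation with the suggestion applied; the image tag and model names follow the other examples in this PR, and docker's `--user` flag is added to keep file ownership consistent:

```text
# pull-only: no --rest_port, so the container downloads the model and exits
docker run --user $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly \
  --pull \
  --source_model "Qwen/Qwen2.5-3B-Instruct-GGUF" \
  --model_repository_path /models \
  --model_name Qwen/Qwen2.5-3B-Instruct \
  --task text_generation \
  --gguf_filename qwen2.5-3b-instruct-q4_k_m.gguf
```

If the model already exists in the repository, adding `--overwrite_models` (documented in the parameters table above) would force a fresh download.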


> **NOTE:** Optionally, to only download the model and omit the serving part, use `--pull` parameter.
Start with deploying the model:

Suggested change
Start with deploying the model:
Deploy the model:

--task text_generation ^
--source_model "Qwen/Qwen2.5-3B-Instruct-GGUF" ^
--gguf_filename qwen2.5-3b-instruct-q4_k_m.gguf ^
--model_name LLM

Suggested change
--model_name LLM
--model_name Qwen/Qwen2.5-3B-Instruct


```text
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" \
-d '{"model": "LLM", \

Suggested change
-d '{"model": "LLM", \
-d '{"model": "Qwen/Qwen2.5-3B-Instruct", \

**Required:** Docker Engine installed

```text
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf

Suggested change
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf


```text
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]

Suggested change
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]

}
],
"created": 1756986130,
"model": "LLM",

Suggested change
"model": "LLM",
"model": "Qwen/Qwen2.5-3B-Instruct",

--task text_generation \
--source_model "Qwen/Qwen2.5-3B-Instruct-GGUF" \
--gguf_filename qwen2.5-3b-instruct-q4_k_m.gguf \
--model_name LLM

Suggested change
--model_name LLM
--model_name Qwen/Qwen2.5-3B-Instruct


update introduction

@atobiszei atobiszei requested a review from dtrawins September 29, 2025 14:36
```text
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" \
-d '{"model": "Qwen/Qwen2.5-3B-Instruct", \
"max_tokens":300, \

max_tokens is not needed for such a question...

Then send a request to the model:

```text
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" \

use curl -s to drop transfer statistics
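Taken together with the earlier max_tokens remark, a minimal sketch of the adjusted request; the question text is a placeholder, not the one from the demo:

```text
curl -s http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" \
-d '{"model": "Qwen/Qwen2.5-3B-Instruct", "messages": [{"role": "user", "content": "What is OpenVINO?"}]}'
```

With `-s`, curl suppresses the progress meter so only the JSON response is printed.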


```text
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]

add a note on how to handle split files
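For reference, split GGUF uploads on Hugging Face typically follow the llama.cpp sharding convention. A hypothetical invocation pointing at the first shard is sketched below; whether the server then resolves the remaining shards, or rejects them given the single-file limitation noted earlier, is exactly what the requested note should spell out:

```text
# Split GGUF repositories usually name their shards like this:
#   Qwen2.5-72B-Instruct-Q4_K_M-00001-of-00002.gguf
#   Qwen2.5-72B-Instruct-Q4_K_M-00002-of-00002.gguf
# Hypothetical: pass the first shard via --gguf_filename
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> \
  --model_name <external_model_name> --task text_generation \
  --gguf_filename Qwen2.5-72B-Instruct-Q4_K_M-00001-of-00002.gguf
```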

@atobiszei atobiszei requested review from Copilot and dtrawins October 2, 2025 09:15

Copilot AI left a comment


Pull Request Overview

This PR adds documentation for GGUF (GGML Universal File Format) model support in OpenVINO Model Server, including new command-line parameters and usage examples.

  • Adds --gguf_filename parameter to specify quantization files from Hugging Face GGUF repositories
  • Updates Docker image references from latest to weekly tag
  • Creates comprehensive GGUF demo documentation with deployment examples

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| docs/pull_hf_models.md | Updates command examples with the new `--gguf_filename` parameter and adds GGUF-specific usage examples |
| docs/parameters.md | Documents the new `--gguf_filename` parameter in the configuration options table |
| demos/gguf/README.md | New comprehensive demo showing GGUF model deployment with Docker and bare metal examples |
| demos/README.md | Adds the GGUF demo to the main demos index and table |



```text
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
Copilot AI Oct 2, 2025


[nitpick] The command line is very long and difficult to read. Consider adding line breaks with backslashes for better readability, similar to the examples shown later in the file.

Suggested change
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
docker run $(id -u):$(id -g) --rm \
-v <model_repository_path>:/models:rw \
openvino/model_server:weekly \
--pull \
--source_model <model_name_in_HF> \
--model_repository_path /models \
--model_name <external_model_name> \
--target_device <DEVICE> \
[--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] \
--task <task> \
[TASK_SPECIFIC_PARAMETERS]


```text
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
Copilot AI Oct 2, 2025


[nitpick] Similar to the Docker command, this command line is very long and difficult to read. Consider adding line breaks with backslashes for better readability.

**Required:** Docker Engine installed

```text
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
Copilot AI Oct 2, 2025


[nitpick] This command line is extremely long and difficult to read. It should be broken into multiple lines with backslashes for better readability, consistent with other examples in the documentation.

Suggested change
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
docker run $(id -u):$(id -g) --rm \
-v <model_repository_path>:/models:rw \
openvino/model_server:weekly \
--pull \
--source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" \
--model_repository_path /models \
--model_name unsloth/Llama-3.2-1B-Instruct-GGUF \
--task text_generation \
--gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf

:sync: baremetal
**Required:** OpenVINO Model Server package - see [deployment instructions](./deploying_server_baremetal.md) for details.
```text
ovms --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
Copilot AI Oct 2, 2025


[nitpick] This command line is very long and difficult to read. It should be broken into multiple lines with backslashes for better readability.

Suggested change
ovms --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
ovms --pull \
--source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" \
--model_repository_path /models \
--model_name unsloth/Llama-3.2-1B-Instruct-GGUF \
--task text_generation \
--gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
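Once the pull completes, a hedged sketch of the corresponding serving command: the same parameters without `--pull`, plus `--rest_port`, consistent with the earlier suggestion that pull-only mode drops `--rest_port` and with the demo's note that an already-downloaded model starts serving immediately:

```text
ovms --rest_port 8000 \
  --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" \
  --model_repository_path /models \
  --model_name unsloth/Llama-3.2-1B-Instruct-GGUF \
  --task text_generation \
  --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
```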

@dtrawins dtrawins merged commit 274de01 into main Oct 6, 2025
1 check passed