GGUF pull docs #3616
Conversation
docs/parameters.md
Outdated
| `--target_device` | `string` | Device name to be used to execute inference operations. Accepted values are: `"CPU"/"GPU"/"MULTI"/"HETERO"` |
| `--task` | `string` | Task type the model will support (`text_generation`, `embeddings`, `rerank`, `image_generation`). |
| `--overwrite_models` | `NA` | If set, an existing model with the same name will be overwritten. If not set, the server will use existing model files if available. |
| `--gguf_filename` | `string` | Filename of the wanted quantization type from Hugging Face repository. |
Filename of the desired quantization type from the Hugging Face GGUF model repository.
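For context, this is the kind of pull command the parameter appears in elsewhere in this PR (repository and filename are taken from the docs/pull_hf_models.md example; the line wrapping is only for readability):

```text
ovms --pull \
  --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" \
  --model_repository_path /models \
  --model_name unsloth/Llama-3.2-1B-Instruct-GGUF \
  --task text_generation \
  --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
```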
docs/pull_hf_models.md
Outdated
:::
::::

*Note:* GGUF format models is only supported with `--task text_generation`. For list of supported models check [blog](https://blog.openvino.ai/blog-posts/openvino-genai-supports-gguf-models).
GGUF format model is only supported with `--task text_generation`.
Minor comments.
demos/gguf/README.md
Outdated
> **NOTE**: This is experimental feature and issues in accuracy of models may be observed.
> **NOTE:** Model downloading feature is described in depth in separate documentation page: [Pulling HuggingFaces Models](../../docs/pull_hf_models.md).
This note could be added as a reference at the bottom. Here we should include info about the supported models, and note that this is limited to text generation.
Also add a limitation: only a single file is supported.
demos/gguf/README.md
Outdated
This demo shows how to deploy model with the OpenVINO Model Server.

Currently supported models are DeepSeek-R1-Distill-Qwen (1.5B, 7B), Qwen2.5 Instruct (1.5B, 3B, 7B) & llama-3.2 Instruct (1B, 3B, 8B).
This is not a complete list of models. We should include this: https://blog.openvino.ai/blog-posts/openvino-genai-supports-gguf-models
demos/gguf/README.md
Outdated
If the model already exists locally, it will skip the downloading and immediately start the serving.

> **NOTE:** Optionally, to only download the model and omit the serving part, use `--pull` parameter.
Suggested change:
- > **NOTE:** Optionally, to only download the model and omit the serving part, use `--pull` parameter.
+ > **NOTE:** Optionally, to only download the model and omit the serving part, use `--pull` parameter and remove --rest_port.
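A sketch of the download-only invocation this suggestion describes, using the model from this demo (the bare-metal `ovms` binary and the placeholder repository path are assumptions; note there is no `--rest_port`, so the server does not start):

```text
ovms --pull \
  --source_model "Qwen/Qwen2.5-3B-Instruct-GGUF" \
  --model_repository_path <model_repository_path> \
  --model_name Qwen/Qwen2.5-3B-Instruct \
  --task text_generation \
  --gguf_filename qwen2.5-3b-instruct-q4_k_m.gguf
```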
demos/gguf/README.md
Outdated
> **NOTE:** Optionally, to only download the model and omit the serving part, use `--pull` parameter.
Start with deploying the model:
Suggested change:
- Start with deploying the model:
+ Deploy the model:
demos/gguf/README.md
Outdated
--task text_generation ^
--source_model "Qwen/Qwen2.5-3B-Instruct-GGUF" ^
--gguf_filename qwen2.5-3b-instruct-q4_k_m.gguf ^
--model_name LLM
Suggested change:
- --model_name LLM
+ --model_name Qwen/Qwen2.5-3B-Instruct
demos/gguf/README.md
Outdated
```text
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" \
-d '{"model": "LLM", \
```
| -d '{"model": "LLM", \ | |
| -d '{"model": "Qwen/Qwen2.5-3B-Instruct", \ |
docs/pull_hf_models.md
Outdated
**Required:** Docker Engine installed

```text
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
```
Suggested change:
- docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
+ docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
docs/pull_hf_models.md
Outdated
```text
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
```
Suggested change:
- docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
+ docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
demos/gguf/README.md
Outdated
    }
  ],
  "created": 1756986130,
  "model": "LLM",
| "model": "LLM", | |
| "model": "Qwen/Qwen2.5-3B-Instruct", |
demos/gguf/README.md
Outdated
--task text_generation \
--source_model "Qwen/Qwen2.5-3B-Instruct-GGUF" \
--gguf_filename qwen2.5-3b-instruct-q4_k_m.gguf \
--model_name LLM
Suggested change:
- --model_name LLM
+ --model_name Qwen/Qwen2.5-3B-Instruct
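With that change applied, the full serving command would look roughly like this; the `--rest_port 8000` value is an assumption inferred from the curl examples targeting localhost:8000:

```text
ovms --rest_port 8000 \
  --task text_generation \
  --source_model "Qwen/Qwen2.5-3B-Instruct-GGUF" \
  --gguf_filename qwen2.5-3b-instruct-q4_k_m.gguf \
  --model_name Qwen/Qwen2.5-3B-Instruct
```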
Update the introduction.
demos/gguf/README.md
Outdated
```text
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" \
-d '{"model": "Qwen/Qwen2.5-3B-Instruct", \
"max_tokens":300, \
```
`max_tokens` is not needed for such a question.
demos/gguf/README.md
Outdated
Then send a request to the model:

```text
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" \
```
Use `curl -s` to drop the transfer statistics.
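For example (the message body here is illustrative, not taken from the demo):

```text
curl -s http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-3B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```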
```text
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
```
Add a note on how to handle split files.
Pull Request Overview
This PR adds documentation for GGUF (GGML Universal File Format) model support in OpenVINO Model Server, including new command-line parameters and usage examples.
- Adds `--gguf_filename` parameter to specify quantization files from Hugging Face GGUF repositories
- Updates Docker image references from `latest` to `weekly` tag
- Creates comprehensive GGUF demo documentation with deployment examples
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docs/pull_hf_models.md | Updates command examples with new --gguf_filename parameter and adds GGUF-specific usage examples |
| docs/parameters.md | Documents the new --gguf_filename parameter in the configuration options table |
| demos/gguf/README.md | New comprehensive demo showing GGUF model deployment with Docker and bare metal examples |
| demos/README.md | Adds GGUF demo to the main demos index and table |
```text
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
```
Copilot AI · Oct 2, 2025
[nitpick] The command line is very long and difficult to read. Consider adding line breaks with backslashes for better readability, similar to the examples shown later in the file.
Suggested change:
- docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
+ docker run $(id -u):$(id -g) --rm \
+   -v <model_repository_path>:/models:rw \
+   openvino/model_server:weekly \
+   --pull \
+   --source_model <model_name_in_HF> \
+   --model_repository_path /models \
+   --model_name <external_model_name> \
+   --target_device <DEVICE> \
+   [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] \
+   --task <task> \
+   [TASK_SPECIFIC_PARAMETERS]
```text
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] --task <task> [TASK_SPECIFIC_PARAMETERS]
```
Copilot AI · Oct 2, 2025
[nitpick] Similar to the Docker command, this command line is very long and difficult to read. Consider adding line breaks with backslashes for better readability.
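A possible wrapped form, mirroring the Docker suggestion above (same flags, purely cosmetic):

```text
ovms --pull \
  --source_model <model_name_in_HF> \
  --model_repository_path <model_repository_path> \
  --model_name <external_model_name> \
  --target_device <DEVICE> \
  [--gguf_filename SPECIFIC_QUANTIZATION_FILENAME.gguf] \
  --task <task> \
  [TASK_SPECIFIC_PARAMETERS]
```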
**Required:** Docker Engine installed

```text
docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
```
Copilot AI · Oct 2, 2025
[nitpick] This command line is extremely long and difficult to read. It should be broken into multiple lines with backslashes for better readability, consistent with other examples in the documentation.
Suggested change:
- docker run $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:weekly --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
+ docker run $(id -u):$(id -g) --rm \
+   -v <model_repository_path>:/models:rw \
+   openvino/model_server:weekly \
+   --pull \
+   --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" \
+   --model_repository_path /models \
+   --model_name unsloth/Llama-3.2-1B-Instruct-GGUF \
+   --task text_generation \
+   --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
:sync: baremetal
**Required:** OpenVINO Model Server package - see [deployment instructions](./deploying_server_baremetal.md) for details.
```text
ovms --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
```
Copilot AI · Oct 2, 2025
[nitpick] This command line is very long and difficult to read. It should be broken into multiple lines with backslashes for better readability.
Suggested change:
- ovms --pull --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" --model_repository_path /models --model_name unsloth/Llama-3.2-1B-Instruct-GGUF --task text_generation --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
+ ovms --pull \
+   --source_model "unsloth/Llama-3.2-1B-Instruct-GGUF" \
+   --model_repository_path /models \
+   --model_name unsloth/Llama-3.2-1B-Instruct-GGUF \
+   --task text_generation \
+   --gguf_filename Llama-3.2-1B-Instruct-Q4_K_M.gguf
Ticket: CVS-171412