Update samples readme #1545

Merged
merged 5 commits on Jan 21, 2025

73 changes: 45 additions & 28 deletions samples/cpp/text_generation/README.md
@@ -2,7 +2,7 @@

These samples showcase the use of OpenVINO's inference capabilities for text generation tasks, including different decoding strategies such as beam search, multinomial sampling, and speculative decoding. Each sample has a specific focus and demonstrates a unique aspect of text generation.
The applications don't have many configuration options to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU.
There are also Jupyter notebooks for some samples. You can find links to them in the appropriate sample descritions.
There are also Jupyter notebooks for some samples. You can find links to them in the appropriate sample descriptions.

## Table of Contents
1. [Download and Convert the Model and Tokenizers](#download-and-convert-the-model-and-tokenizers)
@@ -11,25 +11,50 @@ There are also Jupyter notebooks for some samples. You can find links to them in
4. [Support and Contribution](#support-and-contribution)

## Download and convert the model and tokenizers

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

It's not required to install [../../export-requirements.txt](../../export-requirements.txt) for deployment if the model has already been exported.

Install [../../export-requirements.txt](../../export-requirements.txt) if model conversion is required.
```sh
pip install --upgrade-strategy eager -r ../../requirements.txt
pip install --upgrade-strategy eager -r ../../export-requirements.txt
optimum-cli export openvino --model <model> <output_folder>
```
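For example, to export the TinyLlama chat model recommended by the samples below (the `--weight-format int8` flag is an optional weight-compression choice added here for illustration, not a requirement):
```sh
optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int8 TinyLlama-1.1B-Chat-v1.0
```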
If a Hugging Face model is already converted (for example, [OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov](https://huggingface.co/OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov)), it can be downloaded directly via `huggingface-cli`:
```sh
pip install --upgrade-strategy eager -r ../../export-requirements.txt
huggingface-cli download <model> --local-dir <output_folder>
```
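For example, the pre-converted model mentioned above can be fetched with:
```sh
huggingface-cli download OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov --local-dir TinyLlama-1.1B-Chat-v1.0-int8-ov
```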

## Sample Descriptions
### Common information
Follow [Get Started with Samples](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/get-started-demos.html) for common information about OpenVINO samples.
Follow the [build instructions](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/BUILD.md) to build the GenAI samples.
**Contributor:** The referenced instructions describe how to build the whole package. They should be extended to describe how to build samples only from the OpenVINO archive.

**@olpipi (Collaborator, Author), Jan 16, 2025:** The whole GenAI build includes the samples. They cannot be built separately right now.

**Contributor:** The OpenVINO archive is not enough to build the GenAI samples. Do you mean building the samples when you download OpenVINO GenAI and reuse the sample sources from the samples folder?

**Contributor:** I was able to build the samples from the OV archive only: https://medium.com/openvino-toolkit/how-to-build-openvino-genai-app-in-c-32dcbe42fa67. The samples (samples folder) are included in the archive as well, and I see that the build script is still in place in the archive. Building the samples from the archive is more convenient for a developer, since building the whole of OpenVINO GenAI from source takes a lot of time when only the samples are needed. Therefore, a section in the build instructions on how to build the samples from the archive would be helpful.

**Contributor:** openvino_genai_windows_2024.3.0.0_x86_64 is the GenAI package.

**Contributor:** Okay, by OpenVINO Archive (as it is called in the selector tool) I meant the OpenVINO GenAI Archive.

**@olpipi (Collaborator, Author), Jan 20, 2025:** CVS-160869 is the task to update the build manual.


Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is recommended to run larger models on a dGPU with 32GB+ RAM. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a dGPU. Modify the source code to change the device for inference to the GPU.
GPUs usually provide better performance compared to CPUs. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a GPU. Modify the source code to change the device for inference to the GPU.

See https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md#supported-models for the list of supported models.

### 1. Greedy Causal LM (`greedy_causal_lm`)
Install [../../deployment-requirements.txt](../../deployment-requirements.txt) to run samples:
```sh
pip install --upgrade-strategy eager -r ../../deployment-requirements.txt
```

### 1. Chat Sample (`chat_sample`)
- **Description:**
Interactive chat interface powered by OpenVINO.
Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) that provides an example of LLM-powered text generation in Python.
Recommended models: meta-llama/Llama-2-7b-chat-hf, TinyLlama/TinyLlama-1.1B-Chat-v1.0, etc.
- **Main Feature:** Real-time chat-like text generation.
- **Run Command:**
```bash
./chat_sample <MODEL_DIR>
```
#### Missing chat template
If you encounter an exception indicating a missing "chat template" when launching the `ov::genai::LLMPipeline` in chat mode, it likely means the model was not tuned for chat functionality. To work around this, manually add the chat template to the `tokenizer_config.json` of your model.
The following template can be used as a default, but it may not work properly with every model:
```
"chat_template": "{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}{% endif %}{% endfor %}",
```
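As a hypothetical helper (not part of the samples), the template above can be patched into the tokenizer config with a few lines of Python; the model folder name is an assumption:
```python
import json
from pathlib import Path

model_dir = Path("TinyLlama-1.1B-Chat-v1.0")  # assumption: your exported model folder
config_path = model_dir / "tokenizer_config.json"

config = json.loads(config_path.read_text())
# Add the default template only if the model does not already ship one.
config.setdefault(
    "chat_template",
    "{% for message in messages %}{% if (message['role'] == 'user') %}"
    "{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}"
    "{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}"
    "{% endif %}{% endfor %}",
)
config_path.write_text(json.dumps(config, indent=2))
```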

### 2. Greedy Causal LM (`greedy_causal_lm`)
- **Description:**
Basic text generation using a causal language model.
Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering) that provides an example of LLM-powered text generation in Python.
@@ -40,7 +65,7 @@ Recommended models: meta-llama/Llama-2-7b-hf, etc.
./greedy_causal_lm <MODEL_DIR> "<PROMPT>"
```

### 2. Beam Search Causal LM (`beam_search_causal_lm`)
### 3. Beam Search Causal LM (`beam_search_causal_lm`)
- **Description:**
Uses beam search for more coherent text generation.
Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering) that provides an example of LLM-powered text generation in Python.
@@ -51,23 +76,6 @@ Recommended models: meta-llama/Llama-2-7b-hf, etc.
./beam_search_causal_lm <MODEL_DIR> "<PROMPT 1>" ["<PROMPT 2>" ...]
```

### 3. Chat Sample (`chat_sample`)
- **Description:**
Interactive chat interface powered by OpenVINO.
Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) that provides an example of LLM-powered text generation in Python.
Recommended models: meta-llama/Llama-2-7b-chat-hf, TinyLlama/TinyLlama-1.1B-Chat-v1.0, etc.
- **Main Feature:** Real-time chat-like text generation.
- **Run Command:**
```bash
./chat_sample <MODEL_DIR>
```
#### Missing chat template
If you encounter an exception indicating a missing "chat template" when launching the `ov::genai::LLMPipeline` in chat mode, it likely means the model was not tuned for chat functionality. To work around this, manually add the chat template to the `tokenizer_config.json` of your model.
The following template can be used as a default, but it may not work properly with every model:
```
"chat_template": "{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}{% endif %}{% endfor %}",
```

### 4. Multinomial Causal LM (`multinomial_causal_lm`)
- **Description:** Text generation with multinomial sampling for diversity.
Recommended models: meta-llama/Llama-2-7b-hf, etc.
@@ -104,7 +112,16 @@ Recommended models: meta-llama/Llama-2-13b-hf as main model and TinyLlama/TinyLl
./speculative_decoding_lm <MODEL_DIR> <DRAFT_MODEL_DIR> "<PROMPT>"
```

### 7. Encrypted Model Causal LM (`encrypted_model_causal_lm`)
### 7. LoRA Greedy Causal LM (`lora_greedy_causal_lm`)
- **Description:**
This sample demonstrates greedy decoding using Low-Rank Adaptation (LoRA) fine-tuned causal language models. LoRA enables efficient fine-tuning, reducing resource requirements for adapting large models to specific tasks.
- **Main Feature:** Lightweight fine-tuning with LoRA for efficient text generation
- **Run Command:**
```bash
./lora_greedy_causal_lm <MODEL_DIR> <ADAPTER_SAFETENSORS_FILE> "<PROMPT>"
```

### 8. Encrypted Model Causal LM (`encrypted_model_causal_lm`)
- **Description:**
LLMPipeline and Tokenizer objects can be initialized directly from a memory buffer, e.g. when a user stores only encrypted files and decrypts them on the fly.
The following code snippet demonstrates how to load the model from the memory buffer:
@@ -120,7 +137,7 @@ For the sake of brevity the code above does not include Tokenizer decryption. Fo
./encrypted_model_causal_lm <MODEL_DIR> "<PROMPT>"
```

### 8. LLMs benchmarking sample (`benchmark_genai`)
### 9. LLMs benchmarking sample (`benchmark_genai`)
- **Description:**
This sample script demonstrates how to benchmark an LLM in OpenVINO GenAI. The script includes functionality for warm-up iterations, generating text, and calculating various performance metrics.
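One possible invocation is shown below; the flag names are assumptions inferred from the options the script describes, so run the binary with `-h` for the authoritative list:
```sh
./benchmark_genai -m <MODEL_DIR> -p "<PROMPT>" -n 10 -d CPU
```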

71 changes: 44 additions & 27 deletions samples/python/text_generation/README.md
@@ -2,7 +2,7 @@

These samples showcase the use of OpenVINO's inference capabilities for text generation tasks, including different decoding strategies such as beam search, multinomial sampling, and speculative decoding. Each sample has a specific focus and demonstrates a unique aspect of text generation.
The applications don't have many configuration options to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU.
There are also Jupyter notebooks for some samples. You can find links to them in the appropriate sample descritions.
There are also Jupyter notebooks for some samples. You can find links to them in the appropriate sample descriptions.

## Table of Contents
1. [Download and Convert the Model and Tokenizers](#download-and-convert-the-model-and-tokenizers)
@@ -11,25 +11,50 @@ There are also Jupyter notebooks for some samples. You can find links to them in
4. [Support and Contribution](#support-and-contribution)

## Download and convert the model and tokenizers

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

It's not required to install [../../export-requirements.txt](../../export-requirements.txt) for deployment if the model has already been exported.

Install [../../export-requirements.txt](../../export-requirements.txt) if model conversion is required.
```sh
pip install --upgrade-strategy eager -r ../../requirements.txt
pip install --upgrade-strategy eager -r ../../export-requirements.txt
optimum-cli export openvino --model <model> <output_folder>
```
If a Hugging Face model is already converted (for example, [OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov](https://huggingface.co/OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov)), it can be downloaded directly via `huggingface-cli`:
```sh
pip install --upgrade-strategy eager -r ../../export-requirements.txt
huggingface-cli download <model> --local-dir <output_folder>
```

## Sample Descriptions
### Common information
Follow [Get Started with Samples](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/get-started-demos.html) for common information about OpenVINO samples.
Follow the [build instructions](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/BUILD.md) to build the GenAI samples.

Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is recommended to run larger models on a dGPU with 32GB+ RAM. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a dGPU. Modify the source code to change the device for inference to the GPU.
GPUs usually provide better performance compared to CPUs. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a GPU. Modify the source code to change the device for inference to the GPU.
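A minimal sketch of that device change with the GenAI Python API (the model folder name is an assumption; the samples default to `CPU`):
```python
import openvino_genai

# Pass "GPU" instead of "CPU" as the device when constructing the pipeline.
pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "GPU")
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```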

See https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md#supported-models for the list of supported models.

### 1. Greedy Causal LM (`greedy_causal_lm`)
Install [../../deployment-requirements.txt](../../deployment-requirements.txt) to run samples:
```sh
pip install --upgrade-strategy eager -r ../../deployment-requirements.txt
```

### 1. Chat Sample (`chat_sample`)
- **Description:**
Interactive chat interface powered by OpenVINO.
Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) that provides an example of LLM-powered text generation in Python.
Recommended models: meta-llama/Llama-2-7b-chat-hf, TinyLlama/TinyLlama-1.1B-Chat-v1.0, etc.
- **Main Feature:** Real-time chat-like text generation.
- **Run Command:**
```bash
python chat_sample.py model_dir
```
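A minimal sketch of what such a chat loop looks like with the GenAI Python API (not the sample's exact code; `model_dir` is assumed to hold an exported chat model):
```python
import openvino_genai

pipe = openvino_genai.LLMPipeline("model_dir", "CPU")
pipe.start_chat()  # keeps conversation history between generate() calls
try:
    while True:
        prompt = input("> ")
        print(pipe.generate(prompt, max_new_tokens=200))
finally:
    pipe.finish_chat()
```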
#### Missing chat template
If you encounter an exception indicating a missing "chat template" when launching the `ov::genai::LLMPipeline` in chat mode, it likely means the model was not tuned for chat functionality. To work around this, manually add the chat template to the `tokenizer_config.json` of your model.
The following template can be used as a default, but it may not work properly with every model:
```
"chat_template": "{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}{% endif %}{% endfor %}",
```

### 2. Greedy Causal LM (`greedy_causal_lm`)
- **Description:**
Basic text generation using a causal language model.
Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering) that provides an example of LLM-powered text generation in Python.
@@ -40,7 +65,7 @@ Recommended models: meta-llama/Llama-2-7b-hf, etc.
python greedy_causal_lm.py [-h] model_dir prompt
```

### 2. Beam Search Causal LM (`beam_search_causal_lm`)
### 3. Beam Search Causal LM (`beam_search_causal_lm`)
- **Description:**
Uses beam search for more coherent text generation.
Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering) that provides an example of LLM-powered text generation in Python.
@@ -51,23 +76,6 @@ Recommended models: meta-llama/Llama-2-7b-hf, etc.
python beam_search_causal_lm.py model_dir prompt [prompts ...]
```

### 3. Chat Sample (`chat_sample`)
- **Description:**
Interactive chat interface powered by OpenVINO.
Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) that provides an example of LLM-powered text generation in Python.
Recommended models: meta-llama/Llama-2-7b-chat-hf, TinyLlama/TinyLlama-1.1B-Chat-v1.0, etc.
- **Main Feature:** Real-time chat-like text generation.
- **Run Command:**
```bash
python chat_sample.py model_dir
```
#### Missing chat template
If you encounter an exception indicating a missing "chat template" when launching the `ov::genai::LLMPipeline` in chat mode, it likely means the model was not tuned for chat functionality. To work around this, manually add the chat template to the `tokenizer_config.json` of your model.
The following template can be used as a default, but it may not work properly with every model:
```
"chat_template": "{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}{% endif %}{% endfor %}",
```

### 4. Multinomial Causal LM (`multinomial_causal_lm`)
- **Description:** Text generation with multinomial sampling for diversity.
Recommended models: meta-llama/Llama-2-7b-hf, etc.
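Multinomial sampling is controlled through the generation config; a minimal sketch with the Python API (the parameter values are arbitrary illustrations, not the sample's defaults):
```python
import openvino_genai

pipe = openvino_genai.LLMPipeline("model_dir", "CPU")
config = openvino_genai.GenerationConfig()
config.do_sample = True    # enable multinomial sampling instead of greedy decoding
config.temperature = 0.7   # sharpen or flatten the token distribution
config.top_p = 0.9         # nucleus-sampling cutoff
config.max_new_tokens = 100
print(pipe.generate("What is OpenVINO?", config))
```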
@@ -104,7 +112,16 @@ Recommended models: meta-llama/Llama-2-13b-hf as main model and TinyLlama/TinyLl
python speculative_decoding_lm.py model_dir draft_model_dir prompt
```

### 7. LLMs benchmarking sample (`benchmark_genai`)
### 7. LoRA Greedy Causal LM (`lora_greedy_causal_lm`)
- **Description:**
This sample demonstrates greedy decoding using Low-Rank Adaptation (LoRA) fine-tuned causal language models. LoRA enables efficient fine-tuning, reducing resource requirements for adapting large models to specific tasks.
- **Main Feature:** Lightweight fine-tuning with LoRA for efficient text generation
- **Run Command:**
```bash
python lora_greedy_causal_lm.py model_dir adapter_safetensors_file prompt
```

### 8. LLMs benchmarking sample (`benchmark_genai`)
- **Description:**
This sample script demonstrates how to benchmark an LLM in OpenVINO GenAI. The script includes functionality for warm-up iterations, generating text, and calculating various performance metrics.
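One possible invocation is shown below; the flag names are assumptions, so check `python benchmark_genai.py -h` for the authoritative list:
```sh
python benchmark_genai.py -m model_dir -p "Hello, OpenVINO" -n 10 -d CPU
```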
