Commit a48de38: Apply comments

olpipi committed Jan 3, 2025
1 parent b1fe041
Showing 7 changed files with 75 additions and 108 deletions.
2 changes: 0 additions & 2 deletions samples/CMakeLists.txt
@@ -2,7 +2,6 @@
# SPDX-License-Identifier: Apache-2.0
#

add_subdirectory(cpp/benchmark_genai)
add_subdirectory(cpp/text_generation)
add_subdirectory(cpp/image_generation)
add_subdirectory(cpp/visual_language_chat)
@@ -16,7 +15,6 @@ install(FILES
COMPONENT cpp_samples_genai)

install(DIRECTORY
cpp/benchmark_genai
cpp/text_generation
cpp/image_generation
cpp/visual_language_chat
32 changes: 0 additions & 32 deletions samples/cpp/benchmark_genai/CMakeLists.txt

This file was deleted.

49 changes: 0 additions & 49 deletions samples/cpp/benchmark_genai/README.md

This file was deleted.

25 changes: 25 additions & 0 deletions samples/cpp/text_generation/CMakeLists.txt
@@ -35,3 +35,28 @@ set (SAMPLE_LIST
foreach(sample ${SAMPLE_LIST})
add_sample_executable(${sample})
endforeach()


# benchmark_genai
include(FetchContent)

if(POLICY CMP0135)
cmake_policy(SET CMP0135 NEW)
endif()

FetchContent_Declare(cxxopts
URL https://github.com/jarro2783/cxxopts/archive/refs/tags/v3.1.1.tar.gz
URL_HASH SHA256=523175f792eb0ff04f9e653c90746c12655f10cb70f1d5e6d6d9491420298a08)
FetchContent_MakeAvailable(cxxopts)

add_executable(benchmark_genai benchmark_genai.cpp)
target_link_libraries(benchmark_genai PRIVATE openvino::genai cxxopts::cxxopts)
set_target_properties(benchmark_genai PROPERTIES
COMPILE_PDB_NAME benchmark_genai
# Ensure out of box LC_RPATH on macOS with SIP
INSTALL_RPATH_USE_LINK_PATH ON)

install(TARGETS benchmark_genai
RUNTIME DESTINATION samples_bin/
COMPONENT samples_bin
EXCLUDE_FROM_ALL)
73 changes: 49 additions & 24 deletions samples/cpp/text_generation/README.md
@@ -1,17 +1,15 @@

# OpenVINO AI Text Generation Samples
# OpenVINO GenAI Text Generation Samples

These samples showcase the use of OpenVINO's inference capabilities for text generation tasks, including different decoding strategies such as beam search, multinomial sampling, and speculative decoding. Each sample has a specific focus and demonstrates a unique aspect of text generation.
The applications don't have many configuration options, to encourage the reader to explore and modify the source code; for example, to change the device for inference to GPU.
There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) that provides an example of LLM-powered text generation in Python.
There are also Jupyter notebooks for some samples. You can find links to them in the corresponding sample descriptions.

## Table of Contents
1. [Download and Convert the Model and Tokenizers](#download-and-convert-the-model-and-tokenizers)
2. [Running the Samples](#running-the-samples)
3. [Using encrypted models](#using-encrypted-models)
4. [Sample Descriptions](#sample-descriptions)
5. [Troubleshooting](#troubleshooting)
6. [Support and Contribution](#support-and-contribution)
2. [Sample Descriptions](#sample-descriptions)
3. [Troubleshooting](#troubleshooting)
4. [Support and Contribution](#support-and-contribution)

## Download and convert the model and tokenizers

@@ -24,41 +22,47 @@ pip install --upgrade-strategy eager -r ../../requirements.txt
optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0
```

## Running the Samples
Model examples to use for different samples:
chat_sample - meta-llama/Llama-2-7b-chat-hf
speculative_decoding_lm - meta-llama/Llama-2-13b-hf as main model and TinyLlama/TinyLlama-1.1B-Chat-v1.0 as draft model
other samples - meta-llama/Llama-2-7b-hf

Follow [Get Started with Samples](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/get-started-demos.html) to run a specific sample.

`greedy_causal_lm TinyLlama-1.1B-Chat-v1.0 "Why is the Sun yellow?"`
## Sample Descriptions
### Common information
Follow [Get Started with Samples](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/get-started-demos.html) to get common information about OpenVINO samples.

Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is recommended to run larger models on a dGPU with 32GB+ RAM. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a dGPU. Modify the source code to change the device for inference to the GPU.

See https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md#supported-models for the list of supported models.
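
As a minimal illustration of the device switch mentioned above, the pipeline constructor in any of the samples can simply be pointed at a GPU. This is a sketch only; `models_path` follows the naming convention used in the samples:

```cpp
// Same constructor as in the samples, but targeting a GPU device.
// "GPU" picks the default GPU; "GPU.1" would pick a specific discrete card.
ov::genai::LLMPipeline pipe(models_path, "GPU");
```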


## Sample Descriptions

### 1. Greedy Causal LM (`greedy_causal_lm`)
- **Description:** Basic text generation using a causal language model.
- **Description:**
Basic text generation using a causal language model.
Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering) that provides an example of LLM-powered text generation in Python.
- **Main Feature:** Demonstrates simple text continuation.
- **Run Command:**
```bash
./greedy_causal_lm <model_path> <prompt>
./greedy_causal_lm <MODEL_DIR> "<PROMPT>"
```
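
For readers who prefer to start from code, below is a minimal sketch of what the sample does, assuming the `TinyLlama-1.1B-Chat-v1.0` directory produced by the export step above; the real sample reads the model directory and prompt from the command line:

```cpp
#include "openvino/genai/llm_pipeline.hpp"

#include <iostream>
#include <string>

int main() {
    std::string models_path = "TinyLlama-1.1B-Chat-v1.0";  // hypothetical model directory
    ov::genai::LLMPipeline pipe(models_path, "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;  // greedy decoding is the default strategy

    std::string result = pipe.generate("Why is the Sun yellow?", config);
    std::cout << result << '\n';
    return 0;
}
```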

### 2. Beam Search Causal LM (`beam_search_causal_lm`)
- **Description:** Uses beam search for more coherent text generation.
- **Description:**
Uses beam search for more coherent text generation.
Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering) that provides an example of LLM-powered text generation in Python.
- **Main Feature:** Improves text quality with beam search.
- **Run Command:**
```bash
./beam_search_causal_lm <model_path> <prompt>
./beam_search_causal_lm <MODEL_DIR> "<PROMPT 1>" ["<PROMPT 2>" ...]
```
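
Beam search is configured through `ov::genai::GenerationConfig`. The sketch below shows the relevant fields; the values are illustrative rather than the sample's exact defaults, and `models_path`/`prompts` stand in for the command-line arguments:

```cpp
ov::genai::GenerationConfig config;
config.max_new_tokens = 100;
config.num_beams = 15;            // total number of beams kept per prompt
config.num_beam_groups = 3;       // group beam search encourages diverse hypotheses
config.diversity_penalty = 1.0f;  // penalty for tokens shared between groups
config.num_return_sequences = 3;  // hypotheses returned per prompt

ov::genai::LLMPipeline pipe(models_path, "CPU");
auto beams = pipe.generate(prompts, config);  // prompts: std::vector<std::string>
for (const auto& text : beams.texts)
    std::cout << text << '\n';
```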

### 3. Chat Sample (`chat_sample`)
- **Description:** Interactive chat interface powered by OpenVINO.
- **Description:**
Interactive chat interface powered by OpenVINO.
Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) that provides an example of LLM-powered text generation in Python.
- **Main Feature:** Real-time chat-like text generation.
- **Run Command:**
```bash
./chat_sample <model_path>
./chat_sample <MODEL_DIR>
```
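
Conceptually, the sample wraps generation in a chat session so the conversation history is carried across turns. A simplified sketch, with error handling trimmed and the model directory hardcoded for brevity:

```cpp
#include "openvino/genai/llm_pipeline.hpp"

#include <iostream>
#include <string>

int main() {
    ov::genai::LLMPipeline pipe("TinyLlama-1.1B-Chat-v1.0", "CPU");  // hypothetical model directory
    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;

    // Stream tokens to stdout as they are produced; returning false lets generation continue.
    auto streamer = [](std::string word) { std::cout << word << std::flush; return false; };

    pipe.start_chat();  // start accumulating the conversation history
    std::string prompt;
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        pipe.generate(prompt, config, streamer);
        std::cout << "\n----------\nquestion:\n";
    }
    pipe.finish_chat();  // clear the history
    return 0;
}
```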
#### Missing chat template
If you encounter an exception indicating a missing "chat template" when launching the `ov::genai::LLMPipeline` in chat mode, it likely means the model was not tuned for chat functionality. To work around this, manually add the chat template to the `tokenizer_config.json` of your model.
@@ -73,7 +77,7 @@ The following template can be used as a default, but it may not work properly wi
- **Main Feature:** Introduces randomness for creative outputs.
- **Run Command:**
```bash
./multinomial_causal_lm <model_path> <prompt>
./multinomial_causal_lm <MODEL_DIR> "<PROMPT>"
```
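
Multinomial sampling is switched on through the generation config. A short sketch of the knobs involved; the values are illustrative and `models_path`/`prompt` stand in for the command-line arguments:

```cpp
ov::genai::GenerationConfig config;
config.max_new_tokens = 100;
config.do_sample = true;    // switch from greedy decoding to random sampling
config.temperature = 0.7f;  // values below 1.0 sharpen the distribution, above 1.0 flatten it
config.top_k = 50;          // sample only among the 50 most likely tokens
config.top_p = 0.9f;        // ...further restricted to the top 90% of probability mass

ov::genai::LLMPipeline pipe(models_path, "CPU");
std::string result = pipe.generate(prompt, config);
```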

### 5. Prompt Lookup Decoding LM (`prompt_lookup_decoding_lm`)
@@ -82,7 +86,7 @@ The following template can be used as a default, but it may not work properly wi
- **Main Feature:** Specialized prompt-based inference.
- **Run Command:**
```bash
./prompt_lookup_decoding_lm <model_path> <prompt>
./prompt_lookup_decoding_lm <MODEL_DIR> "<PROMPT>"
```
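
Prompt lookup decoding is a variant of assisted generation that takes candidate tokens from n-gram matches in the prompt itself instead of a separate draft model. The sketch below reflects how the sample is expected to enable it; the `prompt_lookup` property and the config fields are assumptions based on the current GenAI API and may differ between releases:

```cpp
ov::genai::GenerationConfig config;
config.max_new_tokens = 100;
config.num_assistant_tokens = 5;  // candidate tokens proposed per step
config.max_ngram_size = 3;        // longest prompt n-gram used to look up candidates

// Pipeline constructed with prompt lookup enabled (assumed property name).
ov::genai::LLMPipeline pipe(models_path, "CPU", ov::genai::prompt_lookup(true));
std::string result = pipe.generate(prompt, config);
```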

### 6. Speculative Decoding LM (`speculative_decoding_lm`)
@@ -92,10 +96,12 @@ Speculative decoding (or [assisted-generation](https://huggingface.co/blog/assis
Speculative decoding works the following way. The draft model predicts the next K tokens one by one in an autoregressive manner, while the main model validates these predictions and corrects them if necessary. We go through each predicted token, and if a difference is detected between the draft and main model, we stop and keep the last token predicted by the main model. Then the draft model gets the latest main prediction and again tries to predict the next K tokens, repeating the cycle.

This approach reduces the need for multiple infer requests to the main model, enhancing performance. For instance, in more predictable parts of text generation, the draft model can, in best-case scenarios, generate the next K tokens that exactly match the target. In that case, they are validated in a single inference request to the main model (which is bigger, more accurate, but slower) instead of running K subsequent requests. More details can be found in the original papers: https://arxiv.org/pdf/2211.17192.pdf, https://arxiv.org/pdf/2302.01318.pdf

Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/speculative-sampling) that provides an example of LLM-powered text generation in Python.
- **Main Feature:** Reduces latency while generating high-quality text.
- **Run Command:**
```bash
./speculative_decoding_lm <main_model_path> <draft_model_path> <prompt>
./speculative_decoding_lm <MODEL_DIR> <DRAFT_MODEL_DIR> "<PROMPT>"
```
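
In OpenVINO GenAI this scheme is driven by a single pipeline that receives the draft model as a property. A simplified sketch, where the devices and candidate count are illustrative and `main_models_path`/`draft_models_path`/`prompt` stand in for the command-line arguments:

```cpp
ov::genai::GenerationConfig config;
config.max_new_tokens = 100;
config.num_assistant_tokens = 5;  // K: tokens the draft model proposes per cycle

// The draft model may even run on a different device than the main model.
ov::genai::LLMPipeline pipe(
    main_models_path, "CPU",
    ov::genai::draft_model(draft_models_path, "CPU"));

std::string result = pipe.generate(prompt, config);
```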

### 7. Encrypted Model Causal LM (`encrypted_model_causal_lm`)
@@ -111,9 +117,28 @@ For the sake of brevity the code above does not include Tokenizer decryption. Fo
- **Main Feature:** Reads the model directly from a memory buffer
- **Run Command:**
```bash
./encrypted_model_causal_lm <model_path> <prompt>
./encrypted_model_causal_lm <MODEL_DIR> "<PROMPT>"
```
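
The idea is to load the decrypted model IR and weights into memory and hand the buffers to the pipeline, so no plain-text model is written to disk. Below is a rough sketch under the assumption that `ov::genai::LLMPipeline` accepts an in-memory model string, a weights tensor, and a tokenizer; actual decryption is application-specific and replaced here by a plain file read:

```cpp
#include "openvino/genai/llm_pipeline.hpp"

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Read a whole file into a string; in a real application this is where decryption would happen.
static std::string read_file(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    std::stringstream buffer;
    buffer << file.rdbuf();
    return buffer.str();
}

int main() {
    std::string models_path = "TinyLlama-1.1B-Chat-v1.0";  // hypothetical model directory

    std::string model_xml = read_file(models_path + "/openvino_model.xml");
    std::string model_bin = read_file(models_path + "/openvino_model.bin");

    // Wrap the in-memory weights in an ov::Tensor without copying.
    ov::Tensor weights(ov::element::u8, {model_bin.size()},
                       reinterpret_cast<void*>(model_bin.data()));

    // Assumed constructor: model IR as a string, weights as a tensor, plus a tokenizer.
    ov::genai::Tokenizer tokenizer(models_path);  // tokenizer decryption omitted for brevity
    ov::genai::LLMPipeline pipe(model_xml, weights, tokenizer, "CPU");

    std::cout << pipe.generate("Why is the Sun yellow?", ov::genai::max_new_tokens(50)) << '\n';
    return 0;
}
```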

### 8. LLMs benchmarking sample (`benchmark_genai`)
- **Description:**
This sample demonstrates how to benchmark an LLM in OpenVINO GenAI. The sample includes functionality for warm-up iterations, generating text, and calculating various performance metrics.

For more information on how the performance metrics are calculated, please refer to the [performance-metrics tutorial](../../../src/README.md#performance-metrics).
- **Main Feature:** Benchmarks a model via OpenVINO GenAI; a simplified sketch of the measurement loop is shown after the options below.
- **Run Command:**
```bash
./benchmark_genai [OPTIONS]
```
#### Options
- `-m, --model`: Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `"The Sky is blue because"`): The prompt to generate text.
- `-nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `-mt, --max_new_tokens` (default: `20`): Maximum number of new tokens to generate.
- `-n, --num_iter` (default: `3`): Number of iterations.
- `-d, --device` (default: `"CPU"`): Device to run the model on.
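
Under the hood, the sample relies on the performance metrics attached to generation results. A condensed sketch of the measurement loop follows; cxxopts parsing is omitted, the option values appear as plain variables, and the metric accessors are assumed to follow the performance-metrics tutorial linked above:

```cpp
// Values below would normally come from the cxxopts command-line options.
ov::genai::LLMPipeline pipe(models_path, device);
ov::genai::GenerationConfig config;
config.max_new_tokens = max_new_tokens;

for (size_t i = 0; i < num_warmup; i++)
    pipe.generate(prompt, config);  // warm-up runs, results discarded

auto res = pipe.generate(prompt, config);
auto metrics = res.perf_metrics;  // metrics of the first measured iteration
for (size_t i = 0; i < num_iter - 1; i++) {
    res = pipe.generate(prompt, config);
    metrics = metrics + res.perf_metrics;  // accumulates means and standard deviations
}

std::cout << "Generate time: " << metrics.get_generate_duration().mean << " ms\n";
std::cout << "TTFT: " << metrics.get_ttft().mean << " ms\n";
std::cout << "TPOT: " << metrics.get_tpot().mean << " ms/token\n";
std::cout << "Throughput: " << metrics.get_throughput().mean << " tokens/s\n";
```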


## Troubleshooting

### Unicode characters encoding error on Windows
2 changes: 1 addition & 1 deletion src/README.md
@@ -394,7 +394,7 @@ durations = np.array(raw_metrics.m_new_token_times[1:]) - np.array(raw_metrics.m
print(f'Median from token to token duration: {np.median(durations):.2f} ms')
```

For more examples of how metrics are used, please refer to the Python [benchmark_genai.py](../samples/python/benchmark_genai/README.md) and C++ [benchmark_genai](../samples/cpp/benchmark_genai/README.md) samples.
For more examples of how metrics are used, please refer to the Python [benchmark_genai.py](../samples/python/benchmark_genai/README.md) and C++ [benchmark_genai](../samples/cpp/text_generation/README.md) samples.

## How It Works

