Commit
merge main in branch
echarlaix committed Jul 11, 2024
2 parents 69ea8a1 + b25e845 commit 01362bc
Showing 31 changed files with 771 additions and 494 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test_ipex.yml
@@ -21,7 +21,7 @@ jobs:
fail-fast: false
matrix:
python-version: [3.8, 3.9]
- transformers-version: [4.39.0, 4.41.2]
+ transformers-version: [4.39.0, 4.42.3]
os: [ubuntu-latest]

runs-on: ${{ matrix.os }}
2 changes: 1 addition & 1 deletion .github/workflows/test_openvino.yml
@@ -21,7 +21,7 @@ jobs:
fail-fast: false
matrix:
python-version: ["3.8", "3.12"]
transformers-version: ["4.36.0", "4.41.*"]
transformers-version: ["4.36.0", "4.42.*"]
os: [ubuntu-latest]

runs-on: ${{ matrix.os }}
2 changes: 1 addition & 1 deletion docs/Dockerfile
@@ -25,4 +25,4 @@ RUN npm install npm@9.8.1 -g && \
RUN python3 -m pip install --no-cache-dir --upgrade pip
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/doc-builder.git
RUN git clone $clone_url && cd optimum-intel && git checkout $commit_sha
- RUN python3 -m pip install --no-cache-dir ./optimum-intel[neural-compressor,openvino,nncf,quality]
+ RUN python3 -m pip install --no-cache-dir ./optimum-intel[neural-compressor,openvino,diffusers,quality]
7 changes: 7 additions & 0 deletions docs/source/_toctree.yml
@@ -22,6 +22,13 @@
      title: Supported Models
    - local: openvino/reference
      title: Reference
    - sections:
      - local: openvino/tutorials/notebooks
        title: Notebooks
      - local: openvino/tutorials/diffusers
        title: Generate images with Diffusion models
      title: Tutorials
      isExpanded: false
    title: OpenVINO
  title: Optimum Intel
  isExpanded: false
49 changes: 33 additions & 16 deletions docs/source/openvino/export.mdx
@@ -14,25 +14,15 @@ specific language governing permissions and limitations under the License.
To export your model to the [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) format with the CLI:

```bash
- optimum-cli export openvino --model gpt2 ov_model/
+ optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B ov_model/
```

The model argument can either be the model ID of a model hosted on the [Hub](https://huggingface.co/models) or a path to a model saved locally. For local models, you need to specify the task for which the model should be exported, chosen from the list of [supported tasks](https://huggingface.co/docs/optimum/main/en/exporters/task_manager).


```bash
- optimum-cli export openvino --model local_model_dir --task text-generation-with-past ov_model/
+ optimum-cli export openvino --model local_llama --task text-generation-with-past ov_model/
```

- The `-with-past` suffix enable the re-use of past keys and values. This allows to avoid recomputing the same intermediate activations during the generation. to export the model without, you will need to remove this suffix.
-
- | With K-V cache | Without K-V cache |
- |------------------------------------------|--------------------------------------|
- | `text-generation-with-past` | `text-generation` |
- | `text2text-generation-with-past` | `text2text-generation` |
- | `automatic-speech-recognition-with-past` | `automatic-speech-recognition` |

Check out the help for more options:

```bash
@@ -70,7 +60,7 @@ Optional arguments:
--pad-token-id PAD_TOKEN_ID
This is needed by some models, for some tasks. If not provided, will attempt to use the tokenizer to guess it.
--ratio RATIO A parameter used when applying 4-bit quantization to control the ratio between 4-bit and 8-bit quantization. If set to 0.8, 80% of the layers will be quantized to int4 while
- 20% will be quantized to int8. This helps to achieve better accuracy at the sacrifice of the model size and inference latency. Default value is 0.8.
+ 20% will be quantized to int8. This helps to achieve better accuracy at the sacrifice of the model size and inference latency. Default value is 1.0.
--sym Whether to apply symmetric quantization
--group-size GROUP_SIZE
The group size to use for int4 quantization. Recommended value is 128 and -1 will result in per-column quantization.
@@ -97,7 +87,7 @@ Optional arguments:
You can also apply fp16, 8-bit or 4-bit weight-only quantization on the Linear, Convolutional and Embedding layers when exporting your model, by setting `--weight-format` to `fp16`, `int8` or `int4` respectively:

```bash
- optimum-cli export openvino --model gpt2 --weight-format int8 ov_model/
+ optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B --weight-format int8 ov_model/
```

For more information on the quantization parameters, check out the [documentation](inference#weight-only-quantization).
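
For example, a minimal sketch combining int4 weight-only quantization with the `--ratio` and `--group-size` options described in the help output above (the model ID and output directory are placeholders):

```bash
# Sketch: quantize 80% of the layers to int4 and the remaining 20% to int8,
# grouping weights in blocks of 128
optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B \
  --weight-format int4 --ratio 0.8 --group-size 128 ov_model/
```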
@@ -109,6 +99,33 @@ Models larger than 1 billion parameters are exported to the OpenVINO format with

</Tip>


### Decoder models

For models with a decoder, the re-use of past keys and values is enabled by default. This avoids recomputing the same intermediate activations at each generation step. To export the model without the KV cache, remove the `-with-past` suffix when specifying the task, as illustrated after the table below.

| With K-V cache | Without K-V cache |
|------------------------------------------|--------------------------------------|
| `text-generation-with-past` | `text-generation` |
| `text2text-generation-with-past` | `text2text-generation` |
| `automatic-speech-recognition-with-past` | `automatic-speech-recognition` |
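
For example, a minimal sketch of exporting a causal language model without the KV cache, using the task name from the right-hand column (the output directory is a placeholder):

```bash
# Sketch: dropping the -with-past suffix disables KV-cache re-use
optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B --task text-generation ov_model/
```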


### Diffusion models

When Stable Diffusion models are exported to the OpenVINO format, they are decomposed into different components that are later combined during inference:

* Text encoder(s)
* U-Net
* VAE encoder
* VAE decoder

To export your Stable Diffusion XL model to the OpenVINO IR format with the CLI, run:

```bash
optimum-cli export openvino --model stabilityai/stable-diffusion-xl-base-1.0 ov_sdxl/
```
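
The exported components can then be reloaded together for inference. A minimal sketch using the `OVStableDiffusionXLPipeline` class from `optimum.intel` (prompt and file name are placeholders):

```python
from optimum.intel import OVStableDiffusionXLPipeline

# Load the OpenVINO IR pipeline exported to ov_sdxl/ above
pipeline = OVStableDiffusionXLPipeline.from_pretrained("ov_sdxl")
image = pipeline("Sailing ship in a storm").images[0]
image.save("ship.png")
```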

## When loading your model

You can also load your PyTorch checkpoint and convert it to the OpenVINO format on the fly by setting `export=True` when loading your model.
@@ -121,7 +138,7 @@ To easily save the resulting model, you can use the `save_pretrained()` method,
+ from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"
model_id = "meta-llama/Meta-Llama-3-8B"
- model = AutoModelForCausalLM.from_pretrained(model_id)
+ model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
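
A minimal sketch of then saving the converted model with `save_pretrained()`, as the surrounding text describes (the `ov_model` directory is a placeholder):

```python
# Sketch: persist the converted model and tokenizer for later reloading
model.save_pretrained("ov_model")
tokenizer.save_pretrained("ov_model")
```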
@@ -137,7 +154,7 @@ To easily save the resulting model, you can use the `save_pretrained()` method,
from transformers import AutoModelForCausalLM
from optimum.exporters.openvino import export_from_model

- model = AutoModelForCausalLM.from_pretrained("gpt2")
+ model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
export_from_model(model, output="ov_model", task="text-generation-with-past")
```
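
Once exported, the resulting directory can be reloaded for generation. A minimal sketch, assuming the `ov_model` output of the call above (prompt and generation settings are placeholders):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Reload the exported OpenVINO model from disk
model = OVModelForCausalLM.from_pretrained("ov_model")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```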
