Commit
merge main in branch
echarlaix committed Jul 11, 2024
2 parents 69ea8a1 + b25e845 commit 01362bc
Showing 31 changed files with 771 additions and 494 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test_ipex.yml
@@ -21,7 +21,7 @@ jobs:
fail-fast: false
matrix:
python-version: [3.8, 3.9]
- transformers-version: [4.39.0, 4.41.2]
+ transformers-version: [4.39.0, 4.42.3]
os: [ubuntu-latest]

runs-on: ${{ matrix.os }}
2 changes: 1 addition & 1 deletion .github/workflows/test_openvino.yml
@@ -21,7 +21,7 @@ jobs:
fail-fast: false
matrix:
python-version: ["3.8", "3.12"]
transformers-version: ["4.36.0", "4.41.*"]
transformers-version: ["4.36.0", "4.42.*"]
os: [ubuntu-latest]

runs-on: ${{ matrix.os }}
2 changes: 1 addition & 1 deletion docs/Dockerfile
@@ -25,4 +25,4 @@ RUN npm install npm@9.8.1 -g && \
RUN python3 -m pip install --no-cache-dir --upgrade pip
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/doc-builder.git
RUN git clone $clone_url && cd optimum-intel && git checkout $commit_sha
- RUN python3 -m pip install --no-cache-dir ./optimum-intel[neural-compressor,openvino,nncf,quality]
+ RUN python3 -m pip install --no-cache-dir ./optimum-intel[neural-compressor,openvino,diffusers,quality]
7 changes: 7 additions & 0 deletions docs/source/_toctree.yml
@@ -22,6 +22,13 @@
      title: Supported Models
    - local: openvino/reference
      title: Reference
    - sections:
      - local: openvino/tutorials/notebooks
        title: Notebooks
      - local: openvino/tutorials/diffusers
        title: Generate images with Diffusion models
      title: Tutorials
      isExpanded: false
    title: OpenVINO
  title: Optimum Intel
  isExpanded: false
49 changes: 33 additions & 16 deletions docs/source/openvino/export.mdx
@@ -14,25 +14,15 @@ specific language governing permissions and limitations under the License.
To export your model to the [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) format with the CLI:

```bash
- optimum-cli export openvino --model gpt2 ov_model/
+ optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B ov_model/
```

The model argument can either be the model ID of a model hosted on the [Hub](https://huggingface.co/models) or a path to a model saved locally. For local models, you need to specify the task for which the model should be exported, chosen from the list of [supported tasks](https://huggingface.co/docs/optimum/main/en/exporters/task_manager).


```bash
- optimum-cli export openvino --model local_model_dir --task text-generation-with-past ov_model/
+ optimum-cli export openvino --model local_llama --task text-generation-with-past ov_model/
```

- The `-with-past` suffix enable the re-use of past keys and values. This allows to avoid recomputing the same intermediate activations during the generation. to export the model without, you will need to remove this suffix.
-
- | With K-V cache | Without K-V cache |
- |------------------------------------------|--------------------------------------|
- | `text-generation-with-past` | `text-generation` |
- | `text2text-generation-with-past` | `text2text-generation` |
- | `automatic-speech-recognition-with-past` | `automatic-speech-recognition` |

Check out the help for more options:

```bash
@@ -70,7 +60,7 @@ Optional arguments:
--pad-token-id PAD_TOKEN_ID
This is needed by some models, for some tasks. If not provided, will attempt to use the tokenizer to guess it.
--ratio RATIO A parameter used when applying 4-bit quantization to control the ratio between 4-bit and 8-bit quantization. If set to 0.8, 80% of the layers will be quantized to int4 while
- 20% will be quantized to int8. This helps to achieve better accuracy at the sacrifice of the model size and inference latency. Default value is 0.8.
+ 20% will be quantized to int8. This helps to achieve better accuracy at the sacrifice of the model size and inference latency. Default value is 1.0.
--sym Whether to apply symmetric quantization
--group-size GROUP_SIZE
The group size to use for int4 quantization. Recommended value is 128 and -1 will result in per-column quantization.
@@ -97,7 +87,7 @@ Optional arguments:
You can also apply fp16, 8-bit or 4-bit weight-only quantization on the Linear, Convolutional and Embedding layers when exporting your model, by setting `--weight-format` to `fp16`, `int8` or `int4` respectively:

```bash
- optimum-cli export openvino --model gpt2 --weight-format int8 ov_model/
+ optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B --weight-format int8 ov_model/
```

For more information on the quantization parameters, check out the [documentation](inference#weight-only-quantization).
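
For example, a minimal sketch combining int4 weight-only quantization with the `--ratio` and `--group-size` options described in the help output above (the model ID and output directory are placeholders):

```bash
# Sketch: quantize 80% of the layers to int4 and the remaining 20% to int8,
# grouping weights in blocks of 128
optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B \
  --weight-format int4 --ratio 0.8 --group-size 128 ov_model/
```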
@@ -109,6 +99,33 @@ Models larger than 1 billion parameters are exported to the OpenVINO format with

</Tip>


### Decoder models

For models with a decoder, the re-use of past keys and values is enabled by default. This avoids recomputing the same intermediate activations at each generation step. To export the model without the KV cache, remove the `-with-past` suffix when specifying the task, as illustrated after the table below.

| With K-V cache | Without K-V cache |
|------------------------------------------|--------------------------------------|
| `text-generation-with-past` | `text-generation` |
| `text2text-generation-with-past` | `text2text-generation` |
| `automatic-speech-recognition-with-past` | `automatic-speech-recognition` |
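
For example, a minimal sketch of exporting a causal language model without the KV cache, using the task name from the right-hand column (the output directory is a placeholder):

```bash
# Sketch: dropping the -with-past suffix disables KV-cache re-use
optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B --task text-generation ov_model/
```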


### Diffusion models

When Stable Diffusion models are exported to the OpenVINO format, they are decomposed into different components that are later combined during inference:

* Text encoder(s)
* U-Net
* VAE encoder
* VAE decoder

To export your Stable Diffusion XL model to the OpenVINO IR format with the CLI, run:

```bash
optimum-cli export openvino --model stabilityai/stable-diffusion-xl-base-1.0 ov_sdxl/
```
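
The exported components can then be reloaded together for inference. A minimal sketch using the `OVStableDiffusionXLPipeline` class from `optimum.intel` (prompt and file name are placeholders):

```python
from optimum.intel import OVStableDiffusionXLPipeline

# Load the OpenVINO IR pipeline exported to ov_sdxl/ above
pipeline = OVStableDiffusionXLPipeline.from_pretrained("ov_sdxl")
image = pipeline("Sailing ship in a storm").images[0]
image.save("ship.png")
```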

## When loading your model

You can also load your PyTorch checkpoint and convert it to the OpenVINO format on the fly by setting `export=True` when loading your model.
@@ -121,7 +138,7 @@ To easily save the resulting model, you can use the `save_pretrained()` method,
+ from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"
model_id = "meta-llama/Meta-Llama-3-8B"
- model = AutoModelForCausalLM.from_pretrained(model_id)
+ model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
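
A minimal sketch of then saving the converted model with `save_pretrained()`, as the surrounding text describes (the `ov_model` directory is a placeholder):

```python
# Sketch: persist the converted model and tokenizer for later reloading
model.save_pretrained("ov_model")
tokenizer.save_pretrained("ov_model")
```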
@@ -137,7 +154,7 @@ To easily save the resulting model, you can use the `save_pretrained()` method,
from transformers import AutoModelForCausalLM
from optimum.exporters.openvino import export_from_model

- model = AutoModelForCausalLM.from_pretrained("gpt2")
+ model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
export_from_model(model, output="ov_model", task="text-generation-with-past")
```
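
Once exported, the resulting directory can be reloaded for generation. A minimal sketch, assuming the `ov_model` output of the call above (prompt and generation settings are placeholders):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Reload the exported OpenVINO model from disk
model = OVModelForCausalLM.from_pretrained("ov_model")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```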
