Commit

merge main branch
Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
PenghuiCheng committed Dec 19, 2023
2 parents 9d03415 + 173aacd commit 4ca23df
Showing 41 changed files with 2,383 additions and 730 deletions.
18 changes: 0 additions & 18 deletions .github/workflows/delete_doc_comment.yml

This file was deleted.

12 changes: 0 additions & 12 deletions .github/workflows/delete_doc_comment_trigger.yml

This file was deleted.

3 changes: 2 additions & 1 deletion .github/workflows/test_inc.yml
@@ -30,7 +30,8 @@ jobs:
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
-        pip install .[neural-compressor,ipex,diffusers,tests]
+        pip install .[neural-compressor,diffusers,tests]
+        pip install intel-extension-for-pytorch
     - name: Test with Pytest
       run: |
         pytest tests/neural_compressor/
6 changes: 6 additions & 0 deletions .github/workflows/test_openvino.yml
@@ -36,3 +36,9 @@ jobs:
     - name: Test with Pytest
       run: |
         pytest tests/openvino/ --ignore test_modeling_basic
+    - name: Test openvino-nightly import
+      run: |
+        pip uninstall -y openvino
+        pip install openvino-nightly
+        python -c "from optimum.intel import OVModelForCausalLM; OVModelForCausalLM.from_pretrained('hf-internal-testing/tiny-random-gpt2', export=True, compile=False)"
44 changes: 35 additions & 9 deletions README.md
@@ -67,26 +67,52 @@ For more details on the supported compression techniques, please refer to the [d

 Below are the examples of how to use OpenVINO and its [NNCF](https://docs.openvino.ai/latest/tmo_introduction.html) framework to accelerate inference.

+#### Export:
+
+It is possible to export your model to the [OpenVINO](https://docs.openvino.ai/2023.1/openvino_ir.html) IR format with the CLI :
+
+```plain
+optimum-cli export openvino --model gpt2 ov_model
+```
+
+If you add `--int8`, the model linear and embedding weights will be quantized to INT8, the activations will be kept in floating point precision.
+
+```plain
+optimum-cli export openvino --model gpt2 --int8 ov_model
+```
+
+To apply quantization on both weights and activations, you can find more information in the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov).
+
+#### Inference:
+
 To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class.
-If you want to load a PyTorch checkpoint, set `export=True` to convert your model to the OpenVINO IR.


 ```diff
-- from transformers import AutoModelForSequenceClassification
-+ from optimum.intel import OVModelForSequenceClassification
+- from transformers import AutoModelForSeq2SeqLM
++ from optimum.intel import OVModelForSeq2SeqLM
   from transformers import AutoTokenizer, pipeline

-  model_id = "distilbert-base-uncased-finetuned-sst-2-english"
-- model = AutoModelForSequenceClassification.from_pretrained(model_id)
-+ model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
+  model_id = "echarlaix/t5-small-openvino"
+- model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
++ model = OVModelForSeq2SeqLM.from_pretrained(model_id)
   tokenizer = AutoTokenizer.from_pretrained(model_id)
-  model.save_pretrained("./distilbert")
+  pipe = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
+  results = pipe("He never went out without a book under his arm, and he often came back with two.")

-  classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
-  results = classifier("He's a dreadful magician.")
+  [{'translation_text': "Il n'est jamais sorti sans un livre sous son bras, et il est souvent revenu avec deux."}]
 ```
+
+If you want to load a PyTorch checkpoint, set `export=True` to convert your model to the OpenVINO IR.
+
+```python
+from optimum.intel import OVModelForCausalLM
+
+model = OVModelForCausalLM.from_pretrained("gpt2", export=True)
+model.save_pretrained("./ov_model")
+```


 #### Post-training static quantization:

 Post-training static quantization introduces an additional calibration step where data is fed through the network in order to compute the activations quantization parameters. Here is an example on how to apply static quantization on a fine-tuned DistilBERT.
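
Note on the calibration step described in the last README paragraph: the README's own DistilBERT example is cut off in this view, so what follows is only a library-free sketch of the idea, not the NNCF or `optimum.intel` implementation. It assumes simple min/max range tracking and unsigned 8-bit affine quantization; the function names (`calibrate_range`, `affine_params`, `quantize`, `dequantize`) and the toy calibration batches are hypothetical.

```python
from typing import Iterable, List, Tuple


def calibrate_range(batches: Iterable[List[float]]) -> Tuple[float, float]:
    """Track the smallest and largest activation seen across calibration batches."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


def affine_params(lo: float, hi: float, num_bits: int = 8) -> Tuple[float, int]:
    """Derive (scale, zero_point) for unsigned affine quantization."""
    qmax = 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero maps to an exact integer.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int, num_bits: int = 8) -> int:
    """Map a float to an integer code, clamping to the representable range."""
    qmax = 2 ** num_bits - 1
    return max(0, min(qmax, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an integer code back to its approximate float value."""
    return (q - zero_point) * scale


# Hypothetical activations recorded while feeding calibration data through a layer.
calibration_batches = [[-1.0, 0.5, 2.0], [0.25, 3.0, -0.5]]
lo, hi = calibrate_range(calibration_batches)
scale, zero_point = affine_params(lo, hi)
```

The parameters are fixed once, from the calibration data, and reused for every later input; that is what makes the quantization "static". Values outside the calibrated range are clamped, which is why the calibration set should resemble real inputs.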