diff --git a/README.md b/README.md index 0226b5d470..cc62fd47e9 100644 --- a/README.md +++ b/README.md @@ -210,6 +210,11 @@ You can find more examples in the [documentation](https://huggingface.co/docs/op ## IPEX +IPEX export can be used through the Optimum command-line interface: +```bash +optimum-cli export ipex -m gpt2 --torch_dtype bfloat16 ipex-gpt2 +``` + To load your IPEX model, you can just replace your `AutoModelForXxx` class with the corresponding `IPEXModelForXxx` class. You can set `export=True` to load a PyTorch checkpoint, export your model via TorchScript and apply IPEX optimizations : both operators optimization (replaced with customized IPEX operators) and graph-level optimization (like operators fusion) will be applied on your model. ```diff from transformers import AutoTokenizer, pipeline @@ -224,6 +229,10 @@ To load your IPEX model, you can just replace your `AutoModelForXxx` class with pipe = pipeline("text-generation", model=model, tokenizer=tokenizer) results = pipe("He's a dreadful magician and") ++ # You can also use the model exported by Optimum command-line interface ++ exported_model = IPEXModelForCausalLM.from_pretrained("ipex-gpt2") ++ pipe = pipeline("text-generation", model=exported_model, tokenizer=tokenizer) ++ results = pipe("He's a dreadful magician and") ``` For more details, please refer to the [documentation](https://intel.github.io/intel-extension-for-pytorch/#introduction). @@ -231,7 +240,7 @@ For more details, please refer to the [documentation](https://intel.github.io/in ## Running the examples -Check out the [`examples`](https://github.com/huggingface/optimum-intel/tree/main/examples) directory to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference. +Check out the [`examples`](https://github.com/huggingface/optimum-intel/tree/main/examples) and [`notebooks`](https://github.com/huggingface/optimum-intel/tree/main/notebooks) directory to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference. Do not forget to install requirements for every example: diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml index 7053a17ef2..aa5f184ba7 100644 --- a/docs/source/_toctree.yml +++ b/docs/source/_toctree.yml @@ -30,5 +30,20 @@ title: Tutorials isExpanded: false title: OpenVINO + - sections: + - local: ipex/export + title: Export + - local: ipex/inference + title: Inference + - local: ipex/models + title: Supported Models + - local: ipex/reference + title: Reference + - sections: + - local: ipex/tutorials/notebooks + title: Notebooks + title: Tutorials + isExpanded: false + title: IPEX title: Optimum Intel isExpanded: false diff --git a/docs/source/index.mdx b/docs/source/index.mdx index 75e99d8688..15bc89d82e 100644 --- a/docs/source/index.mdx +++ b/docs/source/index.mdx @@ -19,6 +19,8 @@ limitations under the License. 🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures. +[Intel Extension for PyTorch](https://intel.github.io/intel-extension-for-pytorch/#introduction) is an open-source library which provides optimizations for both eager mode and graph mode, however, compared to eager mode, graph mode in PyTorch* normally yields better performance from optimization techniques, such as operation fusion. 
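+
+To make the eager-mode versus graph-mode distinction concrete, the snippet below sketches the two steps IPEX applies under the hood: operator-level optimization followed by tracing so that graph-level fusions can be applied. Optimum Intel wraps this flow for you; the sketch only assumes `intel_extension_for_pytorch` is installed and uses a toy module rather than a real checkpoint.
+
+```python
+import torch
+import intel_extension_for_pytorch as ipex
+
+model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU()).eval()
+example = torch.randn(1, 64)
+
+with torch.no_grad():
+    # Eager mode: operator-level optimizations such as weight prepacking.
+    model = ipex.optimize(model)
+    # Graph mode: trace and freeze so graph-level fusions can be applied.
+    traced = torch.jit.freeze(torch.jit.trace(model, example))
+
+print(traced(example).shape)
+```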
+ [Intel Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html) is an open-source library enabling the usage of the most popular compression techniques such as quantization, pruning and knowledge distillation. It supports automatic accuracy-driven tuning strategies in order for users to easily generate quantized model. The users can easily apply static, dynamic and aware-training quantization approaches while giving an expected accuracy criteria. It also supports different weight pruning techniques enabling the creation of pruned model giving a predefined sparsity target. [OpenVINO](https://docs.openvino.ai) is an open-source toolkit that enables high performance inference capabilities for Intel CPUs, GPUs, and special DL inference accelerators ([see](https://docs.openvino.ai/2024/about-openvino/compatibility-and-support/supported-devices.html) the full list of supported devices). It is supplied with a set of tools to optimize your models with compression techniques such as quantization, pruning and knowledge distillation. Optimum Intel provides a simple interface to optimize your Transformers and Diffusers models, convert them to the OpenVINO Intermediate Representation (IR) format and run inference using OpenVINO Runtime. diff --git a/docs/source/installation.mdx b/docs/source/installation.mdx index aaab1b1f83..cb3e9c7581 100644 --- a/docs/source/installation.mdx +++ b/docs/source/installation.mdx @@ -22,6 +22,7 @@ To install the latest release of 🤗 Optimum Intel with the corresponding requi |:-----------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------| | [Intel Neural Compressor (INC)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html) | `pip install --upgrade --upgrade-strategy eager "optimum[neural-compressor]"`| | [Intel OpenVINO](https://docs.openvino.ai ) | `pip install --upgrade --upgrade-strategy eager "optimum[openvino]"` | +| [Intel Extension for PyTorch](https://intel.github.io/intel-extension-for-pytorch/#introduction) | `pip install --upgrade --upgrade-strategy eager "optimum[ipex]"` | The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version. @@ -42,4 +43,4 @@ or to install from source including dependencies: python -m pip install "optimum-intel[extras]"@git+https://github.com/huggingface/optimum-intel.git ``` -where `extras` can be one or more of `neural-compressor`, `openvino`, `nncf`. +where `extras` can be one or more of `neural-compressor`, `openvino`, `ipex`. diff --git a/docs/source/ipex/export.mdx b/docs/source/ipex/export.mdx new file mode 100644 index 0000000000..4486bdbd67 --- /dev/null +++ b/docs/source/ipex/export.mdx @@ -0,0 +1,63 @@ + + +# Export your model + +## Using the CLI + +To export your model to the IPEX optimized (patching + weight repack + jit trace) model with the CLI : + +```bash +optimum-cli export ipex -m gpt2 --torch_dtype bfloat16 ipex-gpt2 +``` + +The model argument can either be the model ID of a model hosted on the [Hub](https://huggingface.co/models) or a path to a model hosted locally. For local models, you need to specify the task for which the model should be loaded before export, among the list of the [supported tasks](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/ipex/utils.py). 
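+
+The task names accepted by `--task` correspond to the IPEX model classes listed in that file; one way to check them is to inspect the mapping the export command relies on (a small sketch that reads the private `_HEAD_TO_AUTOMODELS` dictionary, so its exact content may change between releases):
+
+```python
+from optimum.intel.ipex.utils import _HEAD_TO_AUTOMODELS
+
+# Task names accepted by `optimum-cli export ipex --task ...`
+print(list(_HEAD_TO_AUTOMODELS.keys()))
+```
+
+For a local checkpoint, the task is passed explicitly: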
+ +```bash +optimum-cli export ipex -m local_gpt2 --task text-generation --torch_dtype bfloat16 ipex-gpt2 +``` + +Check out the help for more options: + +```bash +optimum-cli export ipex --help + +usage: optimum-cli export ipex [-h] -m MODEL [--task TASK] [--trust_remote_code] [--revision REVISION] [--token TOKEN] [--cache_dir CACHE_DIR] [--subfolder SUBFOLDER] + [--local_files_only LOCAL_FILES_ONLY] [--force_download FORCE_DOWNLOAD] [--commit_hash COMMIT_HASH] [--torch_dtype TORCH_DTYPE] + output + +options: + -h, --help show this help message and exit + +Required arguments: + -m MODEL, --model MODEL + Model ID on huggingface.co or path on disk to load model from. + output Path indicating the directory where to store the generated IPEX model. + +Optional arguments: + --task TASK The task to export the model for. If not specified, the task will be auto-inferred based on the model. Available tasks depend on the model. + --trust_remote_code Allows to use custom code for the modeling hosted in the model repository. This option should only be set for repositories you trust and in which you have read the code, as it + will execute on your local machine arbitrary code present in the model repository. + --revision REVISION The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, + so `revision` can be any identifier allowed by git. + --token TOKEN The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated when running `huggingface-cli login` (stored in + `huggingface_hub.constants.HF_TOKEN_PATH`). + --cache_dir CACHE_DIR + Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used. + --subfolder SUBFOLDER + In case the relevant files are located inside a subfolder of the model repo either locally or on huggingface.co, you can specify the folder name here. + --local_files_only LOCAL_FILES_ONLY + Whether or not to only look at local files (i.e., do not try to download the model). + --force_download FORCE_DOWNLOAD + Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. + --commit_hash COMMIT_HASH + The commit_hash related to the file. + --torch_dtype TORCH_DTYPE + float16 or bfloat16 or float32: load in a specified dtype, ignoring the model’s config.torch_dtype if one exists. If not specified, the model will get loaded in float32. +``` diff --git a/docs/source/ipex/inference.mdx b/docs/source/ipex/inference.mdx new file mode 100644 index 0000000000..0326754003 --- /dev/null +++ b/docs/source/ipex/inference.mdx @@ -0,0 +1,46 @@ + + +# Inference + +Optimum Intel can be used to load models from the [Hub](https://huggingface.co/models) and create pipelines to run inference with IPEX optimization (including patching, weight prepack and jit trace) on a variety of Intel processors (currently only support for CPU) + + +## Loading + +### Transformers models + +You can load models from HuggingFace Model Hub, it will be optimized by ipex during loading. 
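+
+If the model was already exported with the CLI, the resulting directory can be reloaded directly; a minimal sketch, reusing the `ipex-gpt2` output directory from the export section:
+
+```python
+from optimum.intel import IPEXModelForCausalLM
+
+# Load a model previously exported with `optimum-cli export ipex`.
+model = IPEXModelForCausalLM.from_pretrained("ipex-gpt2")
+```
+
+To load a vanilla Transformers checkpoint and optimize it on the fly instead, set `export=True` as in the example below.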
+ +```diff + import torch + from transformers import AutoTokenizer, pipeline +- from transformers import AutoModelForCausalLM ++ from optimum.intel import IPEXModelForCausalLM + + model_id = "gpt2" +- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16) ++ model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, export=True) + tokenizer = AutoTokenizer.from_pretrained(model_id) + pipe = pipeline("text-generation", model=model, tokenizer=tokenizer) + results = pipe("He's a dreadful magician and") +``` + +As shown in the table below, each task is associated with a class enabling to automatically load your model. + +| Auto Class | Task | +|--------------------------------------|--------------------------------------| +| `IPEXModelForSequenceClassification` | `text-classification` | +| `IPEXModelForTokenClassification` | `token-classification` | +| `IPEXModelForQuestionAnswering` | `question-answering` | +| `IPEXModelForImageClassification` | `image-classification` | +| `IPEXModel` | `feature-extraction` | +| `IPEXModelForMaskedLM` | `fill-mask` | +| `IPEXModelForAudioClassification` | `audio-classification` | +| `IPEXModelForCausalLM` | `text-generation` | diff --git a/docs/source/ipex/models.mdx b/docs/source/ipex/models.mdx new file mode 100644 index 0000000000..35c5ba2747 --- /dev/null +++ b/docs/source/ipex/models.mdx @@ -0,0 +1,46 @@ + + +# Supported models + +🤗 Optimum handles the export of models to IPEX in the `exporters.ipex` module. It provides classes, functions, and a command line interface to perform the export easily. +Here is the list of the supported architectures : + +## [Transformers](https://huggingface.co/docs/transformers/index) + +- Albert +- Bart +- Beit +- Bert +- BlenderBot +- BlenderBotSmall +- Bloom +- CodeGen +- DistilBert +- Electra +- Flaubert +- GPT-2 +- GPT-BigCode +- GPT-Neo +- GPT-NeoX +- Llama +- MPT +- Mistral +- MobileNet v1 +- MobileNet v2 +- MobileVit +- OPT +- ResNet +- Roberta +- Roformer +- SqueezeBert +- UniSpeech +- Vit +- Wav2Vec2 +- XLM diff --git a/docs/source/ipex/reference.mdx b/docs/source/ipex/reference.mdx new file mode 100644 index 0000000000..c647b3024c --- /dev/null +++ b/docs/source/ipex/reference.mdx @@ -0,0 +1,71 @@ + + +# Models + +## Generic model classes + +[[autodoc]] ipex.modeling_base.IPEXModel + - _from_pretrained + +## Natural Language Processing + +The following classes are available for the following natural language processing tasks. + +### IPEXModelForCausalLM + +[[autodoc]] ipex.modeling_base.IPEXModelForCausalLM + - forward + - generate + +### IPEXModelForMaskedLM + +[[autodoc]] ipex.modeling_base.IPEXModelForMaskedLM + - forward + +### IPEXModelForQuestionAnswering + +[[autodoc]] ipex.modeling_base.IPEXModelForQuestionAnswering + - forward + +### IPEXModelForSequenceClassification + +[[autodoc]] ipex.modeling_base.IPEXModelForSequenceClassification + - forward + +### IPEXModelForTokenClassification + +[[autodoc]] ipex.modeling_base.IPEXModelForTokenClassification + - forward + + +## Audio + +The following classes are available for the following audio tasks. + +### IPEXModelForAudioClassification + +[[autodoc]] ipex.modeling_base.IPEXModelForAudioClassification + - forward + + +## Computer Vision + +The following classes are available for the following computer vision tasks. 
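+
+For instance, `IPEXModelForImageClassification` (documented just below) can be used through the standard pipeline API; a short sketch, where `google/vit-base-patch16-224` is only an example checkpoint:
+
+```python
+from transformers import AutoImageProcessor, pipeline
+from optimum.intel import IPEXModelForImageClassification
+
+model_id = "google/vit-base-patch16-224"  # any supported image classification checkpoint
+model = IPEXModelForImageClassification.from_pretrained(model_id, export=True)
+processor = AutoImageProcessor.from_pretrained(model_id)
+pipe = pipeline("image-classification", model=model, image_processor=processor)
+print(pipe("http://images.cocodataset.org/val2017/000000039769.jpg"))
+```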
+ +### IPEXModelForImageClassification + +[[autodoc]] ipex.modeling_base.IPEXModelForImageClassification + - forward + + +### IPEXModelForFeatureExtraction + +[[autodoc]] ipex.modeling_base.IPEXModelForFeatureExtraction + - forward diff --git a/docs/source/ipex/tutorials/notebooks.mdx b/docs/source/ipex/tutorials/notebooks.mdx new file mode 100644 index 0000000000..2093e4fca6 --- /dev/null +++ b/docs/source/ipex/tutorials/notebooks.mdx @@ -0,0 +1,16 @@ + + +# Notebooks + +## Inference + +| Notebook | Description | | | +|:---------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------- |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|------:| +| [How to run inference with the IPEX](https://github.com/huggingface/optimum-intel/tree/main/notebooks/ipex) | Explains how to export your model to IPEX and to run inference with IPEX model on text-generation task | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/optimum-intel/blob/main/notebooks/ipex/text_generation.ipynb) | [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/optimum-intel/blob/main/notebooks/ipex/text_generation.ipynb) | diff --git a/optimum/commands/export/ipex.py b/optimum/commands/export/ipex.py new file mode 100644 index 0000000000..ed26aedb8b --- /dev/null +++ b/optimum/commands/export/ipex.py @@ -0,0 +1,151 @@ +# Copyright 2024 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Defines the command line for the export with IPEX.""" + +import logging +import sys +from pathlib import Path +from typing import TYPE_CHECKING, Optional + +from huggingface_hub.constants import HUGGINGFACE_HUB_CACHE + +from ...exporters import TasksManager +from ..base import BaseOptimumCLICommand, CommandInfo + + +logger = logging.getLogger(__name__) + + +if TYPE_CHECKING: + from argparse import ArgumentParser, Namespace, _SubParsersAction + + +def parse_args_ipex(parser: "ArgumentParser"): + required_group = parser.add_argument_group("Required arguments") + required_group.add_argument( + "-m", "--model", type=str, required=True, help="Model ID on huggingface.co or path on disk to load model from." + ) + required_group.add_argument( + "output", type=Path, help="Path indicating the directory where to store the generated IPEX model." + ) + optional_group = parser.add_argument_group("Optional arguments") + optional_group.add_argument( + "--task", + default="auto", + help=( + "The task to export the model for. If not specified, the task will be auto-inferred based on the model. Available tasks depend on the model." 
+ ), + ) + optional_group.add_argument( + "--trust_remote_code", + action="store_true", + help=( + "Allows to use custom code for the modeling hosted in the model repository. This option should only be set for repositories you trust and in which " + "you have read the code, as it will execute on your local machine arbitrary code present in the model repository." + ), + ) + optional_group.add_argument( + "--revision", + default=None, + help="The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git.", + ) + optional_group.add_argument( + "--token", + default=None, + help="The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated when running `huggingface-cli login` (stored in `huggingface_hub.constants.HF_TOKEN_PATH`).", + ) + optional_group.add_argument( + "--cache_dir", + type=str, + default=HUGGINGFACE_HUB_CACHE, + help="Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.", + ) + optional_group.add_argument( + "--subfolder", + type=str, + default="", + help="In case the relevant files are located inside a subfolder of the model repo either locally or on huggingface.co, you can specify the folder name here.", + ) + optional_group.add_argument( + "--local_files_only", + type=bool, + default=False, + help="Whether or not to only look at local files (i.e., do not try to download the model).", + ) + optional_group.add_argument( + "--force_download", + type=bool, + default=False, + help="Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.", + ) + optional_group.add_argument("--commit_hash", default=None, help="The commit_hash related to the file.") + optional_group.add_argument( + "--torch_dtype", + type=str, + default="float32", + help="float16 or bfloat16 or float32: load in a specified dtype, ignoring the model’s config.torch_dtype if one exists. 
If not specified, the model will get loaded in float32.", + ) + + +class IPEXExportCommand(BaseOptimumCLICommand): + COMMAND = CommandInfo(name="ipex", help="Export PyTorch models to IPEX IR.") + + def __init__( + self, + subparsers: "_SubParsersAction", + args: Optional["Namespace"] = None, + command: Optional["CommandInfo"] = None, + from_defaults_factory: bool = False, + parser: Optional["ArgumentParser"] = None, + ): + super().__init__( + subparsers, args=args, command=command, from_defaults_factory=from_defaults_factory, parser=parser + ) + self.args_string = " ".join(sys.argv[3:]) + + @staticmethod + def parse_args(parser: "ArgumentParser"): + return parse_args_ipex(parser) + + def run(self): + import torch + + from optimum.intel.ipex.utils import _HEAD_TO_AUTOMODELS + + if self.args.torch_dtype == "bfloat16": + torch_dtype = torch.bfloat16 + elif self.args.torch_dtype == "float16": + torch_dtype = torch.float16 + else: + torch_dtype = torch.float32 + + model_kwargs = { + "revision": self.args.revision, + "token": self.args.token, + "cache_dir": self.args.cache_dir, + "subfolder": self.args.subfolder, + "local_files_only": self.args.local_files_only, + "force_download": self.args.force_download, + "commit_hash": self.args.commit_hash, + "torch_dtype": torch_dtype, + "trust_remote_code": self.args.trust_remote_code, + } + + task = TasksManager.infer_task_from_model(self.args.model) if self.args.task == "auto" else self.args.task + if task not in _HEAD_TO_AUTOMODELS: + raise ValueError(f"{task} is not supported, please choose from {_HEAD_TO_AUTOMODELS}") + + model_class = _HEAD_TO_AUTOMODELS[task] + model = eval(model_class).from_pretrained(self.args.model, **model_kwargs) + model.save_pretrained(self.args.output) diff --git a/optimum/commands/register/register_ipex.py b/optimum/commands/register/register_ipex.py new file mode 100644 index 0000000000..e69c8e0e3b --- /dev/null +++ b/optimum/commands/register/register_ipex.py @@ -0,0 +1,19 @@ +# Copyright 2024 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from ..export import ExportCommand +from ..export.ipex import IPEXExportCommand + + +REGISTER_COMMANDS = [(IPEXExportCommand, ExportCommand)] diff --git a/optimum/intel/ipex/modeling_base.py b/optimum/intel/ipex/modeling_base.py index 18f38cd666..a0a6268ff7 100644 --- a/optimum/intel/ipex/modeling_base.py +++ b/optimum/intel/ipex/modeling_base.py @@ -198,7 +198,7 @@ def _from_pretrained( token: Optional[Union[bool, str]] = None, revision: Optional[str] = None, force_download: bool = False, - cache_dir: str = HUGGINGFACE_HUB_CACHE, + cache_dir: Union[str, Path] = HUGGINGFACE_HUB_CACHE, subfolder: str = "", local_files_only: bool = False, torch_dtype: Optional[Union[str, "torch.dtype"]] = None, @@ -206,6 +206,40 @@ def _from_pretrained( file_name: Optional[str] = WEIGHTS_NAME, **kwargs, ): + """ + Loads a model and its configuration file from a directory or the HF Hub. 
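+
+        In practice this method is reached through the public `from_pretrained` entry point.
+        A minimal sketch (the `gpt2` checkpoint is only an example):
+
+        ```python
+        from optimum.intel import IPEXModelForCausalLM
+
+        # Export the PyTorch checkpoint and apply IPEX optimizations while loading.
+        model = IPEXModelForCausalLM.from_pretrained("gpt2", export=True)
+        ```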
+ + Arguments: + model_id (`str` or `Path`): + The directory from which to load the model. + Can be either: + - The model id of a pretrained model hosted inside a model repo on huggingface.co. + - The path to a directory containing the model weights. + use_auth_token (Optional[Union[bool, str]], defaults to `None`): + Deprecated. Please use `token` instead. + token (Optional[Union[bool, str]], defaults to `None`): + The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated + when running `huggingface-cli login` (stored in `~/.huggingface`). + revision (`str`, *optional*): + The specific model version to use. It can be a branch name, a tag name, or a commit id. + force_download (`bool`, defaults to `False`): + Whether or not to force the (re-)download of the model weights and configuration files, overriding the + cached versions if they exist. + cache_dir (`Union[str, Path]`, *optional*): + The path to a directory in which a downloaded pretrained model configuration should be cached if the + standard cache should not be used. + subfolder (`str`, *optional*) + In case the relevant files are located inside a subfolder of the model repo either locally or on huggingface.co, you can specify the folder name here. + local_files_only (`bool`, *optional*, defaults to `False`): + Whether or not to only look at local files (i.e., do not try to download the model). + torch_dtype (`Optional[Union[str, "torch.dtype"]]`, *optional*) + float16 or bfloat16 or float32: load in a specified dtype, ignoring the model config.torch_dtype if one exists. If not specified, the model will get loaded in float32. + trust_remote_code (`bool`, *optional*) + Allows to use custom code for the modeling hosted in the model repository. This option should only be set for repositories you trust and in which you have read the code, as it will execute on your local machine arbitrary code present in the model repository. + file_name (`str`, *optional*): + The file name of the model to load. Overwrites the default file name and allows one to load the model + with a different name. + """ if use_auth_token is not None: warnings.warn( "The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.", diff --git a/optimum/intel/ipex/utils.py b/optimum/intel/ipex/utils.py index b2644e659e..3d3feb3db2 100644 --- a/optimum/intel/ipex/utils.py +++ b/optimum/intel/ipex/utils.py @@ -14,8 +14,12 @@ _HEAD_TO_AUTOMODELS = { + "feature-extraction": "IPEXModel", "text-generation": "IPEXModelForCausalLM", "text-classification": "IPEXModelForSequenceClassification", "token-classification": "IPEXModelForTokenClassification", "question-answering": "IPEXModelForQuestionAnswering", + "fill-mask": "IPEXModelForMaskedLM", + "image-classification": "IPEXModelForImageClassification", + "audio-classification": "IPEXModelForAudioClassification", }
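
Taken together, the new `optimum-cli export ipex` command is a thin wrapper around this mapping: `run()` resolves the task, looks up the corresponding class in `_HEAD_TO_AUTOMODELS`, loads the model and saves it to the output directory. A condensed sketch of the default text-generation case, mirroring the CLI example from the README above (the export step itself is handled by the class's `from_pretrained`, exactly as in `run()`):

```python
import torch
from optimum.intel import IPEXModelForCausalLM  # _HEAD_TO_AUTOMODELS["text-generation"]

# Rough equivalent of: optimum-cli export ipex -m gpt2 --torch_dtype bfloat16 ipex-gpt2
model = IPEXModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)
model.save_pretrained("ipex-gpt2")
```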