refactor CPU llama inference code #728

faaany · 2024-05-26T05:22:30Z

What does this PR do?

This PR refactors the current CPU Llama inference code to make it cleaner. The major changes are as follows:

introduce a new class _IPEXLlamaAttention and move the attention-related OPs and attention forward code to _IPEXLlamaAttention
introduce a new class _IPEXLlamaMLP and move the MLP-related OPs and forward code to _IPEXLlamaMLP
simplify _patch_llama_model
rename _IPEXLlamaDecoderLayerRef to _IPEXLlamaDecoderLayer
refactor the forward method of _IPEXLlamaAttention into gemm, rope and sdpa
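The split of the attention forward into gemm, rope and sdpa, together with a separate MLP class, can be pictured with a toy pure-Python model. This is a minimal sketch under stated assumptions, not the optimum-intel code. `ToyLlamaAttention`, `ToyLlamaMLP`, `matvec` and `silu` are hypothetical names. The sketch uses a single head, plain lists instead of tensors, no KV cache, and none of the IPEX fused operators. RoPE follows the "rotate half" pairing used by the Hugging Face Llama implementation.

```python
import math


def matvec(w, x):
    """Dense projection: w is a list of rows, x is a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(x):
    return x / (1.0 + math.exp(-x))


class ToyLlamaAttention:
    """Hypothetical single-head attention whose forward is split into gemm, rope and sdpa."""

    def __init__(self, wq, wk, wv, wo, base=10000.0):
        self.wq, self.wk, self.wv, self.wo = wq, wk, wv, wo
        self.base = base

    def gemm(self, hidden):
        # Project each token's hidden state to its query, key and value vectors.
        q = [matvec(self.wq, h) for h in hidden]
        k = [matvec(self.wk, h) for h in hidden]
        v = [matvec(self.wv, h) for h in hidden]
        return q, k, v

    def rope(self, vecs, positions):
        # Rotary embedding, "rotate half" convention: element j is paired with
        # element j + d/2 and the pair is rotated by pos * base**(-2j/d).
        out = []
        for vec, pos in zip(vecs, positions):
            half = len(vec) // 2
            rotated = list(vec)
            for j in range(half):
                theta = pos * self.base ** (-2 * j / len(vec))
                c, s = math.cos(theta), math.sin(theta)
                x0, x1 = vec[j], vec[j + half]
                rotated[j] = x0 * c - x1 * s
                rotated[j + half] = x1 * c + x0 * s
            out.append(rotated)
        return out

    def sdpa(self, q, k, v):
        # Causal scaled dot-product attention: token t only attends to tokens 0..t.
        scale = 1.0 / math.sqrt(len(q[0]))
        out = []
        for t, qt in enumerate(q):
            scores = [scale * sum(a * b for a, b in zip(qt, k[j])) for j in range(t + 1)]
            peak = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(sc - peak) for sc in scores]
            total = sum(weights)
            out.append([sum(w * v[j][col] for j, w in enumerate(weights)) / total
                        for col in range(len(v[0]))])
        return out

    def forward(self, hidden, positions):
        q, k, v = self.gemm(hidden)
        q, k = self.rope(q, positions), self.rope(k, positions)
        return [matvec(self.wo, a) for a in self.sdpa(q, k, v)]


class ToyLlamaMLP:
    """Hypothetical SwiGLU feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, w_gate, w_up, w_down):
        self.w_gate, self.w_up, self.w_down = w_gate, w_up, w_down

    def forward(self, hidden):
        out = []
        for h in hidden:
            gated = [silu(g) * u for g, u in zip(matvec(self.w_gate, h), matvec(self.w_up, h))]
            out.append(matvec(self.w_down, gated))
        return out
```

Keeping the three attention stages as separate methods means each one can be overridden or replaced by an optimized kernel independently, which is the kind of separation the last item in the list describes.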

Please note that this PR is based on the unmerged PR #725 by Jiqing, as can be seen in the commit history.

HuggingFaceDocBuilderDev · 2024-05-26T05:27:33Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


echarlaix

Looks good, thanks @faaany! Let's wait for #725 to be merged before merging this one.

echarlaix · 2024-06-06T13:59:21Z

#725 is now merged; would you mind rebasing, @faaany?

faaany · 2024-06-06T15:00:10Z

#725 is now merged, you mind rebasing @faaany ?

Cool! The rebase is done, please have a look. Thanks!

echarlaix

LGTM, thanks @faaany!


echarlaix · 2024-06-06T15:56:57Z

optimum/exporters/ipex/modeling_utils.py

-        if is_transformers_version("<", _TRANSFORMERS_MIN_VERSION) or is_transformers_version(
-            ">", _TRANSFORMERS_MAX_VERSION
-        ):
-            raise ImportError(
-                f"Only transformers versions {_TRANSFORMERS_MIN_VERSION} ~ {_TRANSFORMERS_MAX_VERSION} are verified."


We should keep this check, since only the transformers versions in between are supported, but it should be moved. What do you think about moving it to:

optimum-intel/optimum/exporters/ipex/model_patcher.py

Line 91 in 6888c0a

def _patch_model(model):
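The suggestion is to keep the version-range check but run it once, at the point where patching starts. A minimal sketch of that placement follows, assuming a hypothetical `patch_model` entry point and a hand-rolled release parser; `parse_release`, `check_version_range` and `patch_model` are illustrative names, not optimum-intel functions. Real code would more likely rely on `packaging.version.Version` or the `is_transformers_version` helper from the removed diff lines above, and the version numbers in the usage are arbitrary placeholders, not the actual supported bounds.

```python
def parse_release(version):
    """Return the leading numeric release as a 3-tuple, e.g. "4.41" -> (4, 41, 0).

    Pre-release or local suffixes such as ".dev0" are simply dropped, which is
    coarser than what packaging.version.Version does.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # e.g. "0rc1": keep 0, ignore the rest
    return tuple((parts + [0, 0, 0])[:3])


def check_version_range(installed, min_version, max_version, package="transformers"):
    """Raise ImportError unless min_version <= installed <= max_version."""
    current = parse_release(installed)
    if current < parse_release(min_version) or current > parse_release(max_version):
        raise ImportError(
            f"Only {package} versions {min_version} ~ {max_version} are verified, "
            f"but {installed} is installed."
        )


def patch_model(model, installed_version, min_version, max_version):
    """Hypothetical patch entry point: validate first, then patch."""
    check_version_range(installed_version, min_version, max_version)
    # ... replace attention / MLP / decoder-layer modules here ...
    return model


# Placeholder versions, chosen only to illustrate the check.
patched = patch_model(object(), "1.3.0", min_version="1.2.0", max_version="1.5.3")
```

Doing the check inside the single patch entry point means every code path that patches a model is guarded, instead of relying on each caller to remember it.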

Yes, totally agree! Code updated.

Thanks for the detailed review! The rebase and merge conflicts did cause some headaches.

optimum/intel/utils/modeling_utils.py

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

faaany · 2024-06-07T02:05:22Z

Hi @echarlaix, I manually checked the changes in #725 and fixed the bugs introduced by the rebase. All tests now pass, so I think we are good to go.

echarlaix

Great work, thanks a lot @faaany !

jiqing-feng and others added 3 commits May 23, 2024 13:05

ipex 2.3 released

5351f4a

refactor IPEXLlamaAttention

d1d0ca0

Merge branch 'huggingface:main' into ipex-cpu

bd5706c

faaany mentioned this pull request May 26, 2024

add IPEX-XPU support for Llama2 model Inference #703

Closed

faaany added 3 commits May 26, 2024 05:36

Merge branch 'main' of https://github.com/faaany/optimum-intel into ipex-cpu

e61382b

change to Ref

48b205e

Merge branch 'ipex-cpu' of https://github.com/faaany/optimum-intel into ipex-cpu

404486a

yao-matrix reviewed May 27, 2024


faaany and others added 14 commits May 27, 2024 09:32

remove Ref

4ea8a47

skip tests

1f98d6d

skip tests

d3ce377

skip testing without pkv

b2b93bb

Merge branch 'rename' of https://github.com/jiqing-feng/optimum-intel into ipex-cpu

ec0f641

add tests skip

64dcde4

only llama2 with at least 64 head size support IAKV

945f6b6

Merge branch 'rename' of https://github.com/jiqing-feng/optimum-intel into ipex-cpu

0733625

cannot assert same outputs because do_sample=True

c8922f3

Merge branch 'rename' of https://github.com/jiqing-feng/optimum-intel into ipex-cpu

0ff1d7b

rm tiny-llama model testing because it does not work for IAKV

2ddfa7a

fix code style

f4e887d

Merge branch 'rename' of https://github.com/jiqing-feng/optimum-intel into ipex-cpu

923e233

refine docstring

74f132e
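Commit 945f6b6 restricts the IAKV path to Llama 2 models whose per-head size is at least 64, which is also why the tiny-llama test model was dropped. The condition reduces to a check on the model config. This is a hypothetical sketch: `ToyConfig` and `supports_iakv` are illustrative names, and checking `model_type == "llama"` is an assumption that cannot tell Llama 1 from Llama 2. The 7B numbers below match the public Llama 2 7B config; the tiny config is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for the few Llama config fields the check needs."""
    model_type: str
    hidden_size: int
    num_attention_heads: int


def supports_iakv(config, min_head_dim=64):
    """Hypothetical gate: a Llama-family model whose per-head size is >= min_head_dim."""
    head_dim = config.hidden_size // config.num_attention_heads
    return config.model_type == "llama" and head_dim >= min_head_dim


llama2_7b = ToyConfig("llama", hidden_size=4096, num_attention_heads=32)  # head_dim 128
tiny_llama = ToyConfig("llama", hidden_size=16, num_attention_heads=4)    # head_dim 4
```

In a test suite, a predicate like this would typically drive `self.skipTest(...)` so that unsupported models are skipped rather than failing.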

jiqing-feng reviewed May 30, 2024


optimum/exporters/ipex/modeling_utils.py (outdated, resolved)

faaany added 6 commits May 29, 2024 23:15

fix duplicated code

e130345

refactor attention forward

14673da

add use_cache for rope

a2a969e

use with and without cache

3abd790

refine code

82bd0c7

add reference link

de2cc43

echarlaix reviewed Jun 5, 2024


faaany added 2 commits June 6, 2024 22:50

Merge branch 'main' into ipex-cpu

1385f97

bug fix

752aba6

use reshape

1ef8d56

echarlaix reviewed Jun 6, 2024


faaany and others added 2 commits June 7, 2024 00:48

Apply suggestions from code review

5f5d205

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

fix

22860f2

faaany force-pushed the ipex-cpu branch from 97c49e0 to 22860f2 June 7, 2024 01:09

echarlaix approved these changes Jun 7, 2024


echarlaix merged commit 36e5b23 into huggingface:main Jun 7, 2024
13 checks passed
