add IPEX-XPU support for Llama2 model Inference #703

faaany · 2024-05-08T11:37:52Z

What does this PR do?

This PR enables Intel GPU support for Llama2 model inference in optimum-intel. Below is a code example:

import torch
from transformers import AutoTokenizer, pipeline
from optimum.intel import IPEXModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the weights in float16; export=True asks optimum-intel to export the model for IPEX
model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, export=True)
# Greedy decoding (no sampling, single beam) on the Intel GPU ("xpu") with the KV cache enabled
pipe = pipeline("text-generation", model=model, device="xpu", tokenizer=tokenizer, do_sample=False, num_beams=1, use_cache=True)
results = pipe("He's a dreadful magician and")
print(results)
# Output:
# [{'generated_text': "He's a dreadful magician and he's always getting things wrong. But he's got a heart of gold and he's always trying his best.\n\nThe other magicians in the circus are not very nice to him. They make fun of him and call him names. But Mr. Higglebottom doesn't let it get him down. He just keeps on trying and practicing his magic tricks.\n\nOne day, the circus is in town and Mr. Higglebottom is given the chance to perform in front of a big audience. He's nervous but he's determined to do his best. And to everyone's surprise, he actually manages to pull off a few good tricks! The audience cheers and claps for him and he feels proud of himself.\n\nFrom that day on, Mr. Higglebottom is no longer the laughing stock of the circus. He's respected and admired by all the other performers and he's finally found his place in the circus. He's learned that it's okay to make mistakes and that with hard work and determination, anything is possible."}]
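
The example above hard-codes `device="xpu"`. As a minimal sketch (not part of this PR), a script could first check for a usable Intel GPU and fall back to CPU otherwise. `pick_device` is a hypothetical helper name, and the sketch assumes `torch.xpu.is_available()` is exposed either by a recent PyTorch build or after importing `intel_extension_for_pytorch`.

```python
import importlib.util


def pick_device() -> str:
    """Return "xpu" when PyTorch reports a usable Intel GPU, otherwise "cpu".

    Hypothetical helper for illustration; it is not part of optimum-intel.
    """
    # Without PyTorch there is nothing to query, so fall back to CPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    try:
        import torch

        # Assumption: on some setups the XPU backend is only registered
        # once intel_extension_for_pytorch has been imported.
        if importlib.util.find_spec("intel_extension_for_pytorch") is not None:
            import intel_extension_for_pytorch  # noqa: F401

        xpu = getattr(torch, "xpu", None)
        if xpu is not None and xpu.is_available():
            return "xpu"
    except Exception:
        # Any import or driver problem is treated as "no XPU available".
        pass
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

The returned string could then be passed as `device=pick_device()` when building the pipeline, instead of the literal `"xpu"`.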

faaany · 2024-05-09T01:48:03Z

Hi @echarlaix, this PR is a joint effort of @jiqing-feng, @ganyi1996ppo, and me. Could you please help review it? Thanks a lot!

faaany · 2024-05-09T01:48:16Z

@yao-matrix

HuggingFaceDocBuilderDev · 2024-05-26T05:20:57Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ganyi1996ppo and others added 5 commits May 8, 2024 04:26

add xpu patch to optimum intel (huggingface#7)

5c4d13f

* add xpu patch to optimum intel
* simple path for xpu inference

can run but precision error

b1d6989

optimize optimum

f2de914

further optimize

9295457

finalize

c55216a

faaany mentioned this pull request May 8, 2024

add IPEX-XPU support for Llama2 model Inference (greedy search) #701

Closed

jiqing-feng reviewed May 9, 2024

optimum/exporters/ipex/model_patcher.py (outdated, resolved)

optimum/intel/ipex/modeling_base.py (outdated, resolved)

fix version

5b3b72d

faaany marked this pull request as ready for review May 9, 2024 01:42

fix ipex version check

4897144

faaany closed this May 15, 2024

jiqing-feng and others added 12 commits May 23, 2024 13:05

ipex 2.3 released

5351f4a

change versions

6289b57

debug beam search

3824300

remove reference elimination

872a3eb

refactor IPEXLlamaAttention

d1d0ca0

Merge branch 'ipex-cpu' into ipex-xpu

3b8900d

Merge branch 'huggingface:main' into ipex-xpu

815d238

add xpu port

89e10d6

Fix llama and gemma modeling patching for openvino export (huggingface#714)

9acaba4

* Fix compatibility for transformers v4.41.0 llama and gemma modeling patching
* fix for dev transformers version
* update setup

Fix nncf quantization for decoder models (huggingface#727)

2f4909c

* Fix nncf quantization for decoder models
* add test
* update op quant op
* remove deprecated warning
* update expected quantized
* enable stateful
* style

Merge branch 'ipex-xpu' of https://github.com/faaany/optimum-intel into ipex-xpu

17d02d3

remove

f186ce7

faaany reopened this May 26, 2024

faaany changed the title ~~add IPEX-XPU support for Llama2 model Inference (greedy search)~~ add IPEX-XPU support for Llama2 model Inference May 26, 2024

fix version

1ff78b2

faaany force-pushed the ipex-xpu branch from 8725f49 to 57cfe11 Compare May 26, 2024 16:06

faaany added 7 commits May 26, 2024 19:17

bug fix

ff7f785

change module

e3dac89

improve device

8725f49

remove

57cfe11

simplfy rmsnorm

ee78f95

Merge branch 'ipex-xpu' of https://github.com/faaany/optimum-intel into ipex-xpu

a930f31

style

6098943

faaany closed this May 28, 2024

faaany reopened this May 28, 2024

faaany marked this pull request as draft May 28, 2024 08:54

faaany and others added 11 commits June 6, 2024 18:58

fix group attention

e0fb06e

fix weight shape

aa8d395

Merge branch 'main' into ipex-xpu

0a56b19

fix rebase bug

548d83f

revert openvino

68187e5

revert openvino

efedca4

remove duplicates

bd03552

use the correct black

0d3930a

Merge branch 'main' into ipex-xpu

b4ba6d0

fix merge conflict

1fd464b

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

Merge pull request #1 from kaixuanliu/ipex-xpu

6a52fdf

fix merge conflict

faaany closed this Dec 12, 2024

faaany deleted the ipex-xpu branch December 12, 2024 03:45
