support assisted decoding in ipex 2.4 #823

jiqing-feng · 2024-07-15T06:41:43Z

Support assisted decoding if ipex >= 2.5

HuggingFaceDocBuilderDev · 2024-07-15T08:55:24Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

optimum/intel/ipex/modeling_base.py

jiqing-feng · 2024-07-16T07:04:21Z

Hi @echarlaix . I have fixed all your comments, please re-run the tests. Thx!

optimum/intel/ipex/modeling_base.py

jiqing-feng · 2024-07-19T02:06:12Z

Hi @IlyasMoutawwakil , could you please review this PR? Thx!

optimum/intel/ipex/modeling_base.py

IlyasMoutawwakil

Can you please add an explanation in the PR of why this patching is necessary?
Do assistant models interfere with TorchScript?

jiqing-feng · 2024-07-23T06:25:37Z

Yes. In assisted decoding, both the target model and the assistant model need to call _crop_past_key_values. The patched IPEXModel uses a different past_key_values format, so we need our own crop function to handle the IPEX KV cache.

Do you want me to add this as a comment in the code?
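
To illustrate why a crop function written for one cache layout cannot simply be reused on another, here is a toy sketch. Both layouts are hypothetical stand-ins (plain Python lists instead of tensors); neither is the real transformers or IPEX format, and the function names are made up for this example.

```python
# Toy stand-ins: each "tensor" is just a list with one entry per cached token.

def crop_standard(past_key_values, max_length):
    """Crop a cache laid out as one (key, value) pair per layer."""
    return tuple((k[:max_length], v[:max_length]) for k, v in past_key_values)


def crop_custom(past_key_values, max_length):
    """Crop a hypothetical layout that stores (seq_len, key, value) per layer."""
    return tuple(
        (min(seq_len, max_length), k[:max_length], v[:max_length])
        for seq_len, k, v in past_key_values
    )


standard_cache = ((["k0", "k1", "k2"], ["v0", "v1", "v2"]),)
custom_cache = ((3, ["k0", "k1", "k2"], ["v0", "v1", "v2"]),)

print(crop_standard(standard_cache, 2))
print(crop_custom(custom_cache, 2))

# The standard crop does not understand the custom layout.
try:
    crop_standard(custom_cache, 2)
except ValueError as err:
    print("standard crop failed on custom layout:", err)
```

The point of the sketch is only that the crop logic is tied to the cache layout, which is why a model with its own layout needs its own crop function.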

optimum/intel/ipex/modeling_base.py

IlyasMoutawwakil · 2024-07-23T07:13:26Z

Thanks for the explanation! One last code change and it's good for me 🤗

jiqing-feng · 2024-07-23T07:16:26Z

I have made the changes you requested. _unpatch_crop_past_key_values() can safely be called at the end of generation: it reverts _crop_past_key_values if we patched it and leaves it unchanged if we did not.

BTW, please trigger the CI if that's okay with you, thx!

jiqing-feng · 2024-09-04T09:58:31Z

Hi @echarlaix. I have enabled the assisted decoding tests for ipex 2.4. Please take a look, thx~

optimum/exporters/ipex/model_config.py

optimum/intel/ipex/modeling_base.py

echarlaix · 2024-09-04T10:02:44Z

optimum/intel/ipex/modeling_base.py

+    transformers.generation.candidate_generator._crop_past_key_values = _ipex_crop_past_key_values
+    transformers.generation.utils._crop_past_key_values = _ipex_crop_past_key_values


Could this be unpatched after export?

As we discussed in Teams, this is the only way to enable all assisted decoding cases:

transformers target model + ipex draft model

ipex target model + transformers draft model

ipex target model + ipex draft model

The _crop_past_key_values function lives at the same level as the model, so we cannot unpatch inside the generate function, because it runs after generate; see here.

I check the model type inside _ipex_crop_past_key_values. It only affects IPEX models; transformers models fall through to the original function, so there is no risk even if we don't unpatch.
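
The type check described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `PlainModel`, `IPEXLikeModel`, `original_crop`, `custom_crop` and `dispatching_crop` are hypothetical stand-ins, and the crop bodies are placeholders that only report which path was taken.

```python
class PlainModel:
    """Stand-in for a regular transformers model."""


class IPEXLikeModel:
    """Stand-in for a model that uses its own KV-cache layout."""


def original_crop(model, past_key_values, max_length):
    # Placeholder for the library's default crop function.
    return "original"


def custom_crop(model, past_key_values, max_length):
    # Placeholder for a crop that understands the custom layout.
    return "custom"


def dispatching_crop(model, past_key_values, max_length):
    # Only models with the custom layout take the custom path;
    # every other model falls through to the original function.
    if isinstance(model, IPEXLikeModel):
        return custom_crop(model, past_key_values, max_length)
    return original_crop(model, past_key_values, max_length)


# The three pairings listed above: (target model, draft model).
pairings = [
    (PlainModel(), IPEXLikeModel()),
    (IPEXLikeModel(), PlainModel()),
    (IPEXLikeModel(), IPEXLikeModel()),
]
for target, draft in pairings:
    print(
        type(target).__name__,
        dispatching_crop(target, None, 4),
        type(draft).__name__,
        dispatching_crop(draft, None, 4),
    )
```

Because the dispatch is per call and per model, a mixed pairing works with a single patched function, and leaving the patch in place does not change behavior for plain models.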

jiqing-feng · 2024-09-09T06:55:40Z

Hi @echarlaix. I can unpatch for the target model only; please take a look at the new changes.

support assisted decoding in ipex 2.5

d318fcc

echarlaix reviewed Jul 15, 2024

jiqing-feng and others added 4 commits July 15, 2024 17:45

Update optimum/intel/ipex/modeling_base.py

d43f7ec

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

fix tests fail

f2e2237

fix style

f77ee24

ipex onnx config

2f58aec

echarlaix requested a review from IlyasMoutawwakil July 17, 2024 08:32

echarlaix reviewed Jul 17, 2024

jiqing-feng added 2 commits July 17, 2024 04:48

patch before generate and un-patch after generate

ef009d0

only patch functions in assisted decoding

b0df211

IlyasMoutawwakil reviewed Jul 19, 2024

IlyasMoutawwakil reviewed Jul 19, 2024

IlyasMoutawwakil requested changes Jul 19, 2024

IlyasMoutawwakil reviewed Jul 23, 2024

jiqing-feng requested a review from IlyasMoutawwakil July 23, 2024 07:17

jiqing-feng force-pushed the assist branch from 8a65794 to 6d12238 Compare July 23, 2024 07:35

jiqing-feng mentioned this pull request Jul 23, 2024

rm ipex inference #837

Merged

jiqing-feng and others added 4 commits July 23, 2024 10:16

try and cache the genration result and do un-patch

5c27181

raise error

6d12238

Merge branch 'main' into assist

47ff118

fix style

19dbb08

IlyasMoutawwakil approved these changes Aug 27, 2024

Merge branch 'main' into ipex-assist-decoding

0d37ae9

jiqing-feng changed the title ~~support assisted decoding in ipex 2.5~~ support assisted decoding in ipex 2.4 Sep 4, 2024

echarlaix approved these changes Sep 4, 2024

jiqing-feng added 6 commits September 4, 2024 11:25

ipex 2.4 supports assisted decoding

e85240c

fix inputs

d5c491c

fix generate

e362b28

enable assisted decoding tests

2ab14a1

more tests on assisted decoding

bbb6a21

fix config name

0748782

echarlaix merged commit 5db1ac7 into huggingface:main Sep 9, 2024
17 checks passed

unpatch target model's generation

9837e46

jiqing-feng deleted the assist branch September 10, 2024 02:11
