support assisted decoding in ipex 2.4 #823
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
Hi @echarlaix. I have addressed all your comments, please re-run the tests. Thx!
Hi @IlyasMoutawwakil, could you please review this PR? Thx!
Can you please add an explanation in the PR of why this patching is necessary?
Do assistant models interfere with torchscript?
Yes. In assisted decoding, both the target model and the assistant model need to call Do you want me to write it as a comment in the code?
Thanks for the explanation! One last code change and it's good for me 🤗
I have made the changes you requested. The BTW, please trigger the CI if that's okay with you, thx!
Hi @echarlaix. I have enabled the assisted decoding tests in ipex 2.4. Please take a look, thx~
optimum/intel/ipex/modeling_base.py
Outdated
transformers.generation.candidate_generator._crop_past_key_values = _ipex_crop_past_key_values
transformers.generation.utils._crop_past_key_values = _ipex_crop_past_key_values
Could this be unpatched after export?
As we discussed in Teams, this is the only way to enable all assisted decoding cases:
- transformers target model + ipex draft model
- ipex target model + transformers draft model
- ipex target model + ipex draft model
The _crop_past_key_values function is at the same level as the model. We cannot unpatch inside the generate function because it will run after generate, see here.
I have checked the model type inside _ipex_crop_past_key_values. It only affects IPEX models; transformers models go into the original function, so there is no risk even if we don't unpatch.
Hi @echarlaix. I can do the unpatch for the target model only, please take a look at the new changes.
Support assisted decoding if ipex >= 2.5