Add OpenVINO Tokenizers #513

Merged: 26 commits, Feb 8, 2024

@apaniukov (Contributor) commented Jan 12, 2024

What does this PR do?

Add OpenVINO Tokenizer conversion to CLI conversion pipeline.

This PR is based on another PR: #500

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@slyalin (Contributor) commented Jan 12, 2024

@echarlaix, @AlexKoff88, please approve the workflows. The PR is ready to merge; please review (I cannot add reviewers).

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@echarlaix (Collaborator) left a comment

Looks great, thanks @apaniukov @slyalin

@apaniukov requested a review from AlexKoff88 on January 19, 2024 14:49
@apaniukov (Contributor Author)

@echarlaix, could you merge the PR? The failed tests are not related to openvino-tokenizers.

@AlexKoff88 (Collaborator)

@echarlaix, please consider this for merge.

Comment on lines 347 to 350
if tokenizer is not None and is_openvino_tokenizers_available():
    try:
        export_tokenizer(tokenizer, output)
    except Exception as exception:
Collaborator

I would prefer to add an argument to trigger this export.

Contributor Author

Add --convert-tokenizer Option to CLI
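
For readers following the thread, the requested opt-in behavior can be sketched roughly as below. This is a minimal illustration, not the code from this PR: only the flag name --convert-tokenizer and the shape of the guarded call come from the conversation; build_parser, maybe_export_tokenizer, the available and export_fn parameters, and the choice to log and continue on failure are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the export CLI parser; only the flag name
    # --convert-tokenizer is taken from the review thread.
    parser = argparse.ArgumentParser(prog="export-sketch")
    parser.add_argument("output", help="Directory for the exported files")
    parser.add_argument(
        "--convert-tokenizer",
        action="store_true",
        help="Also convert the tokenizer during export",
    )
    return parser


def maybe_export_tokenizer(tokenizer, output, convert_tokenizer, available, export_fn):
    """Call export_fn only when the user opted in and the backend is available.

    Returns True if the tokenizer was exported, False otherwise.
    """
    if not convert_tokenizer or tokenizer is None or not available:
        return False
    try:
        export_fn(tokenizer, output)
    except Exception as exception:
        # Assumption: a failed tokenizer export is reported but does not
        # abort the rest of the export.
        print(f"Tokenizer export failed: {exception}")
        return False
    return True
```

With this shape, running the command without the flag leaves the tokenizer untouched, which is the behavior the reviewer asked for.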

Collaborator

Thanks a lot, @apaniukov. After our internal discussion I'm OK with both solutions, so we can either revert 09b067f or keep --convert-tokenizer (plus some tests need to be fixed). Let me know what works best for you and I will merge afterwards.

Contributor Author

Thanks, resolved merge conflicts and pushed a fix for the tests.

Contributor

@apaniukov, consider creating a PR that removes the mandatory --convert-tokenizer. Instead of a key that enables conversion, we probably need a key that disables it, something like --disable-convert-tokenizer. There are several options for deprecating the old key:

  1. Remove the old key without any trace. Existing users who have hard-coded this key somewhere will see an error about an unknown option.
  2. Keep the old key, but print a warning explaining in detail that it is no longer needed because tokenizers are converted by default, and that conversion can now be disabled if something goes badly wrong in that part. The conversion itself proceeds as usual: the tokenizers are converted, the model is converted, and the message is only a warning.
  3. The same as [2], but the message is an error instead of a warning: the export fails and nothing is produced.

I prefer option [2], since I don't know the level of adoption of the existing key. But if there is evidence that nobody has started using --convert-tokenizer, option [3] is better, and if we are bold enough, we can go with option [1].
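
Option [2] could look roughly like the sketch below. It is an illustration under stated assumptions, not an implementation from this repository: the two flag names come from the comment above, while build_parser, should_convert_tokenizer, the warning text, and the use of FutureWarning are hypothetical choices.

```python
import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the two flag names come from the discussion.
    parser = argparse.ArgumentParser(prog="export-sketch")
    parser.add_argument(
        "--convert-tokenizer",
        action="store_true",
        help="Deprecated: tokenizers are converted by default",
    )
    parser.add_argument(
        "--disable-convert-tokenizer",
        action="store_true",
        help="Skip tokenizer conversion",
    )
    return parser


def should_convert_tokenizer(argv) -> bool:
    """Return True unless the user explicitly disabled tokenizer conversion."""
    args = build_parser().parse_args(argv)
    if args.convert_tokenizer:
        # Option [2]: the old key is still accepted and only triggers a warning.
        warnings.warn(
            "--convert-tokenizer is no longer needed because tokenizers are "
            "converted by default; use --disable-convert-tokenizer to skip "
            "the conversion.",
            FutureWarning,
        )
    return not args.disable_convert_tokenizer
```

Turning this into option [3] would mean raising an error where the warning is emitted, and option [1] would mean dropping the --convert-tokenizer argument so that argparse itself rejects it as unknown.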

@echarlaix merged commit 2be2e75 into huggingface:main on Feb 8, 2024
8 of 10 checks passed
@apaniukov mentioned this pull request on Feb 29, 2024
3 tasks