-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OpenVINO Tokenizers #513
Conversation
@echarlaix, @AlexKoff88, please approve workflows. PR is ready to merge, please review (cannot add reviewers). |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks @apaniukov @slyalin
f2a2877
to
57782d1
Compare
Reuse existing preprocessors
Check logs from tokneizers test
@echarlaix could you merge the PR, failed tests are not related to openvino-tokenizers. |
@echarlaix, please consider for merge. |
if tokenizer is not None and is_openvino_tokenizers_available(): | ||
try: | ||
export_tokenizer(tokenizer, output) | ||
except Exception as exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer to add an argument to trigger this export
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add --convert-tokenizer
Option to CLI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot @apaniukov, after our internal discussion I'm OK with both solutions so we can either revert 09b067f or keep convert-tokenizer
(+ some tests needs to be fixed), let me know what work best for you and will merge afterwards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, resolved merge conflicts and pushed a fix for the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apaniukov, consider creating a PR with removing the mandatory --convert-tokenizer
. Instead of enabling conversion we probably need a key to disable conversion, something like --disable-convert-tokenizer
. There are options for how to deprecate the old option:
- Remove the old key without any trace. Old users who hard-code somewhere this key will see the error about an unknown option.
- Keep the old key, print a warning that explains that this key is not needed anymore because tokenizers are converted by default, and instead of enabling conversion now you can disable it if something really bad happens in that part. So explain the change with this key in all details. But conversion goes as it should go, tokenizers are converted, the model is converted, and that message is just a warning.
- The same as [2] but the message is an error instead of a warning, the export fails, and nothing is produced as a result.
I like option [2] considering I don't know the level of adoption of the existing key. But if there is evidence that nobody has started using --convert-tokenizer
, option 3 is better, and if we are bold enough, go with option [1].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this PR do?
Add OpenVINO Tokenizer conversion to CLI conversion pipeline.
This PR is based on another PR: #500
Before submitting