Convert tokenizers with openvino_tokenizers #500

slyalin · 2024-01-05T14:28:46Z

What does this PR do?

Exports the tokenizer and detokenizer as OpenVINO models using openvino_tokenizers from https://github.com/openvinotoolkit/openvino_contrib/tree/master/modules/custom_operations/user_ie_extensions/tokenizer/python. The export is enabled by default as part of the optimum-cli export openvino command-line tool. The exported models are compatible with https://github.com/openvinotoolkit/openvino.genai/tree/master/text_generation/causal_lm/cpp.
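For readers who want to see the shape of the export step, here is a minimal sketch. It assumes the `openvino`, `openvino_tokenizers` and `transformers` packages are installed. The function name `export_tokenizer` and the output file names are illustrative assumptions, not necessarily what this PR uses.

```python
from pathlib import Path


def export_tokenizer(model_name_or_path: str, output_dir: str) -> None:
    """Convert a Hugging Face tokenizer to OpenVINO tokenizer/detokenizer models.

    Hypothetical helper for illustration; the file names below are assumptions.
    """
    # Imports are local so the sketch can be defined without the packages installed.
    from openvino import save_model
    from openvino_tokenizers import convert_tokenizer
    from transformers import AutoTokenizer

    hf_tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

    # with_detokenizer=True makes convert_tokenizer return a pair of models.
    ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)

    output = Path(output_dir)
    output.mkdir(parents=True, exist_ok=True)
    save_model(ov_tokenizer, str(output / "openvino_tokenizer.xml"))
    save_model(ov_detokenizer, str(output / "openvino_detokenizer.xml"))
```

Not every tokenizer type can be converted, so a caller would probably wrap this in error handling so that the main model export still succeeds.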

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

slyalin · 2024-01-05T14:37:52Z

@apaniukov will cover this functionality. This PR is just to initiate discussion.


HuggingFaceDocBuilderDev · 2024-01-08T09:44:56Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

eaidova · 2024-01-10T13:42:47Z

optimum/exporters/openvino/__main__.py

+        save_model(ov_tokenizer, tokenizer_path)
+        save_model(ov_detokenizer, detokenizer_path)
+    except Exception as exception:
+        print("[ WARNING ] OpenVINO tokenizer/detokenizer models couldn't be exported because of exception:", exception)


Suggested change

-        print("[ WARNING ] OpenVINO tokenizer/detokenizer models couldn't be exported because of exception:", exception)
+        logger.warning("OpenVINO tokenizer/detokenizer models couldn't be exported because of exception: %s", exception)
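The point of switching from print to the logger can be sketched with the standard library alone. Note that the logging module treats extra positional arguments as %-style format arguments, so the message needs a `%s` placeholder for the exception. The names `export_with_warning` and `failing_export` are hypothetical stand-ins for the export step, not code from this PR.

```python
import logging

logger = logging.getLogger("tokenizer_export_demo")


def export_with_warning(export_fn):
    """Run export_fn; on failure, log a warning and return False instead of raising.

    export_fn is a hypothetical stand-in for the tokenizer export step.
    """
    try:
        export_fn()
        return True
    except Exception as exception:
        # The %s placeholder is filled lazily by the logging module.
        logger.warning(
            "OpenVINO tokenizer/detokenizer models couldn't be exported "
            "because of exception: %s",
            exception,
        )
        return False


def failing_export():
    """Simulate an export that fails."""
    raise RuntimeError("unsupported tokenizer type")
```

Calling `export_with_warning(failing_export)` emits the warning through whatever handlers the application has configured, which a bare print cannot do.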

eaidova · 2024-01-10T13:45:38Z

optimum/exporters/openvino/__main__.py

+        try:
+            # TODO: Avoid loading the tokenizer again if loaded before
+            tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+            tokenizer_export(tokenizer, output)
+        except:
+            print("[ WARNING ] Could not load tokenizer using specified model ID or path. OpenVINO tokenizer/detokenizer models won't be generated.")


The tokenizer is already loaded in the maybe_load_preprocessors function; I recommend checking that function's result instead. Also, you should probably take the trust_remote_code parameter into account if you want to load the tokenizer explicitly.

Reused the tokenizer from the maybe_load_preprocessors result. A new problem, though, is that there are two identical tokenizers: one from AutoTokenizer and one from AutoProcessor. I haven't figured out how to deduplicate them yet.

Here is the PR: slyalin#2
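One possible way to handle the duplicate, sketched under assumptions: if the preprocessors arrive as a list and the two tokenizers really are identical, taking the first tokenizer-like object may be enough. The helper names and the duck-typed check below are hypothetical. In real code an isinstance check against transformers' PreTrainedTokenizerBase would likely be more robust.

```python
from typing import Any, Iterable, Optional


def looks_like_tokenizer(obj: Any) -> bool:
    """Duck-typed check (an assumption): tokenizers expose encode and decode."""
    return callable(getattr(obj, "encode", None)) and callable(getattr(obj, "decode", None))


def pick_tokenizer(preprocessors: Iterable[Any]) -> Optional[Any]:
    """Return the first tokenizer-like preprocessor, or None if there is none."""
    for preprocessor in preprocessors:
        if looks_like_tokenizer(preprocessor):
            return preprocessor
    return None


class FakeTokenizer:
    """Stand-in for a tokenizer, used only to exercise the sketch."""

    def encode(self, text):
        return [ord(ch) for ch in text]

    def decode(self, ids):
        return "".join(chr(i) for i in ids)


class FakeFeatureExtractor:
    """Stand-in for a preprocessor that is not a tokenizer."""
```

With `[FakeFeatureExtractor(), FakeTokenizer(), FakeTokenizer()]` as input, `pick_tokenizer` returns the first `FakeTokenizer` instance and ignores the duplicate.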

eaidova · 2024-01-10T13:47:49Z

optimum/exporters/openvino/__main__.py

@@ -46,6 +47,24 @@
 logger = logging.getLogger(__name__)


+def tokenizer_export(


I think it should be in the convert.py file, together with the other conversion functions.

slyalin · 2024-01-18T08:50:23Z

Closing in favor of #513

Convert tokenizers with openvino_tokenizers

feba5bf


Update optimum/exporters/openvino/__main__.py

4e7bfa9


apaniukov mentioned this pull request Jan 12, 2024

Add OpenVINO Tokenizers #513

Merged


slyalin closed this Jan 18, 2024
