Fp8 implementation #1100
Merged
Changes from 10 commits

Commits (11):
c93c2e7 Fp8 implementation (KodiaqQ)
44f11a7 All datasets support (KodiaqQ)
b54abf1 Added test (KodiaqQ)
6f5cd5b Update test (KodiaqQ)
ac7b57a Correctness (KodiaqQ)
2df7fc4 Correctness (KodiaqQ)
710f50a Update docs/source/openvino/export.mdx (KodiaqQ)
3174ef0 Change test model (KodiaqQ)
022908a Merge branch 'nm/fp8_impl' of https://github.com/KodiaqQ/optimum-inte… (KodiaqQ)
0a8e3e7 Apply comments (KodiaqQ)
83dbea7 Merge remote-tracking branch 'huggingface/main' into nm/fp8_impl (KodiaqQ)
@@ -202,31 +202,31 @@


 def get_num_quantized_nodes(model):
-    num_fake_quantize = 0
-    num_weight_nodes = {
-        "int8": 0,
-        "int4": 0,
-        "f4e2m1": 0,
-        "f8e8m0": 0,
-        "nf4": 0,
+    num_fake_nodes = 0
+    types_map = {
+        "i8": "int8",
+        "u8": "int8",
+        "i4": "int4",
+        "u4": "int4",
+        "f4e2m1": "f4e2m1",
+        "f8e8m0": "f8e8m0",
+        "nf4": "nf4",
+        "f8e4m3": "f8e4m3",
+        "f8e5m2": "f8e5m2",
     }
+    num_weight_nodes = {n: 0 for n in types_map.values()}
     ov_model = model if isinstance(model, ov.Model) else model.model
     for elem in ov_model.get_ops():
         if "FakeQuantize" in elem.name:
-            num_fake_quantize += 1
+            num_fake_nodes += 1
+        if "FakeConvert" in elem.name:
+            num_fake_nodes += 1
         for i in range(elem.get_output_size()):
             type_name = elem.get_output_element_type(i).get_type_name()
-            if type_name in ["i8", "u8"]:
-                num_weight_nodes["int8"] += 1
-            if type_name in ["i4", "u4"]:
-                num_weight_nodes["int4"] += 1
-            if type_name == "f4e2m1":
-                num_weight_nodes["f4e2m1"] += 1
-            if type_name == "f8e8m0":
-                num_weight_nodes["f8e8m0"] += 1
-            if type_name == "nf4":
-                num_weight_nodes["nf4"] += 1
-    return num_fake_quantize, num_weight_nodes
+            if type_name in types_map:
+                name = types_map[type_name]
+                num_weight_nodes[name] += 1
+    return num_fake_nodes, num_weight_nodes


 @contextmanager

Comment on lines +210 to +211: 👍
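To see what the refactored `get_num_quantized_nodes` counts without loading a real model, its logic can be exercised against stub ops. This is a minimal sketch under stated assumptions: `StubOp`, `StubType`, and `count_quantized_nodes` are hypothetical names introduced here, and the stubs only imitate the attributes and methods the helper touches (`name`, `get_output_size()`, `get_output_element_type(i).get_type_name()`); they are not the OpenVINO API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Element type name -> counted category, copied from types_map in get_num_quantized_nodes.
TYPES_MAP: Dict[str, str] = {
    "i8": "int8",
    "u8": "int8",
    "i4": "int4",
    "u4": "int4",
    "f4e2m1": "f4e2m1",
    "f8e8m0": "f8e8m0",
    "nf4": "nf4",
    "f8e4m3": "f8e4m3",
    "f8e5m2": "f8e5m2",
}


@dataclass
class StubType:
    """Hypothetical stand-in for an OpenVINO element type."""
    type_name: str

    def get_type_name(self) -> str:
        return self.type_name


@dataclass
class StubOp:
    """Hypothetical stand-in for an OpenVINO op: a name plus its output element types."""
    name: str
    output_types: List[str] = field(default_factory=list)

    def get_output_size(self) -> int:
        return len(self.output_types)

    def get_output_element_type(self, i: int) -> StubType:
        return StubType(self.output_types[i])


def count_quantized_nodes(ops: Iterable) -> Tuple[int, Dict[str, int]]:
    """Same counting logic as the helper in the diff, applied to any iterable of ops."""
    num_fake_nodes = 0
    # Keys are de-duplicated categories: int8, int4, f4e2m1, f8e8m0, nf4, f8e4m3, f8e5m2.
    num_weight_nodes = {n: 0 for n in TYPES_MAP.values()}
    for elem in ops:
        # FakeQuantize and FakeConvert ops are both counted as "fake" nodes.
        if "FakeQuantize" in elem.name:
            num_fake_nodes += 1
        if "FakeConvert" in elem.name:
            num_fake_nodes += 1
        for i in range(elem.get_output_size()):
            type_name = elem.get_output_element_type(i).get_type_name()
            # A single dictionary lookup replaces the old chain of if-statements.
            if type_name in TYPES_MAP:
                num_weight_nodes[TYPES_MAP[type_name]] += 1
    return num_fake_nodes, num_weight_nodes


# Usage: two fake nodes, two int8 outputs (i8 and u8 share a category), one f8e4m3 output.
ops = [
    StubOp("FakeQuantize_1", ["f32"]),
    StubOp("FakeConvert_2", ["f32"]),
    StubOp("weights_fp8", ["f8e4m3"]),
    StubOp("weights_u8", ["u8"]),
    StubOp("weights_i8", ["i8"]),
]
fake, weights = count_quantized_nodes(ops)
print(fake, weights["int8"], weights["f8e4m3"])  # 2 2 1
```

The table-driven design means supporting a new element type (as this PR does for `f8e4m3` and `f8e5m2`) only needs one new `types_map` entry instead of another `if` branch.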
Do I understand correctly that applying quantization to language models is the intended use case for fp8 quantization?
I'm not sure what the intended purpose of fp8 is. The ticket mentions LLMs and diffusers, at least.