Fp8 implementation #1100
@nikita-savelyevv, @AlexKoff88, could you please review it?
Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
…into nm/fp8_impl
Thanks!
```python
if self.quantization_config is not None:
    if isinstance(self.quantization_config, OVWeightQuantizationConfig):
        self.dtype = self.quantization_config.weight_format
    else:
        self.dtype = "int8"
else:
    self.dtype = dtype
```
I believe this should be changed to:

```python
if self.quantization_config is not None:
    self.dtype = self.quantization_config.weight_format
else:
    self.dtype = dtype
```
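A minimal sketch of why the suggested change matters, assuming both config types expose `weight_format`. The two dataclasses below are hypothetical stand-ins, not the optimum-intel classes: with the code as written, a full-quantization config in an fp8 format would still be reported as `int8`.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class WeightOnlyConfig:
    """Hypothetical stand-in for OVWeightQuantizationConfig."""
    weight_format: str = "int8"


@dataclass
class FullQuantConfig:
    """Hypothetical stand-in for OVQuantizationConfig (weights and activations)."""
    weight_format: str = "int8"
    activation_format: str = "int8"


Config = Union[WeightOnlyConfig, FullQuantConfig]


def dtype_as_written(config: Optional[Config], dtype: str) -> str:
    # Mirrors the code under review: any non-weight-only config is reported as int8.
    if config is not None:
        if isinstance(config, WeightOnlyConfig):
            return config.weight_format
        return "int8"
    return dtype


def dtype_as_suggested(config: Optional[Config], dtype: str) -> str:
    # Mirrors the suggestion: trust weight_format for every config type.
    if config is not None:
        return config.weight_format
    return dtype


fp8_full = FullQuantConfig(weight_format="f8e4m3", activation_format="f8e4m3")
print(dtype_as_written(fp8_full, "fp32"))    # int8
print(dtype_as_suggested(fp8_full, "fp32"))  # f8e4m3
```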
tests/openvino/test_exporters_cli.py
```python
self.assertEqual(len(expected_num_fq_nodes_per_model), len(models))
for i, model in enumerate(models):
    actual_num_f_nodes, actual_num_weight_nodes = get_num_quantized_nodes(model)
    self.assertEqual(expected_num_fq_nodes_per_model[i], actual_num_f_nodes)
```
Suggested change:

```diff
-    self.assertEqual(expected_num_fq_nodes_per_model[i], actual_num_f_nodes)
+    self.assertEqual(expected_num_f_nodes_per_model[i], actual_num_f_nodes)
```
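The suggestion aligns the expected-count name with `actual_num_f_nodes`. If, as assumed here, the `f` count covers both `FakeQuantize` nodes (int8) and `FakeConvert` nodes (fp8), the counting step can be sketched on plain op type names. This is not the real `get_num_quantized_nodes` helper, whose body is not shown in this thread.

```python
from typing import Iterable

# Assumption: int8 quantization appears as FakeQuantize ops and fp8 as FakeConvert ops.
FAKE_OP_TYPES = frozenset({"FakeQuantize", "FakeConvert"})


def count_fake_nodes(op_type_names: Iterable[str]) -> int:
    """Count ops whose type name marks them as fake-quantize or fake-convert nodes."""
    return sum(1 for name in op_type_names if name in FAKE_OP_TYPES)


# Hypothetical op sequence for a tiny fp8-quantized graph.
ops = ["Parameter", "FakeConvert", "MatMul", "FakeConvert", "FakeQuantize", "Result"]
print(count_fake_nodes(ops))  # 3
```

With an OpenVINO model, the type names would presumably come from `op.get_type_name()` for each `op` in `model.get_ops()`.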
```python
if not self.sym:
    if self.activation_format != "int8":
        raise ValueError(
            f"Asymmetric quantization can not be performed in {self.activation_format} activation format."
        )
    if self.weight_format != "int8":
        raise ValueError(
            f"Asymmetric quantization can not be performed in {self.weight_format} weight format."
        )
```
I would suggest initializing `sym` to `True` inside the `OVQuantizationConfig` constructor if an fp8 mode is selected. This option is intended for int data types and doesn't quite make sense with fp8 data types. Also, this way `--sym` won't need to be specified every time fp8 modes are used.
cc @AlexKoff88
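A minimal sketch of the suggested default, using a hypothetical `QuantConfigSketch` class rather than the real `OVQuantizationConfig` constructor. When `sym` is not given and any format is fp8, it defaults to `True`, so the asymmetric-quantization check from the diff above only fires when asymmetric mode is requested explicitly. Defaulting to `False` for int8 is an assumption.

```python
from typing import Optional

# Assumed fp8 format names, taken from the --quant-mode values in this PR.
FP8_FORMATS = {"f8e4m3", "f8e5m2"}


class QuantConfigSketch:
    """Hypothetical stand-in for OVQuantizationConfig, not its real constructor."""

    def __init__(
        self,
        weight_format: str = "int8",
        activation_format: str = "int8",
        sym: Optional[bool] = None,
    ):
        self.weight_format = weight_format
        self.activation_format = activation_format
        uses_fp8 = weight_format in FP8_FORMATS or activation_format in FP8_FORMATS
        # None means "not specified": symmetric for fp8, asymmetric otherwise (assumption).
        self.sym = uses_fp8 if sym is None else sym
        self.post_init()

    def post_init(self) -> None:
        # Same rule as the code under review: asymmetric mode is int8-only.
        if not self.sym:
            if self.activation_format != "int8":
                raise ValueError(
                    f"Asymmetric quantization cannot be performed in {self.activation_format} activation format."
                )
            if self.weight_format != "int8":
                raise ValueError(
                    f"Asymmetric quantization cannot be performed in {self.weight_format} weight format."
                )


fp8 = QuantConfigSketch(weight_format="f8e4m3", activation_format="f8e4m3")
print(fp8.sym)  # True
```

An explicit `sym=False` with an fp8 format still raises, so the int8-only restriction on asymmetric mode is preserved.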
```python
"llama",
"f8e4m3",
```
Do I understand correctly that applying quantization to language models is the intended use case for fp8 quantization?
I don't know what the intended purpose of fp8 is. The ticket mentions LLMs and diffusers, at least.
```python
types_map = {
    "i8": "int8",
```
👍
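The `types_map` excerpt maps OpenVINO element-type names to weight-format labels. A minimal sketch of how such a map could tally weight nodes per format; only `"i8": "int8"` appears in the diff, so the `u8` and fp8 entries below are assumptions, as is the helper name.

```python
from typing import Dict, Iterable

# Only "i8" -> "int8" is shown in the PR; the other entries are assumed for illustration.
TYPES_MAP: Dict[str, str] = {
    "i8": "int8",
    "u8": "int8",
    "f8e4m3": "f8e4m3",
    "f8e5m2": "f8e5m2",
}


def count_weight_nodes(element_types: Iterable[str]) -> Dict[str, int]:
    """Tally element-type names per weight format, ignoring unmapped types such as f32."""
    counts = dict.fromkeys(TYPES_MAP.values(), 0)  # first-seen order: int8, f8e4m3, f8e5m2
    for type_name in element_types:
        fmt = TYPES_MAP.get(type_name)
        if fmt is not None:
            counts[fmt] += 1
    return counts


print(count_weight_nodes(["f32", "i8", "u8", "f8e4m3", "f8e4m3"]))
# {'int8': 2, 'f8e4m3': 2, 'f8e5m2': 0}
```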
@IlyasMoutawwakil, @echarlaix, this PR is ready for your review.
LGTM
@KodiaqQ, can you please resolve the conflicts so we can merge this PR?
Done.
What does this PR do?
Adds fp8 modes to the `--quant-mode` parameter: `f8e4m3`, `f8e5m2`.
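A minimal sketch of how the new values could be accepted and forwarded, using a hypothetical parser rather than the actual `optimum-cli` code. Only `f8e4m3` and `f8e5m2` come from this description; including `int8` as a choice and applying the mode to both weights and activations are assumptions.

```python
import argparse
from typing import Dict, List, Optional

# f8e4m3 and f8e5m2 are from the PR description; "int8" is an assumed extra choice.
QUANT_MODES = ["int8", "f8e4m3", "f8e5m2"]


def parse_export_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical subset of an export CLI; argparse rejects values outside QUANT_MODES.
    parser = argparse.ArgumentParser(prog="export-sketch")
    parser.add_argument("--quant-mode", choices=QUANT_MODES, default=None)
    parser.add_argument("--sym", action="store_true")
    return parser.parse_args(argv)


def to_config_kwargs(args: argparse.Namespace) -> Dict[str, object]:
    # Hypothetical mapping: one mode applied to both weights and activations.
    return {
        "weight_format": args.quant_mode,
        "activation_format": args.quant_mode,
        "sym": args.sym,
    }


args = parse_export_args(["--quant-mode", "f8e5m2"])
print(to_config_kwargs(args))
# {'weight_format': 'f8e5m2', 'activation_format': 'f8e5m2', 'sym': False}
```

Without `--sym`, these fp8 kwargs would fail the asymmetric-quantization check under review, which is the case the review comment on defaulting `sym` for fp8 modes addresses.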