INT4 compression support #469
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
optimum/commands/export/openvino.py
Outdated
"-c",
"--compress-weights",
type=str,
choices=["f16", "i8", "i4_sym_g128", "i4_asym_g128", "i4_sym_g64", "i4_asym_g64"],
Do we need to differentiate between FP32-INT8 and FP16-INT8 compression?
Do you mean keeping FP16 as an independent option?
Right. Is that useful? Are there cases where FP32-INT8 accuracy is significantly better than FP16-INT8 accuracy?
I am not aware of such cases, except the ones where FP16 itself is not accurate. I am planning to introduce a no-compression option for this, as per the comment from Helena.
optimum/commands/export/openvino.py
Outdated
"The weight compression option, e.g. f16 stands for float16 weights, i8 - INT8 weights, i4_* - for INT4 compressed weights."
),
)
optional_group.add_argument("--ratio", type=float, default=0.8, help="Compression ratio between primary and backup precision (only relevant to INT4).")
Might want to issue a warning when the user provides --ratio with int8/fp16.
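A minimal sketch of how such a warning could be issued, assuming the option names from the diff above. The helper names `build_parser` and `resolve_ratio` and the warning text are hypothetical, and this is not the code that was merged. The one design point is that `--ratio` defaults to `None` here rather than `0.8`, so that "not provided" can be told apart from "provided".

```python
import argparse
import warnings

# INT4 choices copied from the diff under review.
INT4_FORMATS = {"i4_sym_g128", "i4_asym_g128", "i4_sym_g64", "i4_asym_g64"}
DEFAULT_RATIO = 0.8  # default value taken from the diff


def build_parser():
    """Hypothetical stand-in for the optimum-cli export parser."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-c",
        "--compress-weights",
        type=str,
        choices=["f16", "i8", *sorted(INT4_FORMATS)],
        default=None,
    )
    # None means "the user did not pass --ratio".
    parser.add_argument("--ratio", type=float, default=None)
    return parser


def resolve_ratio(args):
    """Return the ratio to use, or None when the ratio does not apply."""
    if args.compress_weights in INT4_FORMATS:
        return DEFAULT_RATIO if args.ratio is None else args.ratio
    if args.ratio is not None:
        warnings.warn(
            f"--ratio is only relevant to INT4 compression and is ignored "
            f"for {args.compress_weights!r}."
        )
    return None
```

With this sketch, `-c i8 --ratio 0.5` emits one warning and resolves to no ratio, while `-c i4_sym_g128` alone resolves to the default.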
@@ -212,6 +232,14 @@ def quantize(
        else:
            raise TypeError(f"Unsupported model type: {type(self.model)}")

    def _get_compression_options(self, config: OVConfig):
FYI, because nncf.CompressWeightsMode is a string Enum, you can convert the string to the Enum like so:
mode = nncf.CompressWeightsMode["int4_sym"]
So, instead of the table approach, you could create the options dictionary by simply parsing the different components of the config.compression string.
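For reference, a small sketch of how string-to-member conversion behaves for a standard-library string Enum. `Mode` is a hypothetical stand-in, not the real `nncf.CompressWeightsMode`, and its member names and values are assumptions. With the standard `Enum`, square brackets look a member up by name and a call looks it up by value, so which form applies depends on how the real enum names its members, which this thread does not show.

```python
from enum import Enum


class Mode(str, Enum):
    """Hypothetical string Enum; names and values are illustrative only."""

    INT8 = "int8"
    INT4_SYM = "int4_sym"
    INT4_ASYM = "int4_asym"


# Lookup by value uses call syntax.
by_value = Mode("int4_sym")

# Lookup by member name uses square brackets.
by_name = Mode["INT4_SYM"]

# Both forms return the same member object.
same_member = by_value is by_name

# Square brackets with a *value* string raise KeyError for this Enum.
try:
    Mode["int4_sym"]
    name_lookup_failed = False
except KeyError:
    name_lookup_failed = True
```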
The names in the compression_option and in nncf.CompressWeightsMode are different anyway. Moreover, we are introducing some experimental options in nncf.CompressWeightsMode that we don't want to expose until they are fully functional with OpenVINO. So it makes sense to keep the mapping.
Co-authored-by: Nico Galoppo <nico.galoppo@intel.com>
…compression_options
Very nice addition, thanks a lot @AlexKoff88
optimum/commands/export/openvino.py
Outdated
optional_group.add_argument(
"--weight-format",
type=str,
choices=["f32", "f16", "i8", "i4_sym_g128", "i4_asym_g128", "i4_sym_g64", "i4_asym_g64"],
Would be in favor of using fp32, fp16, int8 to keep the same format as for transformers and optimum
Also, we might want to move sym/asym out of this set of options so that it can be made available for int8 as well. I'm not sure it's needed, though; the default asym mode might be enough. Let me know what you think.
We will have INT8 symmetric in the new version of NNCF. I am also thinking that we need to reduce the number of available options here and keep only the symmetric ones, because they provide a better accuracy-performance trade-off (varying group size and ratio). @ljaljushkin, please provide your opinion as well.
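To make the sym/asym and group-size terms concrete, here is a pure-Python sketch of group-wise 4-bit "fake" quantization (quantize, then dequantize). It is a textbook-style illustration, not NNCF's implementation: all function names are hypothetical, the scale and zero-point formulas are one common choice among several, and the zero point is left unclamped for simplicity.

```python
def quantize_group_sym(group, bits=4):
    """Symmetric: one scale per group, integer range [-8, 7] for 4 bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in group) / qmax or 1.0  # avoid a zero scale
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in group]
    return [v * scale for v in q]


def quantize_group_asym(group, bits=4):
    """Asymmetric: scale plus zero point per group, integer range [0, 15]."""
    levels = 2 ** bits - 1
    lo, hi = min(group), max(group)
    scale = (hi - lo) / levels or 1.0  # avoid a zero scale
    zero_point = round(-lo / scale)  # not clamped in this sketch
    q = [max(0, min(levels, round(w / scale) + zero_point)) for w in group]
    return [(v - zero_point) * scale for v in q]


def fake_quantize(weights, group_size, quantize_group):
    """Apply a per-group quantizer to consecutive chunks of a flat list."""
    out = []
    for start in range(0, len(weights), group_size):
        out.extend(quantize_group(weights[start:start + group_size]))
    return out


def max_error(original, restored):
    """Largest absolute reconstruction error."""
    return max(abs(a - b) for a, b in zip(original, restored))
```

Calling `max_error(w, fake_quantize(w, 64, quantize_group_sym))` with different group sizes and quantizers gives a rough feel for the accuracy side of the trade-off; it says nothing about runtime performance.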
> Would be in favor of using fp32, fp16, int8 to keep the same format as for transformers and optimum

I followed the notation of OpenVINO types but I can change.
> I am also thinking that we need to reduce the number of available options here

I would much prefer to be able to use all weight compression options available in NNCF in Optimum. In my experience there are always specific cases where they are useful, and it's not good to have to completely switch frameworks/APIs when you want to use them. Also agreed that we should not overwhelm users - but in my opinion we're not there yet - and it also introduces confusion if there are differences between what's available in NNCF and what's available in optimum.
> > Would be in favor of using fp32, fp16, int8 to keep the same format as for transformers and optimum
>
> I followed the notation of OpenVINO types but I can change.

I think that would make it easier to keep consistency with the other Optimum subpackages.
No strong opinion concerning the symmetric/asymmetric mode, I'm fine with both options.
> > I am also thinking that we need to reduce the number of available options here
>
> I would much prefer to be able to use all weight compression options available in NNCF in Optimum. In my experience there are always specific cases where they are useful, and it's not good to have to completely switch frameworks/APIs when you want to use them. Also agreed that we should not overwhelm users - but in my opinion we're not there yet - and it also introduces confusion if there are differences between what's available in NNCF and what's available in optimum.

Thanks, Helena! I understand your concerns, but we have experimental schemes in NNCF that are not yet performant in OpenVINO, so I am not going to expose them at this point.
> > > Would be in favor of using fp32, fp16, int8 to keep the same format as for transformers and optimum
> >
> > I followed the notation of OpenVINO types but I can change.
>
> I think that would be easier to keep consistency with other optimum's subpackage.
> No strong opinion concerning the symmetric/asymmetric mode, I'm fine with both options

No strong objections, I can align the names of the precisions with other parts of the HF ecosystem.
updated names
This is a draft that should be reviewed/revised/merged after the next release of OpenVINO and NNCF.
What is done:
OVQuantizer and optimum-cli