Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INT4 compression support #469

Merged
merged 18 commits into from
Dec 18, 2023
Merged

INT4 compression support #469

merged 18 commits into from
Dec 18, 2023

Conversation

AlexKoff88
Copy link
Collaborator

@AlexKoff88 AlexKoff88 commented Nov 2, 2023

This is a draft that should be reviewed/revised/merged after the next release of OpenVINO and NNCF.

What is done:

  • Added int4 weight compression options to the OVQuantizer and optimum-cli
  • Revised int8 compression accordingly

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

"-c",
"--compress-weights",
type=str,
choices=["f16", "i8", "i4_sym_g128", "i4_asym_g128", "i4_sym_g64", "i4_asym_g64"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to differentiate between FP32-INT8 and FP16-INT8 compression?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean keeping FP16 as an independent option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Is that useful? Are there cases where FP32-INT8 accuracy is significantly better than FP16-INT8 accuracy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of such cases except the ones where FP16 itself is not accurate. Planning to introduce no compression option for this as per comment from Helena.

optimum/exporters/openvino/__main__.py Outdated Show resolved Hide resolved
"The weight compression option, e.g. f16 stands for float16 weights, i8 - INT8 weights, i4_* - for INT4 compressed weights."
),
)
optional_group.add_argument("--ratio", type=float, default=0.8, help="Compression ratio between primary and backup precision (only relevant to INT4).")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to issue a warning when the user provides --ratio with int8/fp16.

@@ -212,6 +232,14 @@ def quantize(
else:
raise TypeError(f"Unsupported model type: {type(self.model)}")

def _get_compression_options(self, config: OVConfig):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI because nncf.CompressWeightsMode is a string Enum, you can do convert the string to the Enum like so:

mode = nncf.CompressWeightsMode["int4_sym"]

So, instead of the table approach you could create the options dictionary by simply parsing the different components of the config.compression string.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the names in the compression_option and in the nncf.CompressWeightsMode are different anyway. Moreover, we introduce some experimental options in the nncf.CompressWeightsMode that we don't want to expose until they are not fully functional with OpenVINO. So, it makes sense to keep the mapping.

Copy link
Collaborator

@echarlaix echarlaix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice addition, thanks a lot @AlexKoff88

tests/openvino/utils_tests.py Show resolved Hide resolved
optimum/commands/export/openvino.py Show resolved Hide resolved
optional_group.add_argument(
"--weight-format",
type=str,
choices=["f32", "f16", "i8", "i4_sym_g128", "i4_asym_g128", "i4_sym_g64", "i4_asym_g64"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be in favor of using fp32, fp16, int8 to keep the same format as for transformers and optimum

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we might want to move the sym/asym from this set of options so that it can also be made available for int8, not sure it's needed though the default asym mode might be enough, let me know what you think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will have INT8 symmetric in the new version of NNCF. I am also thinking that we need to reduce the number of available options here and keep only symmetrical because they provide a better accuracy-performance trade-off (varying group size and ratio). @ljaljushkin, please provide your opinion as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

symmetric mode has a better correlation between model size and latency than asymmetric one.
can't say for sure, that varying group size and ratio for symmetric always gives a decent accuracy-performance trade-off.
there are some models when symmetric mode doesn't achieve it.
image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be in favor of using fp32, fp16, int8 to keep the same format as for transformers and optimum

I followed the notation of OpenVINO types but I can change.

Copy link
Collaborator

@helena-intel helena-intel Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also thinking that we need to reduce the number of available options here

I would much prefer to be able to use all weight compression options available in NNCF in Optimum. In my experience there are always specific cases where they are useful, and it's not good to have to completely switch frameworks/APIs when you want to use them. Also agreed that we should not overwhelm users - but in my opinion we're not there yet - and it also introduces confusion if there are differences between what's available in NNCF and what's available in optimum.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be in favor of using fp32, fp16, int8 to keep the same format as for transformers and optimum

I followed the notation of OpenVINO types but I can change.

I think that would be easier to keep consistency with other optimum's subpackage.

No strong opinion concerning the symmetric/asymmetric mode, I'm fine with both options

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also thinking that we need to reduce the number of available options here

I would much prefer to be able to use all weight compression options available in NNCF in Optimum. In my experience there are always specific cases where they are useful, and it's not good to have to completely switch frameworks/APIs when you want to use them. Also agreed that we should not overwhelm users - but in my opinion we're not there yet - and it also introduces confusion if there are differences between what's available in NNCF and what's available in optimum.

Thanks, Helena! Understood your concerns but we have experimental schemes in NNCF that are not yet performant in OpenVINO so I am not going to expose them at this point.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be in favor of using fp32, fp16, int8 to keep the same format as for transformers and optimum

I followed the notation of OpenVINO types but I can change.

I think that would be easier to keep consistency with other optimum's subpackage.

No strong opinion concerning the symmetric/asymmetric mode, I'm fine with both options

To strong objections, I can align the names of precisions with other parts of HF ecosystem.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated names

tests/openvino/utils_tests.py Outdated Show resolved Hide resolved
optimum/commands/export/openvino.py Outdated Show resolved Hide resolved
@echarlaix echarlaix merged commit 173aacd into main Dec 18, 2023
10 of 12 checks passed
@echarlaix echarlaix deleted the ak/compression_options branch December 18, 2023 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants