INT4 compression support #469

AlexKoff88 · 2023-11-02T15:41:15Z

This is a draft that should be reviewed/revised/merged after the next release of OpenVINO and NNCF.

What is done:

Added int4 weight compression options to the OVQuantizer and optimum-cli
Revised int8 compression accordingly

HuggingFaceDocBuilderDev · 2023-11-02T15:52:03Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

ngaloppo · 2023-11-02T17:55:17Z

optimum/commands/export/openvino.py

+        "-c",
+        "--compress-weights",
+        type=str,
+        choices=["f16", "i8", "i4_sym_g128", "i4_asym_g128", "i4_sym_g64", "i4_asym_g64"],


Do we need to differentiate between FP32-INT8 and FP16-INT8 compression?

AlexKoff88 · 2023-11-09

Do you mean keeping FP16 as an independent option?

ngaloppo · 2023-11-10

Right. Is that useful? Are there cases where FP32-INT8 accuracy is significantly better than FP16-INT8 accuracy?

AlexKoff88 · 2023-12-15

I am not aware of such cases, except those where FP16 itself is not accurate. I plan to introduce a no-compression option for this, as per the comment from Helena.

optimum/exporters/openvino/__main__.py

ngaloppo · 2023-11-02T17:58:41Z

optimum/commands/export/openvino.py

+            "The weight compression option, e.g. f16 stands for float16 weights, i8 - INT8 weights, i4_* - for INT4 compressed weights."
+        ),
+    )
+    optional_group.add_argument("--ratio", type=float, default=0.8, help="Compression ratio between primary and backup precision (only relevant to INT4).")


Might want to issue a warning when the user provides --ratio with int8/fp16.
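One way the suggested warning could look, as a minimal sketch rather than the PR's actual code. The parser below is a hypothetical stand-in that models only the two flags from the diff; `build_parser`, `resolve_ratio`, and `DEFAULT_RATIO` are illustrative names. The sketch assumes the `--ratio` default is moved from 0.8 to `None` so that "not passed" can be told apart from "passed explicitly".

```python
import argparse
import warnings

DEFAULT_RATIO = 0.8  # same default value as in the diff above


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in parser with only the two relevant flags."""
    parser = argparse.ArgumentParser(prog="export-sketch")
    parser.add_argument(
        "-c",
        "--compress-weights",
        type=str,
        choices=["f16", "i8", "i4_sym_g128", "i4_asym_g128", "i4_sym_g64", "i4_asym_g64"],
        default=None,
    )
    # default=None (an assumption of this sketch) marks "--ratio was not passed".
    parser.add_argument("--ratio", type=float, default=None)
    return parser


def resolve_ratio(args: argparse.Namespace) -> float:
    """Return the effective ratio and warn if --ratio was given without INT4."""
    is_int4 = args.compress_weights is not None and args.compress_weights.startswith("i4")
    if args.ratio is not None and not is_int4:
        warnings.warn(
            "--ratio is only relevant to INT4 compression and has no effect "
            f"with --compress-weights={args.compress_weights}.",
            UserWarning,
        )
    return DEFAULT_RATIO if args.ratio is None else args.ratio
```

Calling `resolve_ratio(build_parser().parse_args(["-c", "i8", "--ratio", "0.5"]))` emits the warning, while the same call with an `i4_*` option stays silent.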

ngaloppo · 2023-11-02T18:04:37Z

optimum/intel/openvino/quantization.py

@@ -212,6 +232,14 @@ def quantize(
        else:
            raise TypeError(f"Unsupported model type: {type(self.model)}")

+    def _get_compression_options(self, config: OVConfig):


FYI, because nncf.CompressWeightsMode is a string Enum, you can convert the string to the Enum like so:

mode = nncf.CompressWeightsMode("int4_sym")

So, instead of the table approach, you could create the options dictionary by simply parsing the different components of the config.compression string.

AlexKoff88 · 2023-12-15

The names in compression_option and in nncf.CompressWeightsMode are different anyway. Moreover, nncf.CompressWeightsMode includes some experimental options that we don't want to expose until they are fully functional with OpenVINO. So, it makes sense to keep the mapping.
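For illustration, the parsing idea from this thread could be sketched as follows. The `CompressWeightsMode` class here is a stand-in defined locally so the snippet runs without NNCF; its member names and values are assumptions, and `parse_compression_option` is a hypothetical helper, not code from the PR. Note the difference between looking up an Enum member by value (call syntax) and by member name (subscript syntax).

```python
from enum import Enum


class CompressWeightsMode(Enum):
    """Local stand-in for nncf.CompressWeightsMode (values are assumed)."""

    INT8 = "int8"
    INT4_SYM = "int4_sym"
    INT4_ASYM = "int4_asym"


def parse_compression_option(option: str) -> dict:
    """Split an option such as 'i4_sym_g128' into a mode and a group size."""
    parts = option.split("_")
    if parts == ["i8"]:
        return {"mode": CompressWeightsMode.INT8}
    if len(parts) != 3 or parts[0] != "i4" or not parts[2].startswith("g"):
        raise ValueError(f"Unsupported compression option: {option!r}")
    # Lookup by value uses call syntax; an unknown value raises ValueError.
    # Lookup by member name would be CompressWeightsMode["INT4_SYM"].
    mode = CompressWeightsMode(f"int4_{parts[1]}")
    return {"mode": mode, "group_size": int(parts[2][1:])}
```

The explicit validation keeps options that exist in the Enum but are not meant to be exposed from slipping through, which is the concern raised in the reply about keeping a mapping.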

optimum/commands/export/openvino.py

echarlaix

Very nice addition, thanks a lot @AlexKoff88

tests/openvino/utils_tests.py

optimum/commands/export/openvino.py

echarlaix · 2023-12-15T10:51:06Z

optimum/commands/export/openvino.py

+    optional_group.add_argument(
+        "--weight-format",
+        type=str,
+        choices=["f32", "f16", "i8", "i4_sym_g128", "i4_asym_g128", "i4_sym_g64", "i4_asym_g64"],


Would be in favor of using fp32, fp16, int8 to keep the same format as for transformers and optimum

Also, we might want to move sym/asym out of this set of options so that it can also be made available for int8. I'm not sure it's needed, though; the default asym mode might be enough. Let me know what you think.

AlexKoff88 · 2023-12-15

We will have INT8 symmetric in the new version of NNCF. I am also thinking that we need to reduce the number of available options here and keep only the symmetric ones, because they provide a better accuracy-performance trade-off (by varying group size and ratio). @ljaljushkin, please provide your opinion as well.

ljaljushkin · 2023-12-15

Symmetric mode has a better correlation between model size and latency than asymmetric mode. I can't say for sure that varying group size and ratio in symmetric mode always gives a decent accuracy-performance trade-off; there are some models where symmetric mode doesn't achieve it.
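To make the sym/asym and group-size terms in this thread concrete, here is a generic pure-Python sketch of group-wise 4-bit quantization. It is a textbook formulation, not NNCF's implementation, and all function names are illustrative. Symmetric mode stores one scale per group with the zero point fixed at 0; asymmetric mode stores a scale and an offset per group so the grid covers exactly the group's [min, max] range.

```python
def quantize_group_sym(values, bits=4):
    """Symmetric: one scale per group, grid centered on zero."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4 bits
    qmin = -(2 ** (bits - 1))    # -8 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return [x * scale for x in q]  # dequantized values


def quantize_group_asym(values, bits=4):
    """Asymmetric: scale plus offset per group, grid spans [min, max]."""
    levels = 2 ** bits - 1       # 15 steps for 4 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return [lo + x * scale for x in q]  # dequantized values


def quantize_grouped(weights, group_size, quantize_group):
    """Apply a per-group quantizer to consecutive chunks of a flat weight list."""
    out = []
    for start in range(0, len(weights), group_size):
        out.extend(quantize_group(weights[start:start + group_size]))
    return out


def max_abs_error(original, restored):
    """Largest absolute difference between original and dequantized weights."""
    return max(abs(a - b) for a, b in zip(original, restored))
```

For weights that are all positive and tightly clustered, the asymmetric grid wastes no levels on the unused negative range, so its error is smaller; the price is the extra per-group offset that has to be stored and applied at inference time.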

AlexKoff88 · 2023-12-15

> Would be in favor of using fp32, fp16, int8 to keep the same format as for transformers and optimum

I followed the notation of OpenVINO types, but I can change it.

helena-intel · 2023-12-15

> I am also thinking that we need to reduce the number of available options here

I would much prefer to be able to use all weight compression options available in NNCF in Optimum. In my experience there are always specific cases where they are useful, and it's not good to have to completely switch frameworks/APIs when you want to use them. Also agreed that we should not overwhelm users - but in my opinion we're not there yet - and it also introduces confusion if there are differences between what's available in NNCF and what's available in optimum.

echarlaix · 2023-12-15

> > Would be in favor of using fp32, fp16, int8 to keep the same format as for transformers and optimum
>
> I followed the notation of OpenVINO types but I can change.

I think that would make it easier to stay consistent with the other optimum subpackages.

No strong opinion concerning the symmetric/asymmetric mode; I'm fine with both options.

AlexKoff88 · 2023-12-18

> > I am also thinking that we need to reduce the number of available options here
>
> I would much prefer to be able to use all weight compression options available in NNCF in Optimum. In my experience there are always specific cases where they are useful, and it's not good to have to completely switch frameworks/APIs when you want to use them. Also agreed that we should not overwhelm users - but in my opinion we're not there yet - and it also introduces confusion if there are differences between what's available in NNCF and what's available in optimum.

Thanks, Helena! I understand your concerns, but we have experimental schemes in NNCF that are not yet performant in OpenVINO, so I am not going to expose them at this point.

AlexKoff88 · 2023-12-18

> > > Would be in favor of using fp32, fp16, int8 to keep the same format as for transformers and optimum
> >
> > I followed the notation of OpenVINO types but I can change.
>
> I think that would make it easier to stay consistent with the other optimum subpackages.
>
> No strong opinion concerning the symmetric/asymmetric mode; I'm fine with both options.

No strong objections; I can align the names of the precisions with other parts of the HF ecosystem.

AlexKoff88 · 2023-12-18

Updated the names.

tests/openvino/utils_tests.py

optimum/commands/export/openvino.py

optimum/exporters/openvino/__main__.py

tests/openvino/utils_tests.py

AlexKoff88 added 3 commits November 2, 2023 17:52

Added compression options to CLI. Revised load_in_8bit

1ef49b1

Added 4 bit compression into quantizer

ceb73e4

Temporary switched to NNCF develop and openvino-nightly

35cef0e

ngaloppo reviewed Nov 2, 2023

AlexKoff88 and others added 6 commits November 3, 2023 10:18

Fixed tests

b083150

Style

320e94e

Merge branch 'main' into ak/compression_options

6d22f96

Update optimum/exporters/openvino/__main__.py

e323280

Co-authored-by: Nico Galoppo <nico.galoppo@intel.com>

Merged with master

975b277

Merge remote-tracking branch 'origin/ak/compression_options' into ak/compression_options

4fff849

helena-intel reviewed Dec 12, 2023

optimum/commands/export/openvino.py

AlexKoff88 added 5 commits December 15, 2023 11:04

Merged with main

f9800b7

Added FP32 option for weights data type

d878453

Style

7f3b7cf

Fixed issue

4c87f03

Fixed setup.py

effb744

echarlaix approved these changes Dec 15, 2023

echarlaix reviewed Dec 15, 2023

optimum/exporters/openvino/__main__.py

AlexKoff88 added 3 commits December 15, 2023 16:42

Applied some comments

96f0af5

Fixed names of precisions

c55610b

Fixed test

d074c19

echarlaix reviewed Dec 18, 2023

tests/openvino/utils_tests.py Outdated

Update tests/openvino/utils_tests.py

1871329

echarlaix merged commit 173aacd into main Dec 18, 2023
10 of 12 checks passed

echarlaix deleted the ak/compression_options branch December 18, 2023 13:32

ngaloppo Nov 2, 2023

AlexKoff88 Nov 9, 2023

ngaloppo Nov 10, 2023

AlexKoff88 Dec 15, 2023

ngaloppo Nov 2, 2023

ngaloppo Nov 2, 2023

AlexKoff88 Dec 15, 2023

echarlaix left a comment

echarlaix Dec 15, 2023

echarlaix Dec 15, 2023

AlexKoff88 Dec 15, 2023

ljaljushkin Dec 15, 2023

AlexKoff88 Dec 15, 2023

helena-intel Dec 15, 2023

echarlaix Dec 15, 2023

AlexKoff88 Dec 18, 2023

AlexKoff88 Dec 18, 2023

AlexKoff88 Dec 18, 2023
