Skip to content

Commit

Permalink
Set default 4-bit compression ratio to 1.0 (#815)
Browse files Browse the repository at this point in the history
* Set default 4-bit compression ratio to 1.0

* udpate doc

* set default ratio using default config
  • Loading branch information
echarlaix authored and IlyasMoutawwakil committed Aug 6, 2024
1 parent 9b7abae commit 6a78b76
Show file tree
Hide file tree
Showing 2 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/openvino/export.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ Optional arguments:
--pad-token-id PAD_TOKEN_ID
This is needed by some models, for some tasks. If not provided, will attempt to use the tokenizer to guess it.
--ratio RATIO A parameter used when applying 4-bit quantization to control the ratio between 4-bit and 8-bit quantization. If set to 0.8, 80% of the layers will be quantized to int4 while
20% will be quantized to int8. This helps to achieve better accuracy at the sacrifice of the model size and inference latency. Default value is 0.8.
20% will be quantized to int8. This helps to achieve better accuracy at the sacrifice of the model size and inference latency. Default value is 1.0.
--sym Whether to apply symmetric quantization
--group-size GROUP_SIZE
The group size to use for int4 quantization. Recommended value is 128 and -1 will results in per-column quantization.
Expand Down
4 changes: 2 additions & 2 deletions optimum/commands/export/openvino.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ def parse_args_openvino(parser: "ArgumentParser"):
default=None,
help=(
"A parameter used when applying 4-bit quantization to control the ratio between 4-bit and 8-bit quantization. If set to 0.8, 80%% of the layers will be quantized to int4 "
"while 20%% will be quantized to int8. This helps to achieve better accuracy at the sacrifice of the model size and inference latency. Default value is 0.8."
"while 20%% will be quantized to int8. This helps to achieve better accuracy at the sacrifice of the model size and inference latency. Default value is 1.0."
),
)
optional_group.add_argument(
Expand Down Expand Up @@ -277,7 +277,7 @@ def _get_default_int4_config(model_id_or_path, library_name):
else:
quantization_config = {
"bits": 8 if is_int8 else 4,
"ratio": 1 if is_int8 else (self.args.ratio or 0.8),
"ratio": 1 if is_int8 else (self.args.ratio or _DEFAULT_4BIT_CONFIG["ratio"]),
"sym": self.args.sym or False,
"group_size": -1 if is_int8 else self.args.group_size,
"all_layers": None if is_int8 else self.args.all_layers,
Expand Down

0 comments on commit 6a78b76

Please sign in to comment.