How to disable DeepSpeed quantization during inference? #3567
Unanswered · yuchen2580 asked this question in Q&A

I followed the tutorial and used the following code for inference.
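Roughly, the setup follows the tutorial's `init_inference` pattern. A minimal sketch of it (the model name, `mp_size`, and `dtype` here are placeholders, not my exact code):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint -- I am testing OPT and GPT-Neo models from
# the Hugging Face Hub.
model_name = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the model with DeepSpeed inference. mp_size=4 matches the four
# ranks visible in the log below and is what triggers the deprecation
# warning about tensor_parallel.tp_size.
model = deepspeed.init_inference(
    model,
    mp_size=4,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```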
However, when I examine the log output, I find:
```
[2023-05-18 11:50:58,071] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.1, git-hash=unknown, git-branch=unknown
[2023-05-18 11:50:58,073] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
05/18/2023 11:50:58 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/ethany/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/bcb9b8b48fdeae767d48b3ce9341d5b691048450328db6e6f1a9583eb759599a/cache-9164b0c08bb4f7d7.arrow
[2023-05-18 11:50:58,074] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2023-05-18 11:50:58,084] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.1, git-hash=unknown, git-branch=unknown
[2023-05-18 11:50:58,086] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.1, git-hash=unknown, git-branch=unknown
[2023-05-18 11:50:58,086] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-05-18 11:50:58,087] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2023-05-18 11:50:58,087] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.1, git-hash=unknown, git-branch=unknown
[2023-05-18 11:50:58,088] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-05-18 11:50:58,088] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-05-18 11:50:58,089] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2023-05-18 11:50:58,089] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
AutoTP: [(<class 'transformers.models.opt.modeling_opt.OPTDecoderLayer'>, ['self_attn.out_proj', '.fc2'])]
AutoTP: [(<class 'transformers.models.opt.modeling_opt.OPTDecoderLayer'>, ['.fc2', 'self_attn.out_proj'])]
AutoTP: [(<class 'transformers.models.opt.modeling_opt.OPTDecoderLayer'>, ['.fc2', 'self_attn.out_proj'])]
AutoTP: [(<class 'transformers.models.opt.modeling_opt.OPTDecoderLayer'>, ['self_attn.out_proj', '.fc2'])]
```

Does this mean it is running with quantize_bits=8? How can I disable that? It causes a large accuracy drop for the OPT and GPT-Neo models I grabbed from Hugging Face.
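From skimming the docs, the int8 path in `init_inference` seems to be tied to the `dtype` argument. Would something like the following sketch (settings assumed on my part, not verified) be the right way to keep everything in fp16?

```python
import torch
import deepspeed

# model is the Hugging Face model loaded as above. Requesting fp16
# (rather than torch.int8) should, as far as I understand, keep the
# int8 quantization path from being selected; tensor_parallel replaces
# the deprecated mp_size. These settings are an assumption, not a
# confirmed fix.
model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 4},
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```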

Replies: 1 comment

@yuchen2580 Have you figured out this problem? I also have the same confusion.