[Quantizer] PerTensorAffineQuantizer operations #2828
Conversation
I really appreciate your initiative in working on a new quantizer class for nntrainer!
I'd also like to share a small idea:
It is possible that, in the future, a quantization scheme such as the following may be required:
```c
// Types.hpp from mllm
...
typedef struct {
  float d;                  // delta
  int8_t qs[QK_K];          // quants
  int16_t bsums[QK_K / 16]; // sum of quants in groups of 16
} block_q8_K;
```
This is because, at the computational-kernel level, having such additional information helps reduce the cost a lot!
Although this is not urgent to consider, I wanted to let you know about it to avoid potential rework in the future.
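(To illustrate the point, here is a hedged sketch of how a precomputed per-group sum of quants can save work in a quantized dot product. This is not code from mllm or nntrainer; `GroupQuant`, `QK`, and `dot_group` are hypothetical names, and the group size is an assumption.)

```cpp
#include <cstdint>

constexpr int QK = 16; // group size (assumption; mllm groups by QK_K / 16)

struct GroupQuant {
  float d;       // scale (delta): x ≈ d * qs[i]
  int8_t qs[QK]; // quantized activations
  int16_t bsum;  // precomputed sum of qs, filled once at quantization time
};

// Weight group quantized affinely: w ≈ d * wq + m.
// dot(x, w) ≈ x.d * (d * Σ xq_i * wq_i  +  m * Σ xq_i)
// The second term reuses x.bsum instead of re-summing the activations
// on every call, which is where the kernel-level savings come from.
float dot_group(const GroupQuant &x, const int8_t wq[QK], float d, float m) {
  int32_t acc = 0;
  for (int i = 0; i < QK; ++i)
    acc += int32_t(x.qs[i]) * int32_t(wq[i]);
  return x.d * (d * float(acc) + m * float(x.bsum));
}
```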
Force-pushed from 16764fc to 0b49cd0
Thank you for sharing the information! There are various quantization parameters for different quantization schemes.
Force-pushed from d104aac to 400105e
Yes, I mean that even with the current quantization algorithm (uniform quantization, etc.), it might be better for qParam to hold such additional information. Of course, there may be cases where this is not necessary, but at least for matrix multiplication operations, having that information can make the algorithm implementation more effective. In a nutshell, even current algorithms sometimes require additional information like that and sometimes they don't, so separate modifications might be needed in the future. I'm just letting you know to prevent additional fixes later 😅
This PR adds initial PerTensorAffineQuantizer operation implementations. This change allows users to quantize and dequantize tensors. Note that the current implementation is naive and has limited features; an optimized version will be introduced in a later PR.

**Self-evaluation:**
1. Build test: [X] Passed [ ] Failed [ ] Skipped
2. Run test: [X] Passed [ ] Failed [ ] Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
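(For readers unfamiliar with the scheme, below is a minimal sketch of per-tensor affine quantization. It is illustrative only and not the PR's actual implementation; `AffineQParams`, `compute_qparams`, and the int8 range are assumptions.)

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct AffineQParams {
  float scale;
  int32_t zero_point;
};

// Derive a single scale/zero-point pair from the tensor's min/max range.
AffineQParams compute_qparams(const std::vector<float> &x, int32_t qmin = -128,
                              int32_t qmax = 127) {
  if (x.empty())
    return {1.0f, 0};
  auto [lo, hi] = std::minmax_element(x.begin(), x.end());
  float scale = (*hi - *lo) / float(qmax - qmin);
  if (scale == 0.0f)
    scale = 1.0f; // avoid division by zero for constant tensors
  // Choose zero_point so that the tensor max maps to qmax.
  int32_t zp = qmax - int32_t(std::lround(*hi / scale));
  return {scale, std::clamp(zp, qmin, qmax)};
}

// q = clip(round(x / scale) + zero_point, qmin, qmax)
std::vector<int8_t> quantize(const std::vector<float> &x,
                             const AffineQParams &p) {
  std::vector<int8_t> q(x.size());
  for (size_t i = 0; i < x.size(); ++i) {
    int32_t v = int32_t(std::lround(x[i] / p.scale)) + p.zero_point;
    q[i] = int8_t(std::clamp(v, -128, 127));
  }
  return q;
}

// x' = scale * (q - zero_point); the lossy inverse of quantize().
std::vector<float> dequantize(const std::vector<int8_t> &q,
                              const AffineQParams &p) {
  std::vector<float> x(q.size());
  for (size_t i = 0; i < q.size(); ++i)
    x[i] = p.scale * float(q[i] - p.zero_point);
  return x;
}
```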
Force-pushed from 400105e to 9332b41
Great Work!!!
LGTM!
```cpp
* @param val value to clip
* @param lower lower bound
* @param upper upper bound
* @return T cliped data
```
clipped?
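(For reference, a minimal helper consistent with the doc comment above might look like the following; this is a sketch, not the actual code under review, and is equivalent to `std::clamp`.)

```cpp
#include <algorithm>

// Clip val into the inclusive range [lower, upper].
template <typename T>
T clip(T val, T lower, T upper) {
  return std::max(lower, std::min(val, upper));
}
```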
Nice work. LGTM!
LGTM
LGTM.