
[Quantizer] PerTensorAffineQuantizer operations #2828

Merged

Conversation

djeong20
Contributor

This PR adds the initial PerTensorAffineQuantizer operation implementations.
This change allows users to quantize and dequantize tensors.
Note that the current implementation is naive and has limited features.
An optimized version will be introduced in a later PR.
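For readers unfamiliar with the scheme, the sketch below shows what per-tensor affine quantization computes: a single scale and zero-point shared by every element of the tensor. Function names and signatures are illustrative only, not the actual PerTensorAffineQuantizer API added in this PR.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative per-tensor affine quantization: q = clip(round(x / scale) + zp).
std::vector<int8_t> quantize(const std::vector<float> &x, float scale,
                             int32_t zero_point) {
  std::vector<int8_t> q(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    int32_t v = static_cast<int32_t>(std::lround(x[i] / scale)) + zero_point;
    q[i] = static_cast<int8_t>(std::clamp<int32_t>(v, -128, 127));
  }
  return q;
}

// Dequantization approximately reverses the mapping: x ≈ scale * (q - zp).
std::vector<float> dequantize(const std::vector<int8_t> &q, float scale,
                              int32_t zero_point) {
  std::vector<float> x(q.size());
  for (std::size_t i = 0; i < q.size(); ++i)
    x[i] = scale * static_cast<float>(static_cast<int32_t>(q[i]) - zero_point);
  return x;
}
```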

Self-evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Member

@skykongkong8 skykongkong8 left a comment


I really appreciate your initiative on the new quantizer class in nntrainer!
I'd also like to share a small idea: it is possible that, in the future, a quantization scheme such as the following may be required:

// Types.hpp from mllm
...
typedef struct {
    float d;                  // delta
    int8_t qs[QK_K];          // quants
    int16_t bsums[QK_K / 16]; // sum of quants in groups of 16
} block_q8_K;

This is because, at the computational kernel level, having such additional information helps reduce cost a lot!
This is not urgent, but I wanted to mention it now to avoid potential rework in the future.
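To make the suggestion concrete, here is a minimal sketch of how such per-group sums could be filled in while quantizing one block. The struct mirrors block_q8_K above, QK_K = 256 is assumed (the ggml-style block size), and the function is illustrative rather than actual mllm or nntrainer code.

```cpp
#include <cmath>
#include <cstdint>

constexpr int QK_K = 256; // assumed ggml-style block size

typedef struct {
  float d;                  // delta
  int8_t qs[QK_K];          // quants
  int16_t bsums[QK_K / 16]; // sum of quants in groups of 16
} block_q8_K;

// Quantize one block of QK_K floats with scale `delta` and precompute the
// group sums so a matmul kernel can reuse them instead of re-summing quants.
void quantize_block_q8_K(const float *x, float delta, block_q8_K *out) {
  out->d = delta;
  for (int g = 0; g < QK_K / 16; ++g) {
    int sum = 0;
    for (int i = 0; i < 16; ++i) {
      int idx = g * 16 + i;
      int q = static_cast<int>(std::lround(x[idx] / delta));
      q = q < -128 ? -128 : (q > 127 ? 127 : q); // clip to int8 range
      out->qs[idx] = static_cast<int8_t>(q);
      sum += q;
    }
    out->bsums[g] = static_cast<int16_t>(sum); // 16 * 127 fits in int16_t
  }
}
```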

@djeong20 djeong20 force-pushed the quantizer/per_tensor_affine/impl_v1 branch from 16764fc to 0b49cd0 Compare December 16, 2024 08:12
@djeong20
Contributor Author

> I really appreciate your initiative on the new quantizer class in nntrainer! I'd also like to share a small idea: it is possible that, in the future, a quantization scheme such as the following may be required:
>
>     // Types.hpp from mllm
>     ...
>     typedef struct {
>         float d;                  // delta
>         int8_t qs[QK_K];          // quants
>         int16_t bsums[QK_K / 16]; // sum of quants in groups of 16
>     } block_q8_K;
>
> This is because, at the computational kernel level, having such additional information helps reduce cost a lot! This is not urgent, but I wanted to mention it now to avoid potential rework in the future.

Thank you for sharing the information! There are various quantization parameters for different quantization schemes.
I want to make sure we are on the same page: do you mean the current quantization schemes could also carry these parameters (delta, quants, sum of quants), or is there another quantization scheme that uses such q params?

@djeong20 djeong20 force-pushed the quantizer/per_tensor_affine/impl_v1 branch 2 times, most recently from d104aac to 400105e Compare December 18, 2024 00:53
@skykongkong8
Member

> Thank you for sharing the information! There are various quantization parameters for different quantization schemes. I want to make sure we are on the same page: do you mean the current quantization schemes could also carry these parameters (delta, quants, sum of quants), or is there another quantization scheme that uses such q params?

Yes, I mean that even with the current quantization algorithm (uniform quantization, etc.), it might be better for qParam to hold such additional information. Of course, there are cases where this is not necessary, but at least for matrix multiplication, having that information available makes the kernel implementation more effective. In a nutshell, even the current algorithms sometimes need extra information like this and sometimes don't, so there may be separate modifications in the future. I'm just mentioning it now to prevent additional fixes later 😅
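As a generic illustration of the matmul point (a sketch, not the mllm or nntrainer kernel): when affine-quantized weights are multiplied with int8 activations, a precomputed sum of the activation quants lets the zero-point correction be applied once per accumulation instead of once per element.

```cpp
#include <cstdint>

// Dot product between affine-quantized weights (w ≈ s_w * (q_w - zp_w)) and
// symmetrically quantized activations (a ≈ s_a * q_a). Expanding gives
//   dot(w, a) ≈ s_w * s_a * (Σ q_w[i] * q_a[i] - zp_w * Σ q_a[i]),
// so a precomputed Σ q_a (e.g. bsums) keeps the inner loop a plain
// int8 x int8 multiply-accumulate.
float quantized_dot(const int8_t *q_w, const int8_t *q_a, int n, float s_w,
                    float s_a, int32_t zp_w, int32_t q_a_sum) {
  int32_t acc = 0;
  for (int i = 0; i < n; ++i)
    acc += static_cast<int32_t>(q_w[i]) * static_cast<int32_t>(q_a[i]);
  return s_w * s_a * static_cast<float>(acc - zp_w * q_a_sum);
}
```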

This PR adds the initial PerTensorAffineQuantizer operation implementations.
This change allows users to quantize and dequantize tensors.
Note that the current implementation is naive and has limited features.
An optimized version will be introduced in a later PR.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
@djeong20 djeong20 force-pushed the quantizer/per_tensor_affine/impl_v1 branch from 400105e to 9332b41 Compare December 23, 2024 00:32
@djeong20 djeong20 changed the title [Wait for #2824][Quantizer] PerTensorAffineQuantizer operations [Quantizer] PerTensorAffineQuantizer operations Dec 23, 2024
Member

@DonghakPark DonghakPark left a comment


Great Work!!!
LGTM!

* @param val value to clip
* @param lower lower bound
* @param upper upper bound
* @return T cliped data
Contributor


clipped?
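For reference, a minimal sketch of the kind of clip helper this doc comment describes; the template below is illustrative and not the PR's actual code.

```cpp
#include <algorithm>

/**
 * @brief clip a value to the range [lower, upper]
 * @param val value to clip
 * @param lower lower bound
 * @param upper upper bound
 * @return T clipped data
 */
template <typename T> T clip(const T &val, const T &lower, const T &upper) {
  return std::max(lower, std::min(val, upper));
}
```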

Contributor

@baek2sm baek2sm left a comment


Nice work. LGTM!

Contributor

@EunjuYang EunjuYang left a comment


LGTM

Collaborator

@jijoongmoon jijoongmoon left a comment


LGTM.

@jijoongmoon jijoongmoon merged commit db0ed5d into nnstreamer:main Jan 2, 2025
27 checks passed
@djeong20 djeong20 deleted the quantizer/per_tensor_affine/impl_v1 branch January 9, 2025 09:08