[Tensor] Int4QTensor with quantized 4-bit integer data type #2895

Open
wants to merge 1 commit into
base: main

djeong20
Contributor

This pull request introduces the Int4QTensor class, which stores quantized 4-bit integer data. Two 4-bit integers are packed into each 8-bit byte: the upper four bits hold the first value and the lower four bits hold the second.

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped
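
The packing layout in the description can be sketched as follows. This is a minimal illustration, not code from the PR: the helper names `packInt4Pair`, `signExtendInt4`, `unpackHigh`, and `unpackLow` are hypothetical, and the sign-extension step assumes values are stored as 4-bit two's complement in the range [-8, 7], which matches the clamp bounds used in `addValue`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the PR): pack two signed 4-bit values
// into one byte. `first` lands in the high nibble, `second` in the low one.
inline uint8_t packInt4Pair(int first, int second) {
  assert(first >= -8 && first <= 7);
  assert(second >= -8 && second <= 7);
  return static_cast<uint8_t>(((first & 0x0f) << 4) | (second & 0x0f));
}

// Assumption: nibbles are 4-bit two's complement, so bit 3 is the sign bit.
inline int signExtendInt4(unsigned nibble) {
  nibble &= 0x0f;
  return (nibble & 0x08) ? static_cast<int>(nibble) - 16
                         : static_cast<int>(nibble);
}

// Recover the first (high-nibble) and second (low-nibble) values.
inline int unpackHigh(uint8_t byte) { return signExtendInt4(byte >> 4); }
inline int unpackLow(uint8_t byte) { return signExtendInt4(byte & 0x0f); }
```

For example, packing -3 and 5 yields the byte 0xD5, and unpacking that byte returns -3 and 5 again.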

Member

@skykongkong8 skykongkong8 left a comment

Glad to see many negative unit-test TCs as well. All good!

Comment on lines 233 to 248
/// @todo this func should be template function
void Int4QTensor::addValue(unsigned int b, unsigned int c, unsigned int h,
                           unsigned int w, float value, float beta) {
  auto const &idx = getIndex(b, c, h, w);
  float output = getValue(idx);
  output *= beta;
  output += value;

  // if result value is out of range, clamp to max/min value
  int8_t val = std::trunc(std::clamp((int)output, -8, 7));

  // encode result value to int8 data
  ((int8_t *)getData())[idx / 2] =
    (idx % 2 == 0) ? (val << 4) | (((int8_t *)getData())[idx / 2] & 0x0f)
                   : (((int8_t *)getData())[idx / 2] << 4) | (val & 0x0f);
}
Member

Quick question:
Do we expect the user to account for the scale factor themselves in the float `value` and `beta` inputs?
I'm curious how basic arithmetic on an Int4QTensor is supposed to work...

Contributor Author

Thanks for asking! Currently, no. This function modifies the quantized value directly.
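
To illustrate what "modifying the quantized value directly" means, here is a rough sketch of the arithmetic applied to a single stored code. It is not the PR's code: `addInQuantizedDomain` and `toCodeUnits` are hypothetical names, and the `scale` parameter is an assumption, since the quoted code does not show how a scale factor is stored. Unlike the quoted `addValue`, this sketch clamps in float before converting to `int`, so a very large float cannot overflow the cast; results agree for in-range inputs.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical sketch: the update q <- q * beta + value, carried out on the
// stored 4-bit code itself and clamped to the signed 4-bit range [-8, 7].
inline int addInQuantizedDomain(int q, float value, float beta) {
  float output = static_cast<float>(q) * beta + value;
  // Truncate toward zero, then clamp while still in float.
  output = std::clamp(std::trunc(output), -8.0f, 7.0f);
  return static_cast<int>(output);
}

// Assumption: if a caller held a real-valued addend and a per-tensor scale,
// it would have to convert to code units before calling addValue.
inline float toCodeUnits(float real_value, float scale) {
  return real_value / scale;
}
```

Under this reading, a caller who wants to add a real-valued quantity would first convert it with something like `toCodeUnits`, because the function itself never consults a scale factor.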

// encode result value to int8 data
((int8_t *)getData())[idx / 2] =
  (idx % 2 == 0) ? (val << 4) | (((int8_t *)getData())[idx / 2] & 0x0f)
                 : (((int8_t *)getData())[idx / 2] << 4) | (val & 0x0f);
Contributor

@EunjuYang EunjuYang Jan 24, 2025

As I understand it, the computation should be:

Suggested change
- : (((int8_t *)getData())[idx / 2] << 4) | (val & 0x0f);
+ : (((int8_t *)getData())[idx / 2] & 0xf0) | (val & 0x0f);

I'm a bit confused by this. Please let me know if I'm wrong :)

Contributor Author

you're right! thanks for pointing it out :)
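
A small worked comparison shows why the mask is needed for the odd-index case. The two functions below are hypothetical stand-ins for the two variants of the expression; they take the stored byte as `uint8_t` rather than `int8_t`, an assumption made here so the shift is well defined for all inputs.

```cpp
#include <cstdint>

// Odd-index write as originally proposed: shifting the stored byte moves the
// old LOW nibble into the high position and discards the old high nibble.
inline uint8_t writeLowShifted(uint8_t stored, int val) {
  return static_cast<uint8_t>((stored << 4) | (val & 0x0f));
}

// Odd-index write as suggested in review: keep the high nibble untouched
// and replace only the low nibble.
inline uint8_t writeLowMasked(uint8_t stored, int val) {
  return static_cast<uint8_t>((stored & 0xf0) | (val & 0x0f));
}
```

With a stored byte of 0xD5 (high nibble -3, low nibble 5) and a new low value of 2, the shifted variant produces 0x52, which silently overwrites the neighbouring element with 5, while the masked variant produces 0xD2 as intended.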

Comment on lines 260 to 261
(idx % 2 == 0) ? (val << 4) | ((int8_t *)getData())[idx / 2]
               : ((int8_t *)getData())[idx / 2] | (val & 0x0f);
Contributor

I think we need to clear the bits where the new value will be written before OR-ing it in.

Suggested change
- (idx % 2 == 0) ? (val << 4) | ((int8_t *)getData())[idx / 2]
- : ((int8_t *)getData())[idx / 2] | (val & 0x0f);
+ (idx % 2 == 0) ? (val << 4) | (((int8_t *)getData())[idx / 2] & 0x0f)
+ : (((int8_t *)getData())[idx / 2] & 0xf0) | (val & 0x0f);

Contributor Author

makes sense 👍
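
The clear-then-set pattern agreed on here can be sketched as a standalone function. This is an illustration under stated assumptions, not the PR's implementation: `setInt4` is a hypothetical name, and it operates on a `std::vector<uint8_t>` rather than on the tensor's own buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: write a signed 4-bit value at logical index `idx`
// of a packed buffer. Even indices use the high nibble, odd ones the low.
inline void setInt4(std::vector<uint8_t> &data, std::size_t idx, int val) {
  assert(idx / 2 < data.size());
  assert(val >= -8 && val <= 7);
  uint8_t &byte = data[idx / 2];
  const uint8_t nibble = static_cast<uint8_t>(val & 0x0f);
  byte = (idx % 2 == 0)
           // Clear the high nibble (keep low), then place the new value.
           ? static_cast<uint8_t>((nibble << 4) | (byte & 0x0f))
           // Clear the low nibble (keep high), then place the new value.
           : static_cast<uint8_t>((byte & 0xf0) | nibble);
}
```

If the target nibble were not cleared first, writing 3 into the high nibble of a byte holding 0xFF would leave 0xFF unchanged, because OR can only set bits and never reset them.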


Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>