Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

env variable to select rounding mode #3515

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hhyuanf
Copy link

@hhyuanf hhyuanf commented Dec 19, 2024

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/595

Accuracy issue was reported by ads team, specifically when the intput tensor is large, some times we get inf relative difference. It happens because abs diff > expected diff and a non-zero value after quant and dequant becomes 0 (so divisor is 0), meaning the root cause is the abs diff is larger than expected. We can reproduce the problem with the following small size input, specifically -502.516 will become 0 after quant and dequant

-180.8454,276.3368,892.1324, 1101.1176, -502.5216,-302.0942,2268.5430,-5960.6919

ideally -502 should be -500. The reason it becomes 0 is that in mx4 quant, number is scaled down by 2^shared_exponent (of that group) and the value of shared_exponent is impacted by rounding method. If shared_exponent is (relatively) bigger, after scaling, many number become small so we lose a bunch of info. Out of all rounding, floor should give the smallest exponent, ceil probably gives the biggest, even and nearest hard to say since they can round up or down depending on the input but likely still be smaller than ceil,
stochastic tries to round down after adding some noise, so probably better or on par with even and nearest, worse than floor.

This is also verified by the unit tests. whe rounding is set to floor and stochastic, tests pass, otherwise fail

This diff enables selecting rounding mode through env variable. If a rounding method is provided through function call, it takes precedence otherwise it looks at env variable. Default is nearest

Differential Revision: D67425485

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67425485

Copy link

netlify bot commented Dec 19, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 4e4cc6e
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6765ae3a56b3260008e0aee3
😎 Deploy Preview https://deploy-preview-3515--pytorch-fbgemm-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

hhyuanf pushed a commit to hhyuanf/FBGEMM-1 that referenced this pull request Dec 20, 2024
Summary:

X-link: facebookresearch/FBGEMM#595

Accuracy issue was reported by ads team, specifically when the intput tensor is large, some times we get inf relative difference. It happens because abs diff > expected diff and a non-zero value after quant and dequant becomes 0 (so divisor is 0), meaning the root cause is the abs diff is larger than expected. We can reproduce the problem with the following small size input, specifically -502.516 will become 0 after quant and dequant
```
-180.8454,276.3368,892.1324, 1101.1176, -502.5216,-302.0942,2268.5430,-5960.6919
```

ideally -502 should be -500. The reason it becomes 0 is that in mx4 quant, number is scaled down by 2^shared_exponent (of that group) and the value of shared_exponent is impacted by rounding method. If shared_exponent is (relatively) bigger, after scaling, many number become small so we lose a bunch of info. Out of all rounding, floor should give the smallest exponent, ceil probably gives the biggest, even and nearest hard to say since they can round up or down depending on the input but likely still be smaller than ceil,
stochastic tries to round down after adding some noise, so probably better or on par with even and nearest, worse than floor.

This is also verified by the unit tests. whe rounding is set to floor and stochastic, tests pass, otherwise fail

This diff enables selecting rounding mode through env variable. If a rounding method is provided through function call, it takes precedence otherwise it looks at env variable. Default is nearest

Differential Revision: D67425485
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67425485

hhyuanf pushed a commit to hhyuanf/FBGEMM-1 that referenced this pull request Dec 20, 2024
Summary:

X-link: facebookresearch/FBGEMM#595

Accuracy issue was reported by ads team, specifically when the intput tensor is large, some times we get inf relative difference. It happens because abs diff > expected diff and a non-zero value after quant and dequant becomes 0 (so divisor is 0), meaning the root cause is the abs diff is larger than expected. We can reproduce the problem with the following small size input, specifically -502.516 will become 0 after quant and dequant
```
-180.8454,276.3368,892.1324, 1101.1176, -502.5216,-302.0942,2268.5430,-5960.6919
```

ideally -502 should be -500. The reason it becomes 0 is that in mx4 quant, number is scaled down by 2^shared_exponent (of that group) and the value of shared_exponent is impacted by rounding method. If shared_exponent is (relatively) bigger, after scaling, many number become small so we lose a bunch of info. Out of all rounding, floor should give the smallest exponent, ceil probably gives the biggest, even and nearest hard to say since they can round up or down depending on the input but likely still be smaller than ceil,
stochastic tries to round down after adding some noise, so probably better or on par with even and nearest, worse than floor.

This is also verified by the unit tests. whe rounding is set to floor and stochastic, tests pass, otherwise fail

This diff enables selecting rounding mode through env variable. If a rounding method is provided through function call, it takes precedence otherwise it looks at env variable. Default is nearest

Differential Revision: D67425485
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67425485

hhyuanf pushed a commit to hhyuanf/FBGEMM-1 that referenced this pull request Dec 20, 2024
Summary:

X-link: facebookresearch/FBGEMM#595

Accuracy issue was reported by ads team, specifically when the intput tensor is large, some times we get inf relative difference. It happens because abs diff > expected diff and a non-zero value after quant and dequant becomes 0 (so divisor is 0), meaning the root cause is the abs diff is larger than expected. We can reproduce the problem with the following small size input, specifically -502.516 will become 0 after quant and dequant
```
-180.8454,276.3368,892.1324, 1101.1176, -502.5216,-302.0942,2268.5430,-5960.6919
```

ideally -502 should be -500. The reason it becomes 0 is that in mx4 quant, number is scaled down by 2^shared_exponent (of that group) and the value of shared_exponent is impacted by rounding method. If shared_exponent is (relatively) bigger, after scaling, many number become small so we lose a bunch of info. Out of all rounding, floor should give the smallest exponent, ceil probably gives the biggest, even and nearest hard to say since they can round up or down depending on the input but likely still be smaller than ceil,
stochastic tries to round down after adding some noise, so probably better or on par with even and nearest, worse than floor.

This is also verified by the unit tests. whe rounding is set to floor and stochastic, tests pass, otherwise fail

This diff enables selecting rounding mode through env variable. If a rounding method is provided through function call, it takes precedence otherwise it looks at env variable. Default is nearest

Differential Revision: D67425485
Summary:

X-link: facebookresearch/FBGEMM#595

Accuracy issue was reported by ads team, specifically when the intput tensor is large, some times we get inf relative difference. It happens because abs diff > expected diff and a non-zero value after quant and dequant becomes 0 (so divisor is 0), meaning the root cause is the abs diff is larger than expected. We can reproduce the problem with the following small size input, specifically -502.516 will become 0 after quant and dequant
```
-180.8454,276.3368,892.1324, 1101.1176, -502.5216,-302.0942,2268.5430,-5960.6919
```

ideally -502 should be -500. The reason it becomes 0 is that in mx4 quant, number is scaled down by 2^shared_exponent (of that group) and the value of shared_exponent is impacted by rounding method. If shared_exponent is (relatively) bigger, after scaling, many number become small so we lose a bunch of info. Out of all rounding, floor should give the smallest exponent, ceil probably gives the biggest, even and nearest hard to say since they can round up or down depending on the input but likely still be smaller than ceil,
stochastic tries to round down after adding some noise, so probably better or on par with even and nearest, worse than floor.

This is also verified by the unit tests. whe rounding is set to floor and stochastic, tests pass, otherwise fail

This diff enables selecting rounding mode through env variable. If a rounding method is provided through function call, it takes precedence otherwise it looks at env variable. Default is nearest

Differential Revision: D67425485
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67425485

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D67425485

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants