Softmax log backward: Increase precision of fp16's accumulator to fp32 #3427

Open · wants to merge 6 commits into develop
Conversation

@bghimireamd (Contributor) commented Dec 6, 2024:

The accumulator for fp16 Softmax log backward was also fp16, which caused a precision issue in ONNX's test. This PR widens the accumulator for fp16 Softmax log backward to fp32. I also initialized y for softmax backward.

old code: fp16 += fp16*fp16
new code: fp32 += fp16*fp16
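
A minimal numpy sketch of the accumulation pattern (float16/float32 as stand-ins for the kernel's half/float accumulators; the generic dot-product reduction, array size, and data here are illustrative only, not the actual MIOpenSoftmax.cl code):

```python
import numpy as np

# Illustration only: a generic "acc += a[i] * b[i]" reduction over fp16 data,
# once with an fp16 accumulator (old behaviour) and once with an fp32
# accumulator (new behaviour). Sizes and data are arbitrary.
rng = np.random.default_rng(12345678)
a = rng.random(1500).astype(np.float16)
b = rng.random(1500).astype(np.float16)

acc16 = np.float16(0.0)          # old code: fp16 += fp16 * fp16
for i in range(a.size):
    acc16 = np.float16(acc16 + a[i] * b[i])

acc32 = np.float32(0.0)          # new code: fp32 += fp16 * fp16
for i in range(a.size):
    acc32 += np.float32(a[i]) * np.float32(b[i])

# Compare both against a float64 reference reduction.
ref = np.dot(a.astype(np.float64), b.astype(np.float64))
print("fp16 accumulator relative error:", abs(float(acc16) - ref) / ref)
print("fp32 accumulator relative error:", abs(float(acc32) - ref) / ref)
```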

Performance of fp32 vs fp16:
./bin/MIOpenDriver softmaxfp16 -n 128 -c 1 -H 1 -W 1500 -F 2 -a 2 -m 0 -A 1.000000 -B 0.000000 -t 1
fp16 accumulator, average of 4 runs: 0.013351 ms
fp32 accumulator, average of 4 runs: 0.013316 ms

./bin/MIOpenDriver softmaxfp16 -n 128 -c 32 -H 150 -W 150 -F 2 -a 2 -m 0 -A 1.000000 -B 0.000000 -t 1
fp16 accumulator, average of 4 runs: 2.794 ms
fp32 accumulator, average of 4 runs: 2.783 ms

I did not see performance degradation after changing the accumulator to fp32.

Precision of fp32 vs fp16:
fp16 accumulator
MIOpenDriver softmaxfp16 -n 128 -c 1 -H 1 -W 1500 -F 2 -a 2 -m 0 -A 1.000000 -B 0.000000 -t 1
PRNG seed: 12345678
GPU Kernel Time Backward Softmax Elapsed: 0.013422 ms
Backward Softmax Verifies on CPU and GPU (err=0.000235)

fp32 accumulator
MIOpenDriver softmaxfp16 -n 128 -c 1 -H 1 -W 1500 -F 2 -a 2 -m 0 -A 1.000000 -B 0.000000 -t 1
PRNG seed: 12345678
GPU Kernel Time Backward Softmax Elapsed: 0.013102 ms
Backward Softmax Verifies on CPU and GPU (err=0.000022)

Precision, fp16 vs fp32: err = 0.000235 vs err = 0.000022

 - make accumulator of fp16 as fp32 to increase the precision
@BradPepersAMD (Collaborator) commented:
It doesn't look like this really is making much of a difference in the accuracy. Does the original ticket mention what accuracy is expected? One thing I see in the code is that it looks like LogAddExp is still always using half right now?

@bghimireamd (Author) commented Dec 6, 2024:

> It doesn't look like this really is making much of a difference in the accuracy. Does the original ticket mention what accuracy is expected? One thing I see in the code is that it looks like LogAddExp is still always using half right now?

Sorry, I forgot to include the units: the numbers I mention are in ms. I did not change LogAddExp since it is only used in the forward case; the precision issue was seen in the backward case. After I ported my change into ONNX's docker, the test passed.

The original ticket does mention the tolerance: idx: 37504 expected 0.23584, got 0.288086, diff: 0.0522461, tol=0.021792

@CAHEK7 (Contributor) commented Dec 7, 2024:

> It doesn't look like this really is making much of a difference in the accuracy. Does the original ticket mention what accuracy is expected? One thing I see in the code is that it looks like LogAddExp is still always using half right now?

Since it accumulates the numbers, using FP16 as the accumulator can be significantly affected by precision loss. Accumulating 1 a total of 4096 times results in 2048. FP16 starts ignoring relatively small values quite quickly, much faster than FP32.
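
A quick numpy illustration of that effect, using float16 as a stand-in for the kernel's half type (accumulating one element at a time, since numpy's own sum() uses pairwise summation and would mask the loss):

```python
import numpy as np

# Naively accumulate 1.0 into an fp16 vs an fp32 accumulator, 4096 times.
# Above 2048 the fp16 spacing is 2.0, so 2048 + 1 rounds back to 2048 and
# the fp16 sum stops growing; the fp32 sum reaches 4096 exactly.
acc16 = np.float16(0.0)
acc32 = np.float32(0.0)
for _ in range(4096):
    acc16 = np.float16(acc16 + np.float16(1.0))
    acc32 = acc32 + np.float32(1.0)

print(acc16)  # 2048.0
print(acc32)  # 4096.0
```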

src/kernels/MIOpenSoftmax.cl: review comment resolved (outdated)
@bpepers-me (Contributor) commented:
With this ticket being about precision, can you post what the error was before and after this change so we can see how much the precision has improved?

@bghimireamd (Author) commented:
> With this ticket being about precision, can you post what the error was before and after this change so we can see how much the precision has improved?

Added the precision numbers to the PR description.
