[ROCm] Fix ambiguous calls to shfl*
where there is no explicit type conversion from c10::Half to __half
#324
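A minimal host-side sketch of the kind of ambiguity described above, and of resolving it with an explicit conversion. Everything here is a hypothetical stand-in for illustration: `RawHalf` plays the role of `__half`, `Half` plays the role of `c10::Half`, and `shfl_down` is a toy overload set, not the real ROCm or PyTorch declarations. It is not the actual patch, only the general pattern under those assumptions.

```cpp
#include <cassert>

// Hypothetical stand-in for the device half type (__half); not the real definition.
struct RawHalf {
    unsigned short bits;
};

// Hypothetical stand-in for c10::Half: implicitly convertible to both
// float and RawHalf, which is what makes overloaded calls ambiguous.
struct Half {
    unsigned short bits;
    // Placeholder conversion only; this is not a real fp16 decode.
    operator float() const { return static_cast<float>(bits); }
    operator RawHalf() const { return RawHalf{bits}; }
};

// Tag reporting which overload was selected.
enum class Picked { Float, Raw };

// Hypothetical overload set mimicking shfl*-style functions.
inline Picked shfl_down(float, unsigned) { return Picked::Float; }
inline Picked shfl_down(RawHalf, unsigned) { return Picked::Raw; }

inline Picked shuffle_half(Half h, unsigned delta) {
    // shfl_down(h, delta);  // would not compile: both overloads need a
    //                       // user-defined conversion, so the call is ambiguous
    // An explicit conversion selects exactly one overload.
    return shfl_down(static_cast<RawHalf>(h), delta);
}
```

With the explicit cast, `shuffle_half(Half{0x3c00}, 1)` selects the `RawHalf` overload in this sketch; uncommenting the direct call reproduces the ambiguity error at compile time.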
| Job | Run time |
|---|---|
| | 10m 8s |
| | 9m 46s |
| | 10m 17s |
| | 10m 32s |
| Total | 40m 43s |