Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][ESIMD] Replace use of intrinsics with spirv functions for addc/subb #13093

Merged
merged 30 commits into from
Jan 7, 2025

Conversation

fineg74
Copy link
Contributor

@fineg74 fineg74 commented Mar 21, 2024

No description provided.

@@ -653,8 +653,6 @@ class ESIMDIntrinDescTable {
{"__spirv_ConvertFToBF16INTEL", {a(0)}}},
{"__devicelib_ConvertBF16ToFINTEL",
{"__spirv_ConvertBF16ToFINTEL", {a(0)}}},
{"addc", {"addc", {l(0)}}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep the lowering in LowerESIMD.cpp for a while - see the reasoning in other PR: #12935

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

addc(__ESIMD_NS::simd<uint32_t, N> &carry, __ESIMD_NS::simd<uint32_t, N> src0,
uint32_t src1) {
__ESIMD_NS::simd<uint32_t, N> Src1V = src1;
template <int N, typename T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the help of a macro would make it better: a) there will be no need for SFINAE and b) the templates would not require extra parameter T, which goes in very unusual order - after N - (usual order is <T, N>). Finally, passing simd_view as the 1 arg to vector+scalar op will require specifying only N, instead of <N, T>.
The idea is: (it may need some adaptation because of using SYCL_DEVICE_ONLY inside it:

#define __ESIMD_ADDC(T) \
template <int N> \
__ESIMD_API __ESIMD_NS::simd<T, N>> \
addc(__ESIMD_NS::simd<T, N> &carry, __ESIMD_NS::simd<T, N> src0, \
     __ESIMD_NS::simd<T, N> src1) { \
#ifdef __SYCL_DEVICE_ONLY__ \
  std::pair<__ESIMD_DNS::vector_type_t<T, N>, __ESIMD_DNS::vector_type_t<T, N>> \
      Result = __spirv_IAddCarry<T, N>(src0.data(), src1.data()); \
  carry = Result.second; \
  return Result.first; \
#else \
  return 0; \
#endif // __SYCL_DEVICE_ONLY__ \
} \
/* OTHER 3 variants here (vec+scalar, scalar+vec, sccalar+scalar  */ \

__ESIMD_ADDC(uint8_t)
__ESIMD_ADDC(uint16_t)
__ESIMD_ADDC(uint32_t)
__ESIMD_ADDC(uint64_t)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@v-klochkov v-klochkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add 64-bit case too - it was the main inspiration/reason for support of ADDC/SUBB via SPIRV.

@fineg74 fineg74 marked this pull request as ready for review December 20, 2024 21:11
@fineg74 fineg74 requested review from a team as code owners December 20, 2024 21:11
@fineg74
Copy link
Contributor Author

fineg74 commented Dec 20, 2024

Please add 64-bit case too - it was the main inspiration/reason for support of ADDC/SUBB via SPIRV.

Done

Copy link
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Pass &= test<T, 4, !AIsVector, BIsVector>(Q);

Pass &= test<T, 1, AIsVector, BIsVector>(Q);
Pass &= test<T, 1, AIsVector, !BIsVector>(Q);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this working now because we bumped the driver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly speaking don't remember most likely I just added test cases. New driver was for handling 64 bit data

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry i meant the entire PR not this code specifically, my bad

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes entire PR waited for new driver. There was a delay since there was a bug discovered on GPU side after initial implementation. Once Windows driver get updated I will revisit it to see if it resolves the issue of test failures on Windows

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, thx

@fineg74
Copy link
Contributor Author

fineg74 commented Jan 7, 2025

@uditagarwal97, @intel/llvm-reviewers-runtime, Could you please review this PR

Copy link
Contributor

@uditagarwal97 uditagarwal97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SYCL changes LGTM.

@fineg74
Copy link
Contributor Author

fineg74 commented Jan 7, 2025

@intel/llvm-gatekeepers, I believe the PR is ready to be merged.

@sarnex sarnex merged commit 1603bbe into intel:sycl Jan 7, 2025
14 checks passed
@fineg74 fineg74 deleted the addc branch January 7, 2025 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants