-
Notifications
You must be signed in to change notification settings - Fork 752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][ESIMD] Fix lsc_load_2d API issue that prevented usage for different types #12244
Conversation
@@ -1984,14 +1999,15 @@ constexpr void check_lsc_block_2d_restrictions() { | |||
/// N = roundUpNextMultiple(BlockHeight, 4 / sizeof(T)) * | |||
/// getNextPowerOf2(BlockWidth) * NBlocks | |||
/// | |||
template <typename T, int BlockWidth, int BlockHeight = 1, int NBlocks = 1, | |||
template <typename Tx, int BlockWidth, int BlockHeight = 1, int NBlocks = 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tx
is not documented in this comment description.
Rather than renaming T
to Tx
, can you please keep T
for user's type,
and add the using below as RawT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
// | ||
//===----------------------------------------------------------------------===// | ||
// REQUIRES: gpu-intel-pvc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are in process of adding DG2 tests. I don't remember if load_2d supported by DG2. If Yes, then lease add one it for DG2. Perhaps this test will simply work for both DG2 and PVC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to run it on dg2 and it gets stuck. There is probably another version of load_2d for dg2 or other platforms but this one appears to be for PVC only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The untyped load/store/prefetch 2d block operations are not supported by DG2/MTL/ARL. They are only available for PVC and Xe2+.
@@ -2428,17 +2445,23 @@ ESIMD_INLINE SYCL_ESIMD_FUNCTION __ESIMD_NS::simd<T, N> lsc_load_2d( | |||
constexpr int DstBlockElements = GRFColSize * GRFRowSize; | |||
constexpr int DstElements = DstBlockElements * NBlocks; | |||
|
|||
constexpr uint32_t DstLength = DstBlockElements * sizeof(T) / 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vmustya - can you please take a look at this block of code and the initialization of 'desc' at L2463 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the hardware spec, dest size is encoded in units of registers.
constexpr GrfBytes = 64; /// for PVC&Xe2+
constexpr auto DstBlockElements = GrfColSize * GrfRowSize;
constexpr auto DstBlockSize = align(DstBlockElements * sizeof(T), GrfBytes); /// each block is register-aligned, there may be cross-block padding present.
constexpr auto DstElements = std::min(31, DstBlockSize / GrfBytes); /// Dst length of 32 is also encoded as 31.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe, that ESIMD should emit just 2d block load/store/prefetch intrinsics and let VC lower them.
There is a different API that uses VC intrinsics that does exactly that. We were asked for a new load/store 2d API since the old load/store_2d API uses unnecessary mov instructions (I believe the issue is that vc intrisic gets most of its parameters as function parameters rather than template parameters, generating mov instructions when building the descriptor and the payload. The new API generates most of this data in compile time eliminating mov instructions) |
constexpr uint32_t GrfBytes = 64; | ||
constexpr uint32_t DstBlockSize = | ||
detail::roundUpNextMultiple<DstBlockElements * sizeof(T), GrfBytes>(); | ||
constexpr uint32_t DstLength = DstBlockSize > 31 ? 31 : DstBlockSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be devided byGrfBytes?
constexpr uint32_t DstLength = (DstBlockSize > 31) ? 31 : (DstBlockSize / GrfBytes);
If should, then how the tests worked without it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. I believe the tests worked because the value of DstLength became greater than 31 in this case per documentation the HW is able to correctly determine the output size based on other parameters
No description provided.