[SYCL][ESIMD] Fix lsc_load_2d API issue that prevented usage for different types#12244
v-klochkov merged 5 commits into intel:sycl
/// getNextPowerOf2(BlockWidth) * NBlocks
///
template <typename T, int BlockWidth, int BlockHeight = 1, int NBlocks = 1,
template <typename Tx, int BlockWidth, int BlockHeight = 1, int NBlocks = 1,
Tx is not documented in this comment description.
Rather than renaming T to Tx, can you please keep T for the user's type
and add a `using` declaration below that names the raw type RawT?
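A minimal sketch of the pattern requested above, assuming a hypothetical `raw_type` trait and a hypothetical `my_half` wrapper type (neither is the actual ESIMD helper; the real header has its own raw-type mapping). The documented template parameter stays `T`, and the storage type is derived inside the function as `RawT`:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical user-facing 16-bit float wrapper, for illustration only.
struct my_half { std::uint16_t bits; };

// Hypothetical trait mapping a user type to its raw storage type.
// By default a type is its own raw type.
template <typename T> struct raw_type { using type = T; };
template <> struct raw_type<my_half> { using type = std::uint16_t; };

// T remains the documented user type; RawT is introduced with `using`
// inside the body instead of renaming the template parameter.
template <typename T, int BlockWidth, int BlockHeight = 1, int NBlocks = 1>
constexpr int load_2d_sketch_bytes() {
  using RawT = typename raw_type<T>::type;
  return static_cast<int>(sizeof(RawT)) * BlockWidth * BlockHeight * NBlocks;
}
```

With this shape the doc comment can keep describing `T`, and callers can pass either a raw type or a wrapper type without the signature changing.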
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
// REQUIRES: gpu-intel-pvc
We are in the process of adding DG2 tests. I don't remember whether load_2d is supported on DG2. If it is, please add a test for DG2. Perhaps this test will simply work for both DG2 and PVC.
I tried to run it on DG2 and it gets stuck. There is probably another version of load_2d for DG2 or other platforms, but this one appears to be for PVC only.
The untyped load/store/prefetch 2d block operations are not supported by DG2/MTL/ARL. They are only available for PVC and Xe2+.
constexpr int DstBlockElements = GRFColSize * GRFRowSize;
constexpr int DstElements = DstBlockElements * NBlocks;

constexpr uint32_t DstLength = DstBlockElements * sizeof(T) / 32;
@vmustya - can you please take a look at this block of code and the initialization of 'desc' at L2463 ?
According to the hardware spec, dest size is encoded in units of registers.
constexpr auto GrfBytes = 64; /// for PVC&Xe2+
constexpr auto DstBlockElements = GrfColSize * GrfRowSize;
constexpr auto DstBlockSize = align(DstBlockElements * sizeof(T), GrfBytes); /// each block is register-aligned, there may be cross-block padding present.
constexpr auto DstElements = std::min(31, DstBlockSize / GrfBytes); /// Dst length of 32 is also encoded as 31.
There is a different API that uses VC intrinsics and does exactly that. We were asked for a new load/store 2d API because the old load/store_2d API emits unnecessary mov instructions. I believe the issue is that the VC intrinsic takes most of its parameters as function parameters rather than template parameters, which generates mov instructions when building the descriptor and the payload. The new API generates most of this data at compile time, eliminating those mov instructions.
constexpr uint32_t GrfBytes = 64;
constexpr uint32_t DstBlockSize =
detail::roundUpNextMultiple<DstBlockElements * sizeof(T), GrfBytes>();
constexpr uint32_t DstLength = DstBlockSize > 31 ? 31 : DstBlockSize;
Shouldn't it be divided by GrfBytes?
constexpr uint32_t DstLength = (DstBlockSize > 31) ? 31 : (DstBlockSize / GrfBytes);
If it should, then how did the tests work without it?
Fixed. I believe the tests worked because the value of DstLength became greater than 31; in this case, per the documentation, the HW is able to correctly determine the output size based on other parameters.
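The encoding discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the code that was merged: `roundUpToMultiple`, `capAt31`, `dstLengthInRegisters` and `dstLengthBeforeFix` are hypothetical names, and the 64-byte register size is the PVC/Xe2+ value quoted in the review. The destination length is the register-aligned block size expressed in whole registers and capped at 31:

```cpp
#include <cstdint>

// Register (GRF) size in bytes for PVC and Xe2+, as quoted in the review.
constexpr std::uint32_t GrfBytes = 64;

// Hypothetical stand-in for detail::roundUpNextMultiple:
// rounds n up to the nearest multiple of m.
constexpr std::uint32_t roundUpToMultiple(std::uint32_t n, std::uint32_t m) {
  return ((n + m - 1) / m) * m;
}

// The length field is capped at 31 (a length of 32 is also encoded as 31).
constexpr std::uint32_t capAt31(std::uint32_t v) { return v > 31 ? 31 : v; }

// Destination length in units of registers: align the block's byte size
// to a register boundary, convert bytes to registers, then cap.
constexpr std::uint32_t dstLengthInRegisters(std::uint32_t blockElements,
                                             std::uint32_t elemSize) {
  return capAt31(roundUpToMultiple(blockElements * elemSize, GrfBytes) /
                 GrfBytes);
}

// The expression before the fix capped the byte count itself,
// without dividing by GrfBytes.
constexpr std::uint32_t dstLengthBeforeFix(std::uint32_t blockElements,
                                           std::uint32_t elemSize) {
  return capAt31(roundUpToMultiple(blockElements * elemSize, GrfBytes));
}
```

In this sketch the pre-fix expression yields 31 for every non-empty block, because the aligned byte count is always at least 64. That appears consistent with the explanation above that the tests passed because the hardware derived the output size from other parameters.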