Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][ESIMD] Fix lsc_load_2d API issue that prevented usage for different types #12244

Merged
merged 5 commits into from
Jan 2, 2024

Conversation

fineg74
Copy link
Contributor

@fineg74 fineg74 commented Dec 25, 2023

No description provided.

@fineg74 fineg74 requested a review from a team as a code owner December 25, 2023 00:15
@@ -1984,14 +1999,15 @@ constexpr void check_lsc_block_2d_restrictions() {
/// N = roundUpNextMultiple(BlockHeight, 4 / sizeof(T)) *
/// getNextPowerOf2(BlockWidth) * NBlocks
///
template <typename T, int BlockWidth, int BlockHeight = 1, int NBlocks = 1,
template <typename Tx, int BlockWidth, int BlockHeight = 1, int NBlocks = 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tx is not documented in this comment description.
Rather than renaming T to Tx, can you please keep T for user's type,
and add the using below as RawT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
// REQUIRES: gpu-intel-pvc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are in process of adding DG2 tests. I don't remember if load_2d supported by DG2. If Yes, then lease add one it for DG2. Perhaps this test will simply work for both DG2 and PVC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to run it on dg2 and it gets stuck. There is probably another version of load_2d for dg2 or other platforms but this one appears to be for PVC only

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The untyped load/store/prefetch 2d block operations are not supported by DG2/MTL/ARL. They are only available for PVC and Xe2+.

@@ -2428,17 +2445,23 @@ ESIMD_INLINE SYCL_ESIMD_FUNCTION __ESIMD_NS::simd<T, N> lsc_load_2d(
constexpr int DstBlockElements = GRFColSize * GRFRowSize;
constexpr int DstElements = DstBlockElements * NBlocks;

constexpr uint32_t DstLength = DstBlockElements * sizeof(T) / 32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vmustya - can you please take a look at this block of code and the initialization of 'desc' at L2463 ?

Copy link
Contributor

@vmustya vmustya Dec 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the hardware spec, dest size is encoded in units of registers.

constexpr GrfBytes = 64; /// for PVC&Xe2+ 
constexpr auto DstBlockElements = GrfColSize * GrfRowSize;
constexpr auto DstBlockSize = align(DstBlockElements * sizeof(T), GrfBytes); /// each block is register-aligned, there may be cross-block padding present.

constexpr auto DstElements = std::min(31, DstBlockSize / GrfBytes); /// Dst length of 32 is also encoded as 31.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor

@vmustya vmustya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe, that ESIMD should emit just 2d block load/store/prefetch intrinsics and let VC lower them.

@fineg74
Copy link
Contributor Author

fineg74 commented Dec 26, 2023

I believe, that ESIMD should emit just 2d block load/store/prefetch intrinsics and let VC lower them.

There is a different API that uses VC intrinsics that does exactly that. We were asked for a new load/store 2d API since the old load/store_2d API uses unnecessary mov instructions (I believe the issue is that vc intrisic gets most of its parameters as function parameters rather than template parameters, generating mov instructions when building the descriptor and the payload. The new API generates most of this data in compile time eliminating mov instructions)

constexpr uint32_t GrfBytes = 64;
constexpr uint32_t DstBlockSize =
detail::roundUpNextMultiple<DstBlockElements * sizeof(T), GrfBytes>();
constexpr uint32_t DstLength = DstBlockSize > 31 ? 31 : DstBlockSize;
Copy link
Contributor

@v-klochkov v-klochkov Dec 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be devided byGrfBytes?

constexpr uint32_t DstLength = (DstBlockSize > 31) ? 31 : (DstBlockSize / GrfBytes);

If should, then how the tests worked without it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I believe the tests worked because the value of DstLength became greater than 31 in this case per documentation the HW is able to correctly determine the output size based on other parameters

@v-klochkov v-klochkov merged commit 0913045 into intel:sycl Jan 2, 2024
13 checks passed
@fineg74 fineg74 deleted the lsc2dFix branch January 2, 2024 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants