
[DT] Add parallel generic op materialization pattern for GPU #20316

Merged
1 commit merged into iree-org:main on Mar 26, 2025

Conversation

@jtuyls (Contributor) commented Mar 19, 2025

Adds support for materializing elementwise generic op patterns with layouts that have a tile swizzle expansion + transposition, as is the case for GPU.

Resolves: #20121

@Max191 (Contributor) left a comment:

Nice work so far! Here's the first round of comments.

Comment on lines 346 to 358
// Calculate the final packed result dimensions through the inverse result
// dimensions permutation vector. This effectively linearizes the packed
// result dimensions with respect to the output dimensions.
SmallVector<int64_t> finalPackedResultDims = llvm::map_to_vector(
    packedResultDims, [&](int64_t r) { return invOutResultDimsPerm[r]; });
@Max191 (Contributor):

I don't follow what this does (I thought the packedResultDims would already have the correct map dims). Could you show an example illustrating what this is doing?

@jtuyls (Contributor, Author) commented Mar 20, 2025

packedResultDims contains the permuted dimensions, but we need the permuted dimensions with respect to the output identity map. For example, if the permuted output dimensions are [D0, D2, D1], this will transform all packed operand result dimensions with the permutation map that would make the output dimensions the identity map [D0, D1, D2], i.e.

{
  D0 -> D0
  D1 -> D2
  D2 -> D1
}

Suppose the operand dimensions are [D0, D2]; this operation would transform them into [D0, D1] to align with the output identity map.
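
A minimal standalone sketch of this remapping, in plain C++ with std::vector rather than the actual MLIR utilities; invertPermutation here is a hypothetical stand-in for MLIR's invertPermutationVector:

#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: invert a permutation vector, so that
// inv[perm[i]] == i for all i.
std::vector<int64_t> invertPermutation(const std::vector<int64_t> &perm) {
  std::vector<int64_t> inv(perm.size());
  for (int64_t i = 0; i < static_cast<int64_t>(perm.size()); ++i)
    inv[perm[i]] = i;
  return inv;
}

int main() {
  // Permuted output dimensions [D0, D2, D1].
  std::vector<int64_t> outDims = {0, 2, 1};
  // The inverse permutation is the map {D0 -> D0, D1 -> D2, D2 -> D1} that
  // turns the output dims back into the identity [D0, D1, D2].
  std::vector<int64_t> inv = invertPermutation(outDims); // [0, 2, 1]
  // Operand dimensions [D0, D2], mapped through the inverse: [D0, D1].
  std::vector<int64_t> operandDims = {0, 2};
  std::vector<int64_t> remapped;
  for (int64_t d : operandDims)
    remapped.push_back(inv[d]);
  assert((remapped == std::vector<int64_t>{0, 1}));
  return 0;
}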

@Max191 (Contributor):

I think I understand what this is doing now, but a more concrete example showing the input/output operands with their respective swizzles would be quite helpful. See my comment below about this.

@Max191 (Contributor) commented Mar 20, 2025

Also, please update the PR title and description because this logic supports fully parallel linalg.generic ops, which includes more than just elementwise.

@jtuyls force-pushed the elemwise-materialization branch from 7b4d851 to eb8c1ed on March 20, 2025 21:54
@jtuyls changed the title from "[DT] Add elementwise generic op materialization pattern for GPU" to "[DT] Add parallel generic op materialization pattern for GPU" on Mar 20, 2025
Comment on lines 336 to 358
// In case of a layout with swizzle, the packed result dimensions need
// to be transposed according to the swizzle's permutation vector.
if (materializeEncodingInfo.swizzle.has_value()) {
  int inRank =
      cast<RankedTensorType>(inputOperand->get().getType()).getRank();
  SmallVector<int64_t> transposePerm =
      llvm::to_vector(llvm::seq<int64_t>(0, inRank));
  for (auto perm : materializeEncodingInfo.swizzle->permutation) {
    transposePerm.push_back(inRank + perm);
  }
  applyPermutationToVector(packedResultDims, transposePerm);
}
// Calculate the final packed result dimensions through the inverse result
// dimensions permutation map. This effectively linearizes the packed result
// dimensions with respect to the output dimensions. For example, if the
// permuted output dimensions are [D0, D2, D1], this will transform all
// packed operand result dimensions with the permutation map that would make
// the output dimensions the identity map [D0, D1, D2], i.e. {D0 -> D0, D1
// -> D2, D2 -> D1}. Suppose that the operand dimensions are [D0, D2], this
// operation would transform it into [D0, D1] to align with the output
// identity map.
SmallVector<int64_t> finalPackedResultDims = llvm::map_to_vector(
    packedResultDims, [&](int64_t r) { return invOutResultDimsPerm[r]; });
@Max191 (Contributor):

After thinking about this more, I think I understand what this is doing. I'll explain my understanding with an example:

input_type = tensor<64xi8>
output_type = tensor<64x128xi8>
input_pack_info = {outer_dims_perm = [0] inner_dims_pos = [0] inner_tiles = [16]}
output_pack_info = {outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [16, 64]}
input_swizzle = {expand_shape = [[2, 8]], permutation = [1, 0]}
output_swizzle = {expand_shape = [[2, 8], [4, 16]], permutation = [1, 3, 0, 2]}
materialized_input_type = tensor<4x8x2xi8>
materialized_output_type = tensor<2x4x8x16x2x4xi8>

The invOutResultDimsPerm contains the inverse of outResultDimsPerm, which is the swizzle permutation, but extended to the full rank of the converted DPS init tensor. So, that is to say, invOutResultDimsPerm is a mapping from the dimensions of the materialized type before the swizzle permutation to corresponding dims after the swizzle permutation.

invOutResultDimsPerm = [0, 1, 4, 2, 5, 3] // (inverse of swizzle perm is [2, 0, 3, 1])

The packedResultDims contains the corresponding dims for the input operand after expanding, but before permuting by the swizzle.permutation.

packedResultDims = [0, 2, 3]

Then we permute by the swizzle.permutation and get

packedResultDims = [0, 3, 2]

Now we have the mapping of the materialized input operand dims (after permuting by the input operand swizzle.permutation) to the corresponding dims in the output operand before applying its swizzle.permutation. The last step is to map this to the corresponding dims in the output operand after applying its swizzle.permutation. invOutResultDimsPerm has exactly that permutation, so we apply the mapping and get:

finalPackedResultDims = [0, 2, 4]

I think an example that walks through each step of the logic would be very helpful here. Composing many expand_shapes and permutations can make the logic very complex, and it took me a while to fully work out the logic and understand how it works. Having good comments/examples is really useful, and it will save time for anyone who needs to understand the code in the future. Could you add some comments at each step of the logic showing how we arrive at the final result (feel free to use the example I shared above)?
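
To make this walkthrough executable, here is a minimal standalone sketch in plain C++ (std::vector instead of SmallVector; invertPermutation and applyPermutation are hypothetical stand-ins for MLIR's invertPermutationVector and applyPermutationToVector) that reproduces the numbers above:

#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for invertPermutationVector: inv[perm[i]] == i.
std::vector<int64_t> invertPermutation(const std::vector<int64_t> &perm) {
  std::vector<int64_t> inv(perm.size());
  for (int64_t i = 0; i < static_cast<int64_t>(perm.size()); ++i)
    inv[perm[i]] = i;
  return inv;
}

// Hypothetical stand-in for applyPermutationToVector: out[i] = v[perm[i]].
std::vector<int64_t> applyPermutation(const std::vector<int64_t> &v,
                                      const std::vector<int64_t> &perm) {
  std::vector<int64_t> out(perm.size());
  for (size_t i = 0; i < perm.size(); ++i)
    out[i] = v[perm[i]];
  return out;
}

int main() {
  // Output swizzle permutation [1, 3, 0, 2], extended past the two outer
  // packed dims of the rank-6 materialized init tensor.
  std::vector<int64_t> outResultDimsPerm = {0, 1, 3, 5, 2, 4};
  std::vector<int64_t> invOutResultDimsPerm =
      invertPermutation(outResultDimsPerm); // [0, 1, 4, 2, 5, 3]

  // Input operand dims after its expand_shape, before its swizzle perm.
  std::vector<int64_t> packedResultDims = {0, 2, 3};
  // Input swizzle permutation [1, 0], extended past the one outer packed dim.
  std::vector<int64_t> transposePerm = {0, 2, 1};
  packedResultDims = applyPermutation(packedResultDims, transposePerm);
  assert((packedResultDims == std::vector<int64_t>{0, 3, 2}));

  // Map through the inverse output permutation to get the final dims.
  std::vector<int64_t> finalPackedResultDims;
  for (int64_t r : packedResultDims)
    finalPackedResultDims.push_back(invOutResultDimsPerm[r]);
  assert((finalPackedResultDims == std::vector<int64_t>{0, 2, 4}));
  return 0;
}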

@jtuyls (Contributor, Author) commented Mar 24, 2025

@Max191 I added additional documentation and an example. Could you have another look?

> The invOutResultDimsPerm contains the inverse of outResultDimsPerm, which is the swizzle permutation, but extended to the full rank of the converted DPS init tensor. So, that is to say, invOutResultDimsPerm is a mapping from the dimensions of the materialized type before the swizzle permutation to corresponding dims after the swizzle permutation.

I think it's the other way around. invOutResultDimsPerm is a mapping from the dimensions of the materialized type after the swizzle permutation to corresponding dims before the swizzle permutation (identity map).

> Now we have the mapping of the materialized input operand dims (after permuting by the input operand swizzle.permutation) to the corresponding dims in the output operand before applying its swizzle.permutation. The last step is to map this to the corresponding dims in the output operand after applying its swizzle.permutation. invOutResultDimsPerm has exactly that permutation, so we apply the mapping and get:

The easiest way to look at it for me is that the output mapping will need to become an identity mapping. After swizzling, we have the mapping of the materialized input and output operand, for example:

swizzledOutputDims: [0, 1, 2, 4, 7, 3, 5, 6]
swizzledInputDims: [0, 2, 7, 6]

Now, because this output mapping is converted to the identity mapping as required/assumed by the function, all swizzled input operands need to be transformed in the same way. The invOutSwizzlePerm permutation vector/dimension mapping can be used for this purpose:

invOutSwizzlePerm: [0, 1, 2, 5, 3, 6, 7, 4]

After applying this inverse permutation to the swizzledOutputDims, we get the identity map:

outputDims: [0, 1, 2, 3, 4, 5, 6, 7]

And we need to perform this same mapping for every input operand (which can potentially contain fewer dimensions). After applying this mapping to the swizzledInputDims above we get:

inputDims: [0, 2, 4, 7]
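
The same arithmetic, as a minimal plain-C++ sketch using the dimension vectors above (illustrative only, not the actual pass code):

#include <cassert>
#include <cstdint>
#include <vector>

int main() {
  // Inverse of the extended output swizzle permutation from the example.
  std::vector<int64_t> invOutSwizzlePerm = {0, 1, 2, 5, 3, 6, 7, 4};

  // Mapping the swizzled output dims through the inverse yields the identity.
  std::vector<int64_t> swizzledOutputDims = {0, 1, 2, 4, 7, 3, 5, 6};
  std::vector<int64_t> outputDims;
  for (int64_t d : swizzledOutputDims)
    outputDims.push_back(invOutSwizzlePerm[d]);
  assert((outputDims == std::vector<int64_t>{0, 1, 2, 3, 4, 5, 6, 7}));

  // The same mapping applied to an input operand with fewer dimensions.
  std::vector<int64_t> swizzledInputDims = {0, 2, 7, 6};
  std::vector<int64_t> inputDims;
  for (int64_t d : swizzledInputDims)
    inputDims.push_back(invOutSwizzlePerm[d]);
  assert((inputDims == std::vector<int64_t>{0, 2, 4, 7}));
  return 0;
}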

@jtuyls force-pushed the elemwise-materialization branch from eb8c1ed to adcaec2 on March 24, 2025 16:49
@hanhanW requested a review from Max191 on March 24, 2025 17:48
Signed-off-by: Jorn Tuyls <jorn.tuyls@gmail.com>
@jtuyls force-pushed the elemwise-materialization branch from adcaec2 to 43d5a50 on March 24, 2025 23:23
@Max191 (Contributor) left a comment:

Thanks for the detailed comments, LGTM now!

@jtuyls merged commit 031cebd into iree-org:main on Mar 26, 2025 (60 of 62 checks passed)
@jtuyls deleted the elemwise-materialization branch on March 26, 2025 17:53
Closes: DT: Missing generic op materialization pattern for GPU data-tiling (#20121)