[DT] Add parallel generic op materialization pattern for GPU #20316
Nice work so far! Here's the first round of comments.
compiler/src/iree/compiler/Codegen/Common/MaterializeEncodingPatterns.cpp
// Calculate the final packed result dimensions through the inverse result
// dimensions permutation vector. This effectively linearizes the packed
// result dimensions with respect to the output dimensions.
SmallVector<int64_t> finalPackedResultDims = llvm::map_to_vector(
    packedResultDims, [&](int64_t r) { return invOutResultDimsPerm[r]; });
I don't follow what this does (I thought the packedResultDims would already have the correct map dims). Could you show an example illustrating what this is doing?
packedResultDims contains the permuted dimensions, but we need the permuted dimensions with respect to the output identity map. For example, if the permuted output dimensions are [D0, D2, D1], this will transform all packed operand result dimensions with the permutation map that would make the output dimensions the identity map [D0, D1, D2], i.e.
{
D0 -> D0
D1 -> D2
D2 -> D1
}
Suppose the operand dimensions are [D0, D2]; this operation would transform them into [D0, D1] to align with the output identity map.
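The [D0, D2, D1] example above can be checked with a small standalone sketch. It uses plain `std::vector` rather than the LLVM/MLIR utilities in the patch, and the helper names `invertPermutation` and `remapDims` are made up for illustration; they are not functions from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns inv such that inv[perm[i]] == i for every i.
// (Hypothetical helper, not part of the patch.)
std::vector<int64_t> invertPermutation(const std::vector<int64_t> &perm) {
  std::vector<int64_t> inv(perm.size());
  for (std::size_t i = 0; i < perm.size(); ++i) {
    assert(perm[i] >= 0 && static_cast<std::size_t>(perm[i]) < perm.size());
    inv[perm[i]] = static_cast<int64_t>(i);
  }
  return inv;
}

// Rewrites every dimension index d in `dims` as inv[d], mirroring the
// map_to_vector call over packedResultDims in the diff above.
std::vector<int64_t> remapDims(const std::vector<int64_t> &dims,
                               const std::vector<int64_t> &inv) {
  std::vector<int64_t> out;
  out.reserve(dims.size());
  for (int64_t d : dims)
    out.push_back(inv[d]);
  return out;
}
```

With output dimensions [0, 2, 1], remapping the output itself yields the identity [0, 1, 2], and an operand with dimensions [0, 2] becomes [0, 1].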
I think I understand what this is doing now, but a more concrete example showing the input/output operands with their respective swizzles would be quite helpful. See my comment below about this.
Also, please update the PR title and description because this logic supports fully parallel linalg.generic ops, which includes more than just elementwise.
// In case of a layout with swizzle, the packed result dimensions need
// to be transposed according to the swizzle's permutation vector.
if (materializeEncodingInfo.swizzle.has_value()) {
  int inRank =
      cast<RankedTensorType>(inputOperand->get().getType()).getRank();
  SmallVector<int64_t> transposePerm =
      llvm::to_vector(llvm::seq<int64_t>(0, inRank));
  for (auto perm : materializeEncodingInfo.swizzle->permutation) {
    transposePerm.push_back(inRank + perm);
  }
  applyPermutationToVector(packedResultDims, transposePerm);
}
// Calculate the final packed result dimensions through the inverse result
// dimensions permutation map. This effectively linearizes the packed result
// dimensions with respect to the output dimensions. For example, if the
// permuted output dimensions are [D0, D2, D1], this will transform all
// packed operand result dimensions with the permutation map that would make
// the output dimensions the identity map [D0, D1, D2], i.e. {D0 -> D0, D1
// -> D2, D2 -> D1}. Suppose that the operand dimensions are [D0, D2], this
// operation would transform it into [D0, D1] to align with the output
// identity map.
SmallVector<int64_t> finalPackedResultDims = llvm::map_to_vector(
    packedResultDims, [&](int64_t r) { return invOutResultDimsPerm[r]; });
After thinking about this more, I think I understand what this is doing. I'll explain my understanding with an example:
input_type = tensor<64xi8>
output_type = tensor<64x128xi8>
input_pack_info = {outer_dims_perm = [0] inner_dims_pos = [0] inner_tiles = [16]}
output_pack_info = {outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [16, 64]}
input_swizzle = {expand_shape = [[2, 8]], permutation = [1, 0]}
output_swizzle = {expand_shape = [[2, 8], [4, 16]], permutation = [1, 3, 0, 2]}
materialized_input_type = tensor<4x8x2xi8>
materialized_output_type = tensor<2x4x8x16x2x4xi8>
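As a sanity check on the two materialized types listed above, here is a rough standalone sketch of the shape arithmetic. It assumes tile sizes divide the dimensions exactly (no padding) and that permutations are applied gather-style (`result[i] = input[perm[i]]`); `materializedShape` and `permute` are hypothetical helpers written for this note, not IREE APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

// Gather-style permutation: result[i] = input[perm[i]].
Shape permute(const Shape &input, const std::vector<int64_t> &perm) {
  assert(input.size() == perm.size());
  Shape result;
  for (int64_t p : perm)
    result.push_back(input[p]);
  return result;
}

// Hypothetical helper: shape after packing and applying the tile swizzle.
Shape materializedShape(const Shape &shape,
                        const std::vector<int64_t> &outerDimsPerm,
                        const std::vector<int64_t> &innerDimsPos,
                        const Shape &innerTiles,
                        const std::vector<Shape> &expandShape,
                        const std::vector<int64_t> &swizzlePerm) {
  // Outer dims: each tiled dimension is divided by its tile size, then the
  // outer dims are reordered by outer_dims_perm.
  Shape outer = shape;
  for (std::size_t i = 0; i < innerDimsPos.size(); ++i) {
    assert(outer[innerDimsPos[i]] % innerTiles[i] == 0);
    outer[innerDimsPos[i]] /= innerTiles[i];
  }
  outer = permute(outer, outerDimsPerm);
  // Inner dims: flatten the expand_shape groups, then apply the swizzle
  // permutation.
  Shape expanded;
  for (const Shape &group : expandShape)
    expanded.insert(expanded.end(), group.begin(), group.end());
  Shape inner = permute(expanded, swizzlePerm);
  outer.insert(outer.end(), inner.begin(), inner.end());
  return outer;
}
```

Under these assumptions the input `tensor<64xi8>` becomes 4x8x2 and the output `tensor<64x128xi8>` becomes 2x4x8x16x2x4, matching the types above.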
The invOutResultDimsPerm contains the inverse of outResultDimsPerm, which is the swizzle permutation, but extended to the full rank of the converted DPS init tensor. So, that is to say, invOutResultDimsPerm is a mapping from the dimensions of the materialized type before the swizzle permutation to corresponding dims after the swizzle permutation.
invOutResultDimsPerm = [0, 1, 4, 2, 5, 3] // (inverse of swizzle perm is [2, 0, 3, 1])
The packedResultDims contains the corresponding dims for the input operand after expanding, but before permuting by the swizzle.permutation.
packedResultDims = [0, 2, 3]
Then we permute by the swizzle.permutation and get
packedResultDims = [0, 3, 2]
Now we have the mapping of the materialized input operand dims (after permuting by the input operand swizzle.permutation) to the corresponding dims in the output operand before applying its swizzle.permutation. The last step is to map this to the corresponding dims in the output operand after applying its swizzle.permutation. invOutResultDimsPerm has exactly that permutation, so we apply the mapping and get:
finalPackedResultDims = [0, 2, 4]
I think an example that walks through each step of the logic would be very helpful here. Composing many expand_shapes and permutations can make the logic very complex, and it took me a while to fully work out the logic and understand how it works. Having good comments/examples is really useful, and it will save time for anyone who needs to understand the code in the future. Could you add some comments at each step of the logic showing how we arrive at the final result (feel free to use the example I shared above)?
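The walkthrough above can be reproduced step by step with a standalone sketch. It is written against plain `std::vector` and only follows the description in this comment: in particular, it assumes outResultDimsPerm is just the output swizzle permutation extended by an identity over the leading outer dims, which may not be how the patch builds it. All helper names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Dims = std::vector<int64_t>;

// Gather-style permutation: result[i] = input[perm[i]].
Dims permute(const Dims &input, const Dims &perm) {
  assert(input.size() == perm.size());
  Dims result;
  for (int64_t p : perm)
    result.push_back(input[p]);
  return result;
}

// Returns inv such that inv[perm[i]] == i.
Dims invert(const Dims &perm) {
  Dims inv(perm.size());
  for (std::size_t i = 0; i < perm.size(); ++i)
    inv[perm[i]] = static_cast<int64_t>(i);
  return inv;
}

// Identity over the first `numOuter` dims, followed by the swizzle
// permutation shifted by `numOuter`.
Dims extendSwizzlePerm(int64_t numOuter, const Dims &swizzlePerm) {
  Dims full;
  for (int64_t i = 0; i < numOuter; ++i)
    full.push_back(i);
  for (int64_t p : swizzlePerm)
    full.push_back(numOuter + p);
  return full;
}

// Hypothetical end-to-end helper following the three steps in the comment.
Dims finalPackedResultDims(Dims packedResultDims, int64_t inRank,
                           const Dims &inSwizzlePerm, int64_t outRank,
                           const Dims &outSwizzlePerm) {
  // Step 1: transpose the operand dims by the input swizzle permutation.
  packedResultDims =
      permute(packedResultDims, extendSwizzlePerm(inRank, inSwizzlePerm));
  // Step 2: invert the full-rank output swizzle permutation.
  Dims invOut = invert(extendSwizzlePerm(outRank, outSwizzlePerm));
  // Step 3: map every operand dim through the inverse output permutation.
  Dims result;
  for (int64_t r : packedResultDims)
    result.push_back(invOut[r]);
  return result;
}
```

For the example in this comment (input rank 1 with swizzle permutation [1, 0], output rank 2 with swizzle permutation [1, 3, 0, 2]), the sketch produces the same intermediate values: [0, 2, 3] becomes [0, 3, 2] after step 1, the inverse output permutation is [0, 1, 4, 2, 5, 3], and the final result is [0, 2, 4].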
@Max191 I added additional documentation and an example. Could you have another look?
> The invOutResultDimsPerm contains the inverse of outResultDimsPerm, which is the swizzle permutation, but extended to the full rank of the converted DPS init tensor. So, that is to say, invOutResultDimsPerm is a mapping from the dimensions of the materialized type before the swizzle permutation to corresponding dims after the swizzle permutation.
I think it's the other way around. invOutResultDimsPerm is a mapping from the dimensions of the materialized type after the swizzle permutation to corresponding dims before the swizzle permutation (identity map).
> Now we have the mapping of the materialized input operand dims (after permuting by the input operand swizzle.permutation) to the corresponding dims in the output operand before applying its swizzle.permutation. The last step is to map this to the corresponding dims in the output operand after applying its swizzle.permutation. invOutResultDimsPerm has exactly that permutation, so we apply the mapping and get:
The easiest way to look at it for me is that the output mapping will need to become an identity mapping. After swizzling, we have the mapping of the materialized input and output operand, for example:
swizzledOutputDims: [0, 1, 2, 4, 7, 3, 5, 6]
swizzledInputDims: [0, 2, 7, 6]
Now, because this output mapping is converted to the identity mapping as required/assumed by the function, all swizzled input operands need to be transformed in the same way. The invOutSwizzlePerm permutation vector/dimension mapping can be used for this purpose:
invOutSwizzlePerm: [0, 1, 2, 5, 3, 6, 7, 4]
After applying this inverse permutation to the swizzledOutputDims, we get the identity map:
outputDims: [0, 1, 2, 3, 4, 5, 6, 7]
And we need to perform this same mapping for every input operand (which may contain fewer dimensions). After applying this mapping to the swizzledInputDims above we get:
inputDims: [0, 2, 4, 7]
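The numbers in this reply can be checked directly by taking the invOutSwizzlePerm vector as given and using it as a lookup table. The sketch below is standalone C++ with hypothetical helper names (`applyDimMap`, `isIdentity`); it only illustrates the mapping described here and is not code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Dims = std::vector<int64_t>;

// Looks up every dimension index in `dimMap`: result[i] = dimMap[dims[i]].
Dims applyDimMap(const Dims &dims, const Dims &dimMap) {
  Dims result;
  result.reserve(dims.size());
  for (int64_t d : dims) {
    assert(d >= 0 && static_cast<std::size_t>(d) < dimMap.size());
    result.push_back(dimMap[d]);
  }
  return result;
}

// True if dims == [0, 1, ..., n-1].
bool isIdentity(const Dims &dims) {
  for (std::size_t i = 0; i < dims.size(); ++i)
    if (dims[i] != static_cast<int64_t>(i))
      return false;
  return true;
}
```

Applying invOutSwizzlePerm = [0, 1, 2, 5, 3, 6, 7, 4] to swizzledOutputDims gives the identity, and applying it to swizzledInputDims = [0, 2, 7, 6] gives [0, 2, 4, 7], as stated above.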
Signed-off-by: Jorn Tuyls <jorn.tuyls@gmail.com>
Thanks for the detailed comments, LGTM now!
Adds support for materializing elementwise generic op patterns with layouts that have a tile swizzle expansion and transposition, as is the case for GPU.
Resolves: #20121