
[DT] Add parallel generic op materialization pattern for GPU #20316

Merged
1 commit merged into iree-org:main on Mar 26, 2025

Conversation

@jtuyls (Contributor) commented Mar 19, 2025

Adds support for materializing elementwise generic op patterns with layouts that have a tile swizzle expansion + transposition, as is the case for GPU.

Resolves: #20121

@Max191 (Contributor) left a comment:

Nice work so far! Here's the first round of comments.

Comment on lines 346 to 358
// Calculate the final packed result dimensions through the inverse result
// dimensions permutation vector. This effectively linearizes the packed
// result dimensions with respect to the output dimensions.
SmallVector<int64_t> finalPackedResultDims = llvm::map_to_vector(
    packedResultDims, [&](int64_t r) { return invOutResultDimsPerm[r]; });
@Max191 (Contributor):

I don't follow what this does (I thought the packedResultDims would already have the correct map dims). Could you show an example illustrating what this is doing?

@jtuyls (Contributor, Author) commented Mar 20, 2025

packedResultDims contains the permuted dimensions, but we need the permuted dimensions with respect to the output identity map. For example, if the permuted output dimensions are [D0, D2, D1], this will transform all packed operand result dimensions with the permutation map that would make the output dimensions the identity map [D0, D1, D2], i.e.

{
  D0 -> D0
  D1 -> D2
  D2 -> D1
}

Suppose the operand dimensions are [D0, D2]; this operation would transform them into [D0, D1] to align with the output identity map.
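
A minimal standalone sketch of this remapping, in plain C++ with std::vector rather than the actual MLIR utilities; invertPermutation here is a hypothetical stand-in for MLIR's invertPermutationVector:

#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: invert a permutation vector, so that
// inv[perm[i]] == i for all i.
std::vector<int64_t> invertPermutation(const std::vector<int64_t> &perm) {
  std::vector<int64_t> inv(perm.size());
  for (int64_t i = 0; i < static_cast<int64_t>(perm.size()); ++i)
    inv[perm[i]] = i;
  return inv;
}

int main() {
  // Permuted output dimensions [D0, D2, D1].
  std::vector<int64_t> outDims = {0, 2, 1};
  // The inverse permutation is the map {D0 -> D0, D1 -> D2, D2 -> D1} that
  // turns the output dims back into the identity [D0, D1, D2].
  std::vector<int64_t> inv = invertPermutation(outDims); // [0, 2, 1]
  // Operand dimensions [D0, D2], mapped through the inverse: [D0, D1].
  std::vector<int64_t> operandDims = {0, 2};
  std::vector<int64_t> remapped;
  for (int64_t d : operandDims)
    remapped.push_back(inv[d]);
  assert((remapped == std::vector<int64_t>{0, 1}));
  return 0;
}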

@Max191 (Contributor):

I think I understand what this is doing now, but a more concrete example showing the input/output operands with their respective swizzles would be quite helpful. See my comment below about this.

@Max191 (Contributor) commented Mar 20, 2025

Also, please update the PR title and description because this logic supports fully parallel linalg.generic ops, which includes more than just elementwise.

@jtuyls force-pushed the elemwise-materialization branch from 7b4d851 to eb8c1ed on March 20, 2025 21:54
@jtuyls changed the title from "[DT] Add elementwise generic op materialization pattern for GPU" to "[DT] Add parallel generic op materialization pattern for GPU" on Mar 20, 2025
Comment on lines 336 to 358
// In case of a layout with swizzle, the packed result dimensions need
// to be transposed according to the swizzle's permutation vector.
if (materializeEncodingInfo.swizzle.has_value()) {
  int inRank =
      cast<RankedTensorType>(inputOperand->get().getType()).getRank();
  SmallVector<int64_t> transposePerm =
      llvm::to_vector(llvm::seq<int64_t>(0, inRank));
  for (auto perm : materializeEncodingInfo.swizzle->permutation) {
    transposePerm.push_back(inRank + perm);
  }
  applyPermutationToVector(packedResultDims, transposePerm);
}
// Calculate the final packed result dimensions through the inverse result
// dimensions permutation map. This effectively linearizes the packed result
// dimensions with respect to the output dimensions. For example, if the
// permuted output dimensions are [D0, D2, D1], this will transform all
// packed operand result dimensions with the permutation map that would make
// the output dimensions the identity map [D0, D1, D2], i.e. {D0 -> D0, D1
// -> D2, D2 -> D1}. Suppose that the operand dimensions are [D0, D2], this
// operation would transform it into [D0, D1] to align with the output
// identity map.
SmallVector<int64_t> finalPackedResultDims = llvm::map_to_vector(
    packedResultDims, [&](int64_t r) { return invOutResultDimsPerm[r]; });
@Max191 (Contributor):

After thinking about this more, I think I understand what this is doing. I'll explain my understanding with an example:

input_type = tensor<64xi8>
output_type = tensor<64x128xi8>
input_pack_info = {outer_dims_perm = [0] inner_dims_pos = [0] inner_tiles = [16]}
output_pack_info = {outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [16, 64]}
input_swizzle = {expand_shape = [[2, 8]], permutation = [1, 0]}
output_swizzle = {expand_shape = [[2, 8], [4, 16]], permutation = [1, 3, 0, 2]}
materialized_input_type = tensor<4x8x2xi8>
materialized_output_type = tensor<2x4x8x16x2x4xi8>

The invOutResultDimsPerm contains the inverse of outResultDimsPerm, which is the swizzle permutation, but extended to the full rank of the converted DPS init tensor. So, that is to say, invOutResultDimsPerm is a mapping from the dimensions of the materialized type before the swizzle permutation to corresponding dims after the swizzle permutation.

invOutResultDimsPerm = [0, 1, 4, 2, 5, 3] // (inverse of swizzle perm is [2, 0, 3, 1])

The packedResultDims contains the corresponding dims for the input operand after expanding, but before permuting by the swizzle.permutation.

packedResultDims = [0, 2, 3]

Then we permute by the swizzle.permutation and get

packedResultDims = [0, 3, 2]

Now we have the mapping of the materialized input operand dims (after permuting by the input operand swizzle.permutation) to the corresponding dims in the output operand before applying its swizzle.permutation. The last step is to map this to the corresponding dims in the output operand after applying its swizzle.permutation. invOutResultDimsPerm has exactly that permutation, so we apply the mapping and get:

finalPackedResultDims = [0, 2, 4]

I think an example that walks through each step of the logic would be very helpful here. Composing many expand_shapes and permutations can make the logic very complex, and it took me a while to fully work out the logic and understand how it works. Having good comments/examples is really useful, and it will save time for anyone who needs to understand the code in the future. Could you add some comments at each step of the logic showing how we arrive at the final result (feel free to use the example I shared above)?
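
To make this walkthrough executable, here is a minimal standalone sketch in plain C++ (std::vector instead of SmallVector; invertPermutation and applyPermutation are hypothetical stand-ins for MLIR's invertPermutationVector and applyPermutationToVector) that reproduces the numbers above:

#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for invertPermutationVector: inv[perm[i]] == i.
std::vector<int64_t> invertPermutation(const std::vector<int64_t> &perm) {
  std::vector<int64_t> inv(perm.size());
  for (int64_t i = 0; i < static_cast<int64_t>(perm.size()); ++i)
    inv[perm[i]] = i;
  return inv;
}

// Hypothetical stand-in for applyPermutationToVector: out[i] = v[perm[i]].
std::vector<int64_t> applyPermutation(const std::vector<int64_t> &v,
                                      const std::vector<int64_t> &perm) {
  std::vector<int64_t> out(perm.size());
  for (size_t i = 0; i < perm.size(); ++i)
    out[i] = v[perm[i]];
  return out;
}

int main() {
  // Output swizzle permutation [1, 3, 0, 2], extended past the two outer
  // packed dims of the rank-6 materialized init tensor.
  std::vector<int64_t> outResultDimsPerm = {0, 1, 3, 5, 2, 4};
  std::vector<int64_t> invOutResultDimsPerm =
      invertPermutation(outResultDimsPerm); // [0, 1, 4, 2, 5, 3]

  // Input operand dims after its expand_shape, before its swizzle perm.
  std::vector<int64_t> packedResultDims = {0, 2, 3};
  // Input swizzle permutation [1, 0], extended past the one outer packed dim.
  std::vector<int64_t> transposePerm = {0, 2, 1};
  packedResultDims = applyPermutation(packedResultDims, transposePerm);
  assert((packedResultDims == std::vector<int64_t>{0, 3, 2}));

  // Map through the inverse output permutation to get the final dims.
  std::vector<int64_t> finalPackedResultDims;
  for (int64_t r : packedResultDims)
    finalPackedResultDims.push_back(invOutResultDimsPerm[r]);
  assert((finalPackedResultDims == std::vector<int64_t>{0, 2, 4}));
  return 0;
}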

@jtuyls (Contributor, Author) commented Mar 24, 2025

@Max191 I added additional documentation and an example. Could you have another look?

> The invOutResultDimsPerm contains the inverse of outResultDimsPerm, which is the swizzle permutation, but extended to the full rank of the converted DPS init tensor. So, that is to say, invOutResultDimsPerm is a mapping from the dimensions of the materialized type before the swizzle permutation to corresponding dims after the swizzle permutation.

I think it's the other way around. invOutResultDimsPerm is a mapping from the dimensions of the materialized type after the swizzle permutation to corresponding dims before the swizzle permutation (identity map).

> Now we have the mapping of the materialized input operand dims (after permuting by the input operand swizzle.permutation) to the corresponding dims in the output operand before applying its swizzle.permutation. The last step is to map this to the corresponding dims in the output operand after applying its swizzle.permutation. invOutResultDimsPerm has exactly that permutation, so we apply the mapping and get:

The easiest way to look at it for me is that the output mapping will need to become an identity mapping. After swizzling, we have the mapping of the materialized input and output operand, for example:

swizzledOutputDims: [0, 1, 2, 4, 7, 3, 5, 6]
swizzledInputDims: [0, 2, 7, 6]

Now, because this output mapping is converted to the identity mapping as required/assumed by the function, all swizzled input operands need to be transformed in the same way. The invOutSwizzlePerm permutation vector/dimension mapping can be used for this purpose:

invOutSwizzlePerm: [0, 1, 2, 5, 3, 6, 7, 4]

After applying this inverse permutation to the swizzledOutputDims, we get the identity map:

outputDims: [0, 1, 2, 3, 4, 5, 6, 7]

And we need to perform this same mapping for every input operand (which can potentially contain fewer dimensions). After applying this mapping to the swizzledInputDims above we get:

inputDims: [0, 2, 4, 7]
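
The same arithmetic, as a minimal plain-C++ sketch using the dimension vectors above (illustrative only, not the actual pass code):

#include <cassert>
#include <cstdint>
#include <vector>

int main() {
  // Inverse of the extended output swizzle permutation from the example.
  std::vector<int64_t> invOutSwizzlePerm = {0, 1, 2, 5, 3, 6, 7, 4};

  // Mapping the swizzled output dims through the inverse yields the identity.
  std::vector<int64_t> swizzledOutputDims = {0, 1, 2, 4, 7, 3, 5, 6};
  std::vector<int64_t> outputDims;
  for (int64_t d : swizzledOutputDims)
    outputDims.push_back(invOutSwizzlePerm[d]);
  assert((outputDims == std::vector<int64_t>{0, 1, 2, 3, 4, 5, 6, 7}));

  // The same mapping applied to an input operand with fewer dimensions.
  std::vector<int64_t> swizzledInputDims = {0, 2, 7, 6};
  std::vector<int64_t> inputDims;
  for (int64_t d : swizzledInputDims)
    inputDims.push_back(invOutSwizzlePerm[d]);
  assert((inputDims == std::vector<int64_t>{0, 2, 4, 7}));
  return 0;
}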

@jtuyls force-pushed the elemwise-materialization branch from eb8c1ed to adcaec2 on March 24, 2025 16:49
@hanhanW requested a review from Max191 on March 24, 2025 17:48
Signed-off-by: Jorn Tuyls <jorn.tuyls@gmail.com>
@jtuyls force-pushed the elemwise-materialization branch from adcaec2 to 43d5a50 on March 24, 2025 23:23
@Max191 (Contributor) left a comment:

Thanks for the detailed comments, LGTM now!

@jtuyls merged commit 031cebd into iree-org:main on Mar 26, 2025 (60 of 62 checks passed)
@jtuyls deleted the elemwise-materialization branch on March 26, 2025 17:53
Closes: DT: Missing generic op materialization pattern for GPU data-tiling (#20121)