
[Codegen] Keep lowering config when decomposing linalg.pack #20311

Merged (1 commit, Mar 20, 2025)

Conversation

@Max191 (Contributor) commented Mar 19, 2025

When decomposing linalg.pack ops into tensor.pad->tensor.expand_shape->linalg.transpose, apply the lowering_config of the linalg.pack onto the linalg.transpose. This allows further tiling within the inner tiles of the pack after decomposition by setting lowering configs that have tile sizes for the inner tile dimensions of the operation.
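As a rough sketch of the decomposition described above (hypothetical shapes and tile sizes; `#config` stands for whatever lowering_config the pack carried, and attribute placement is shown schematically since exact assembly formats vary across MLIR versions):

```mlir
// Hypothetical input: a 128x256 source packed with 8x4 inner tiles.
// #config is whatever lowering_config was set on the linalg.pack.
%pack = linalg.pack %src inner_dims_pos = [0, 1] inner_tiles = [8, 4]
    into %dest {lowering_config = #config}
    : tensor<128x256xf32> -> tensor<16x64x8x4xf32>

// After decomposition (the tensor.pad step is a no-op here because the
// source sizes divide the inner tiles evenly, so it is omitted):
%expanded = tensor.expand_shape %src [[0, 1], [2, 3]]
    output_shape [16, 8, 64, 4]
    : tensor<128x256xf32> into tensor<16x8x64x4xf32>
%transposed = linalg.transpose
    ins(%expanded : tensor<16x8x64x4xf32>)
    outs(%dest : tensor<16x64x8x4xf32>)
    permutation = [0, 2, 1, 3]
    {lowering_config = #config}  // attribute carried over from the pack
```

The point of the patch is the last line: the transpose inherits the pack's config, so its inner-tile dimensions remain tileable after decomposition.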

@Max191 Max191 requested a review from hanhanW as a code owner March 19, 2025 16:25
@Max191 Max191 force-pushed the pack-decomposition-persistent-config branch from 4762d88 to e8ce777 Compare March 19, 2025 16:37
@hanhanW (Contributor) commented Mar 19, 2025

The iteration space for tensor.pack matches that of the produced linalg.transpose

This is not true to me, because the iteration space of a pack op covers the non-packed domain; i.e., the number of iteration dimensions is the same as the rank of the source. It might be true on the GPU path, but it is definitely not true on all CPU targets. I've been working hard to get rid of this pass in CPU codegen, and the transpose op is a compute op to me, so it might be fine. Please also update tensor.pack to linalg.pack, because the pack/unpack ops have been moved to the linalg dialect now.

https://github.com/llvm/llvm-project/blob/e1f4daf836e24d9c39fdd4fda84c01e4af31fd65/mlir/lib/Dialect/Linalg/Transforms/TilingInterfaceImpl.cpp#L569-L588

@MaheshRavishankar (Contributor) left a comment


Actually, on first glance it looked right to me, but I will wait for the discussion to resolve.

@Max191 (Contributor, Author) commented Mar 19, 2025

This is not true to me, because the iteration space of a pack op covers the non-packed domain; i.e., the number of iteration dimensions is the same as the rank of the source.

I see, thanks for pointing that out. I didn't realize the iteration space rank matched that of the source. I still think the logic holds, though, because the iteration domain bounds are taken from the result shape of the pack, which matches the transpose bounds. AFAIK, the TilingInterface will automatically truncate the tile-size list, or pad it with zeros, to match the iteration domain. Any tiling configuration that was valid for the pack should also be valid for the transpose (but not the other way around), so I think it is okay to pass the lowering config to the transpose op.

It also allows us to tile the inner tiles of the transpose after decomposition without having to pick a new config, so it is helpful to be able to keep the config. We actually do something similar already with IGEMM, where we set the lowering config based on the matmul that is expected to come out of the ConvToIGEMM transformation.

It is possibly a little brittle to be setting lowering configs with the expectation that they should apply to the decomposed op, but we control pack and unpack decomposition fairly well (only happening in this one pass), so I think it is okay. The alternative solution would be to set an iree_gpu.derived_thread_config on the transpose after/during decomposition, but that honestly feels more fragile to me.

EDIT: I think this is where tile sizes get resized to match the iteration space rank:
https://github.com/llvm/llvm-project/blob/6b9716b7f48a773ec7e73556245addf9aa25fad2/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp#L142
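To illustrate the rank-mismatch point with a hedged sketch (hypothetical tile sizes; the exact attribute syntax varies across IREE versions), a single tile-size list can serve both the pack and the decomposed transpose:

```mlir
// Hypothetical config: one tile size per dimension of the decomposed
// 4-D linalg.transpose, including the two inner-tile dimensions.
#config = #iree_codegen.lowering_config<tile_sizes = [[1, 16, 8, 4]]>
// When tiling consumes this config on the rank-2 iteration space of the
// original linalg.pack, the trailing sizes are dropped; conversely, a
// list shorter than the iteration rank would be padded with zeros.
```

This is why a config that is valid for the pack remains valid for the transpose: the tile-size list is adjusted to whichever iteration rank the tiled op exposes.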

EDIT 2: My second PR, #20312, takes advantage of setting tile sizes on the inner tiles, as an example.

@Max191 (Contributor, Author) commented Mar 19, 2025

Please also update tensor.pack to linalg.pack because the pack/unpack ops are moved to tensor dialect now.

It's taking me a while to get used to this renaming lol, I got so used to tensor.pack :p

@hanhanW (Contributor) left a comment


It also allows us to tile the inner tiles of the transpose after decomposition without having to pick a new config, so it is helpful to be able to keep the config.

Good point about the tile sizes for inner tiles. I can see the value, because we are swizzling inner tiles on GPU.

The alternative solution would be to set an iree_gpu.derived_thread_config on the transpose after/during decomposition, but that honestly feels more fragile to me.

I don't like the iree_gpu.derived_thread_config approach because it hides the strategy: I can't see the tile sizes in the IR, even though they should be simple in this case.

@MaheshRavishankar MaheshRavishankar dismissed their stale review March 20, 2025 18:26

Unblocking, will come back to it later.

@hanhanW (Contributor) commented Mar 20, 2025

The iteration space for linalg.pack matches that of the produced linalg.transpose, so the lowering config will be valid for transpose op, and can be used for thread level tiling after decomposition.

@Max191 please update the PR description, because the iteration space statement is not correct. I think our new understanding is that the change allows us to describe the tiling spec of the inner tiles for further lowering.

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
@Max191 Max191 force-pushed the pack-decomposition-persistent-config branch from e8ce777 to df969ff Compare March 20, 2025 19:24
@Max191 Max191 merged commit 43753b3 into iree-org:main Mar 20, 2025
59 of 63 checks passed
@Max191 Max191 deleted the pack-decomposition-persistent-config branch March 20, 2025 20:44