Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVMGPU] Support linalg.pack through LLVMGPUTileAndFuse #20312

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Max191
Copy link
Contributor

@Max191 Max191 commented Mar 19, 2025

Add lowering config selection logic for linalg.pack through LLVMGPUTileAndFuse, and remove the old LLVMGPUPackUnPack pipeline. Pack lowering configs are set similarly to other linalg ops going through TileAndFuse, because they are effectively just a linalg.transpose with a padded input.

fixes #20212

Max191 added 2 commits March 20, 2025 15:45
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
@Max191 Max191 force-pushed the pack-tile-and-fuse branch from d86123a to 889c820 Compare March 20, 2025 20:45
@Max191 Max191 marked this pull request as ready for review March 20, 2025 20:49
Operation *op) {
static bool elementHasPowerOfTwoBitwidth(Value operand) {
Type elementType = getElementTypeOrSelf(operand.getType());
return isa<IntegerType, FloatType>(elementType) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use elementType.isIntOrFloat()

@@ -2549,10 +2500,6 @@ static LogicalResult setRootConfig(IREE::GPU::TargetAttr target,
LDBG("Winograd Config");
return setWinogradOpConfig(target, entryPointFn, winogradOp);
})
.Case<linalg::PackOp>([&](auto packOp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the entry of setting config on pack op now? I'm having hard time to figure it out when I look at the code around it.

Copy link
Contributor Author

@Max191 Max191 Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It becomes part of the Default case now, and is handled by IREE::GPU::setTileAndFuseLoweringConfig. The implementation lives in ConfigUtils.cpp

Copy link
Contributor

@hanhanW hanhanW Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I missed this. I don't get that why we have a flag on default, and the default is not the default anymore..

if (!clLLVMGPUVectorizePipeline) {
if (succeeded(IREE::GPU::setTileAndFuseLoweringConfig(
target, entryPointFn, computeOp))) {
LDBG("Tile and fuse default config");
return success();
}
}
return setRootDefaultConfig(target, entryPointFn, computeOp);

Comment on lines +369 to +370
// We do not expect to find multiple pack/unpack ops in the same dispatch
// region, so we can simply return the multiples for the given `packOp`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true? I thought that there are cases like unpack -> elementwise/reduction -> pack?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption is not about dispatch formation. It is about root op selection. If we have a dispatch like unpack -> elementwise -> pack, then the elementwise op would become the root op, and we will not set a config on the pack/unpack ops.

I think I let the assumptions about the lowering config selection leak into the TileInferenceUtils, so it is quite confusing. I'll try to figure out a better way to handle this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you meant. So it is more like we do not expect to find multiple pack/unpack ops when the root op is a pack op. What if we have unpack->pack dispatch? What op is a root op?

@@ -560,16 +590,41 @@ LogicalResult setTileAndFuseLoweringConfig(IREE::GPU::TargetAttr target,
return failure();
}

SmallVector<unsigned int> partitionableLoops;
linalgOp.getParallelDims(partitionableLoops);
// SmallVector<unsigned int> partitionableLoops;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: delete the comment

Comment on lines +560 to +562
unsigned minBitwidth;
unsigned representativeBitWidth;
bool vectorizable;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to initialize them in struct definition. Otherwise, it may lead to undefined behavior. E.g., the vectorizable is not initialized in the below pack distribution config.

https://abseil.io/tips/182

Comment on lines +603 to +606
bool projPerm =
llvm::all_of(linalgOp.getIndexingMapsArray(),
[](AffineMap map) { return map.isProjectedPermutation(); });
bool powTwo = llvm::all_of(op->getOperands(), elementHasPowerOfTwoBitwidth);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional nit: I'd name them as isProjPerm and isPowerOfTwo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow compilation in check_rocm_hip_pack.mlir_module.vmfb
2 participants