Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make outlined function arguments non-aliasing #1006

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

newling
Copy link
Contributor

@newling newling commented Jan 3, 2025

Making the 'C' argument of an outlined matmul (C = A@B) function 'noalias' results in much faster code. It brings the outlined code's performance within a few percentage of the non-outlined code, for -O2. Using -O3 with outlining is still our fastest direct codegen approach, and the program memory usage at O3 is still larger than with O2, but with this change at least we can fall back to O2+outlining from O3+outlining without seeing as bad a drop in performance as before.

@newling
Copy link
Contributor Author

newling commented Jan 7, 2025

Some tests fail numerically. Created this issue to understand problem better Xilinx/llvm-aie#250

@newling newling force-pushed the noalias branch 2 times, most recently from 2b19692 to dc8c92e Compare January 9, 2025 19:45
@newling newling marked this pull request as ready for review January 9, 2025 20:02
@@ -65,6 +64,19 @@ static FailureOr<func::FuncOp> outline(IRRewriter &rewriter, ModuleOp moduleOp,
// arguments.
Operation *clonedComputeOp = rewriter.clone(*computeOp, operandMap);

if (noAliasFinalArg) {
ArrayRef<BlockArgument> args = func.getArguments();
// Find the first MemRefType argument, starting at the end.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just put noalias on all the arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could put it on all memref args. ukernels have memref and index args (ukernels have offset arguments) so would at least need to filter out the non tensor indices.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would better then I think as it would avoid the hardcoding on the last argument

"moving data to/from the AIE cores.">,
Option<"noAliasMemrefArgs", "no-alias-memref-args", "bool", /*default=*/"true",
"A developer only option. When 'true' (the default), "
"the final memref argument of the outlined function "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"the final memref argument of the outlined function "
"all memref arguments of the outlined function "

Comment on lines +307 to +308
"as an option, as there is no guarantee that it is valid "
"-- the final argument could in theory alias another argument. ">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"as an option, as there is no guarantee that it is valid "
"-- the final argument could in theory alias another argument. ">
"as an option, as there is no guarantee that it is valid.">

"i.e. checking that the final argument is not aliased "
"to any other arguments is too complicated/expensive. "
"The addition of this attribute in this pass is exposed "
"as an option, as there is no guarantee that it is valid "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no guarantee that is valid, and you're saying it's a developer only option, but we're setting it to true by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. If by "developer only option" we mean one that we don't guarantee correctness for when changed from default, then I suppose it's not a developer only option. What I meant was that it's an option that you shouldn't be playing with to eke out extra performance.

Alternative approach (1): remove it as option, and do a shallow check that it's valid by checking if outlined payload is a matmul, or do some basic analysis of pointers. My concern is that this can balloon into complex code.

Alternative approach (2): make the option 'false' by default. My concern here is that it should probably be true for every workload we're likely to see.

Copy link
Collaborator

@jtuyls jtuyls Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been thinking about this and I think we can guarantee the noalias on the function arguments at the point in pipeline when we have amdaie.logicalobjectfifo.access operations to retrieve a memref from a logical objectFifo by checking that we only add noalias when the memref is a result from a amdaie.logicalobjectfifo.access and there is only one access on that logical objectFifo. This happens after function outlining, so I think it would be better to have a separate pass for adding the noalias.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for thinking about this! I think for an analysis to be easier in pass N than pass M (where M<N) either (1) the analysis is done in an intervening pass P (M<P<=N) or (2) an assumption is baked into the IR in an intervening pass.

Either way, I think it makes sense to have a pass to add the noalias attribute. Just wondering what level is best.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you base the analysis on amdaie.logicalobjectfifo.access, it should work on any IR, as long as those ops are available, which means that you could put it anywhere after createAMDAIEDistributeCoresAndObjectFifosPass. Similarly, the LinalgFunctionOutling pass could be moved anywhere as well I think, so no assumptions need to be baked in, but the noalias insertion would only find opportunities if amdaie.logicalobjectfifo.access ops are found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants