-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make outlined function arguments non-aliasing #1006
base: main
Are you sure you want to change the base?
Conversation
Some tests fail numerically. Created this issue to understand problem better Xilinx/llvm-aie#250 |
2b19692
to
dc8c92e
Compare
@@ -65,6 +64,19 @@ static FailureOr<func::FuncOp> outline(IRRewriter &rewriter, ModuleOp moduleOp, | |||
// arguments. | |||
Operation *clonedComputeOp = rewriter.clone(*computeOp, operandMap); | |||
|
|||
if (noAliasFinalArg) { | |||
ArrayRef<BlockArgument> args = func.getArguments(); | |||
// Find the first MemRefType argument, starting at the end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just put noalias
on all the arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could put it on all memref args. ukernels have memref and index args (ukernels have offset arguments) so would at least need to filter out the non tensor indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would better then I think as it would avoid the hardcoding on the last argument
"moving data to/from the AIE cores.">, | ||
Option<"noAliasMemrefArgs", "no-alias-memref-args", "bool", /*default=*/"true", | ||
"A developer only option. When 'true' (the default), " | ||
"the final memref argument of the outlined function " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the final memref argument of the outlined function " | |
"all memref arguments of the outlined function " |
"as an option, as there is no guarantee that it is valid " | ||
"-- the final argument could in theory alias another argument. "> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"as an option, as there is no guarantee that it is valid " | |
"-- the final argument could in theory alias another argument. "> | |
"as an option, as there is no guarantee that it is valid."> |
"i.e. checking that the final argument is not aliased " | ||
"to any other arguments is too complicated/expensive. " | ||
"The addition of this attribute in this pass is exposed " | ||
"as an option, as there is no guarantee that it is valid " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no guarantee that is valid, and you're saying it's a developer only option, but we're setting it to true by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. If by "developer only option" we mean one that we don't guarantee correctness for when changed from default, then I suppose it's not a developer only option. What I meant was that it's an option that you shouldn't be playing with to eke out extra performance.
Alternative approach (1): remove it as option, and do a shallow check that it's valid by checking if outlined payload is a matmul, or do some basic analysis of pointers. My concern is that this can balloon into complex code.
Alternative approach (2): make the option 'false' by default. My concern here is that it should probably be true for every workload we're likely to see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been thinking about this and I think we can guarantee the noalias
on the function arguments at the point in pipeline when we have amdaie.logicalobjectfifo.access
operations to retrieve a memref from a logical objectFifo by checking that we only add noalias
when the memref is a result from a amdaie.logicalobjectfifo.access
and there is only one access on that logical objectFifo. This happens after function outlining, so I think it would be better to have a separate pass for adding the noalias
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for thinking about this! I think for an analysis to be easier in pass N than pass M (where M<N) either (1) the analysis is done in an intervening pass P (M<P<=N) or (2) an assumption is baked into the IR in an intervening pass.
Either way, I think it makes sense to have a pass to add the noalias
attribute. Just wondering what level is best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you base the analysis on amdaie.logicalobjectfifo.access
, it should work on any IR, as long as those ops are available, which means that you could put it anywhere after createAMDAIEDistributeCoresAndObjectFifosPass
. Similarly, the LinalgFunctionOutling pass could be moved anywhere as well I think, so no assumptions need to be baked in, but the noalias
insertion would only find opportunities if amdaie.logicalobjectfifo.access
ops are found.
Making the 'C' argument of an outlined matmul (C = A@B) function 'noalias' results in much faster code. It brings the outlined code's performance within a few percentage of the non-outlined code, for -O2. Using -O3 with outlining is still our fastest direct codegen approach, and the program memory usage at O3 is still larger than with O2, but with this change at least we can fall back to O2+outlining from O3+outlining without seeing as bad a drop in performance as before.