This repository has been archived by the owner on Apr 23, 2021. It is now read-only.

memref lowering: stack overflow due to alloca's for memref descriptors for calls inside a loop #210

Open
bondhugula opened this issue Oct 25, 2019 · 1 comment

Comments

@bondhugula
Contributor

When lowering MLIR to LLVM, memrefs are lowered to LLVM structs that hold the descriptor info. The allocas for these structs can exhaust stack space when calls with memref arguments appear inside a loop. Here's an example snippet:


  affine.for %arg6 = #map11(%arg4) to #map12(%arg4) {
    call @foo(%0, %1, %arg2, %arg6) : (memref<64x512xf32>, memref<512x1xvector<8xf32>>, memref<2048x256xvector<8xf32>>, index) -> ()
  }

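The call to foo will be preceded by three allocas, one for each memref passed. The lowered LLVM dialect snippet is below. Because stack memory obtained with alloca is released only when the enclosing function returns, every iteration consumes fresh stack, and with a typical number of %arg6 iterations the loop will run out of stack space (with 8 MB stacks).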


  ^bb19(%151: !llvm.i64): // 2 preds: ^bb18, ^bb20
    %152 = llvm.icmp "slt" %151, %149 : !llvm.i64
    llvm.cond_br %152, ^bb20, ^bb21
  ^bb20:  // pred: ^bb19
    %153 = llvm.mlir.constant(1 : index) : !llvm.i64
    %154 = llvm.alloca %153 x !llvm<"{ float*, i64, [2 x i64], [2 x i64] }"> : (!llvm.i64) -> !llvm<"{ float*, i64, [2 x i64], [2 x i64] }*">
    llvm.store %32, %154 : !llvm<"{ float*, i64, [2 x i64], [2 x i64] }*">
    %155 = llvm.mlir.constant(1 : index) : !llvm.i64
    %156 = llvm.alloca %155 x !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }"> : (!llvm.i64) -> !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">
    llvm.store %102, %156 : !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">
    %157 = llvm.mlir.constant(1 : index) : !llvm.i64
    %158 = llvm.alloca %157 x !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }"> : (!llvm.i64) -> !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">
    llvm.store %2, %158 : !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">
    llvm.call @foo(%154, %156, %158, %151) : (!llvm<"{ float*, i64, [2 x i64], [2 x i64] }*">, !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">, !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">, !llvm.i64) -> ()
    %159 = llvm.add %151, %150 : !llvm.i64
    llvm.br ^bb19(%159 : !llvm.i64)
  ^bb21:  // pred: ^bb19
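
As a rough estimate of how quickly the loop above eats stack, here is a minimal sketch in C. It assumes a 64-bit target where the pointer and every i64 field take 8 bytes with no padding, and it ignores everything else living on the stack. The helper names are hypothetical and not part of MLIR.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-bit target, so the pointer and each i64 field are 8 bytes. */
enum { FIELD_BYTES = 8 };

/* Bytes in a descriptor shaped like { T*, i64, [rank x i64], [rank x i64] }.
 * Hypothetical helper, written only for this estimate. */
static size_t memref_descriptor_bytes(size_t rank) {
    return FIELD_BYTES               /* data pointer */
         + FIELD_BYTES               /* offset */
         + 2 * rank * FIELD_BYTES;   /* sizes and strides */
}

/* Upper bound on loop iterations before the allocas alone fill the stack. */
static size_t iterations_until_overflow(size_t stack_bytes,
                                        size_t descriptors_per_iteration,
                                        size_t rank) {
    size_t per_iteration =
        descriptors_per_iteration * memref_descriptor_bytes(rank);
    assert(per_iteration > 0);
    return stack_bytes / per_iteration;
}
```

With three rank-2 descriptors per call, each iteration takes 3 * 48 = 144 bytes, so `iterations_until_overflow(8 * 1024 * 1024, 3, 2)` gives 58254 iterations as an upper bound for an 8 MiB stack.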

Increasing the stack size is a stopgap that obviously makes the problem go away here. I think this issue requires the same approach as block-local variables in C/C++ (say, large structs with loop-body scope)? Another solution is to insert these allocas at the highest level, i.e., right after the descriptors are defined (%32, %102, and %2 above).

On a separate note, hoisting such allocas out of the loop is valid here; however, LICM won't do it since allocas have side effects. Moreover, even with a utility to hoist allocas, it can't be done without knowing what's inside @foo. In a way, the special property of these alloca'ed descriptors is hard to recover later if you don't exploit it at the time you generate them.
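
To make the hoisting idea concrete, here is a small C analogy, not the actual lowering. It assumes the callee only reads the descriptor, which is the property that cannot be checked without seeing inside @foo. The struct, `foo`, and `run_hoisted` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C mirror of { float*, i64, [2 x i64], [2 x i64] }. */
typedef struct {
    float *data;
    int64_t offset;
    int64_t sizes[2];
    int64_t strides[2];
} Descriptor2D;

/* Bookkeeping so the effect of hoisting can be observed. */
static const Descriptor2D *last_slot = NULL;
static size_t distinct_slots = 0;

/* Hypothetical stand-in for @foo. It takes the descriptor by const pointer,
 * modelling the assumption that the callee never writes to it. */
static void foo(const Descriptor2D *desc, int64_t iv) {
    assert(desc != NULL);
    (void)iv;
    if (desc != last_slot) {
        ++distinct_slots;
        last_slot = desc;
    }
}

/* The slot is created and filled once, before the loop (the hoisted
 * alloca + store); each iteration only passes its address. */
static size_t run_hoisted(Descriptor2D value, int64_t lb, int64_t ub) {
    Descriptor2D slot = value;
    last_slot = NULL;
    distinct_slots = 0;
    for (int64_t iv = lb; iv < ub; ++iv)
        foo(&slot, iv);
    return distinct_slots;
}
```

Stack use here is independent of the trip count: `run_hoisted` hands the same slot to every call, whereas the lowered loop in the issue allocates a new one per iteration.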

@ftynse
Contributor

ftynse commented Feb 11, 2020

This should be addressed by llvm/llvm-project@5a17780
