memref lowering: stack overflow due to alloca's for memref descriptors for calls inside a loop #210

bondhugula · 2019-10-25T15:50:49Z

When lowering MLIR to LLVM, memrefs are lowered to LLVM structs that hold the descriptor info. The alloca's for these structs can exhaust stack space when calls with memref args appear inside a loop. Here's an example snippet:

  affine.for %arg6 = #map11(%arg4) to #map12(%arg4) {
    call @foo(%0, %1, %arg2, %arg6) : (memref<64x512xf32>, memref<512x1xvector<8xf32>>, memref<2048x256xvector<8xf32>>, index) -> ()
  }

The call to foo is preceded by three alloca's, one for each memref passed. The lowered LLVM dialect snippet is below. Since stack space obtained from an alloca is only released when the enclosing function returns, a typical number of %arg6 iterations will run out of stack space (with 8 MB stacks).

  ^bb19(%151: !llvm.i64): // 2 preds: ^bb18, ^bb20
    %152 = llvm.icmp "slt" %151, %149 : !llvm.i64
    llvm.cond_br %152, ^bb20, ^bb21
  ^bb20:  // pred: ^bb19
    %153 = llvm.mlir.constant(1 : index) : !llvm.i64
    %154 = llvm.alloca %153 x !llvm<"{ float*, i64, [2 x i64], [2 x i64] }"> : (!llvm.i64) -> !llvm<"{ float*, i64, [2 x i64], [2 x i64] }*">
    llvm.store %32, %154 : !llvm<"{ float*, i64, [2 x i64], [2 x i64] }*">
    %155 = llvm.mlir.constant(1 : index) : !llvm.i64
    %156 = llvm.alloca %155 x !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }"> : (!llvm.i64) -> !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">
    llvm.store %102, %156 : !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">
    %157 = llvm.mlir.constant(1 : index) : !llvm.i64
    %158 = llvm.alloca %157 x !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }"> : (!llvm.i64) -> !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">
    llvm.store %2, %158 : !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">
    llvm.call @foo(%154, %156, %158, %151) : (!llvm<"{ float*, i64, [2 x i64], [2 x i64] }*">, !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">, !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">, !llvm.i64) -> ()
    %159 = llvm.add %151, %150 : !llvm.i64
    llvm.br ^bb19(%159 : !llvm.i64)
  ^bb21:  // pred: ^bb19
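
To put a rough number on this, here is a minimal C sketch that mirrors the rank-2 descriptor `{ float*, i64, [2 x i64], [2 x i64] }` from the snippet and estimates how many iterations fit in a given stack. The struct, its field names (my reading of the layout as pointer, offset, sizes, strides), and both helper functions are hypothetical, not MLIR code. The estimate ignores every other use of the stack, so it is an upper bound on the iteration count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of { float*, i64, [2 x i64], [2 x i64] }.
   Field names are assumptions; only the layout comes from the snippet.
   This is 48 bytes on a typical 64-bit target. */
typedef struct {
    float  *data;
    int64_t offset;
    int64_t sizes[2];
    int64_t strides[2];
} MemRefDescriptor2D;

/* Stack bytes consumed per loop iteration when every memref argument
   gets its own alloca'ed descriptor. */
static size_t bytes_per_iteration(size_t num_memref_args) {
    return num_memref_args * sizeof(MemRefDescriptor2D);
}

/* Iterations that fit before stack_bytes is used up, assuming nothing
   else lives on the stack (an optimistic assumption). */
static size_t iterations_until_overflow(size_t stack_bytes,
                                        size_t num_memref_args) {
    assert(num_memref_args > 0);
    return stack_bytes / bytes_per_iteration(num_memref_args);
}
```

With three memref arguments and an 8 MiB stack, `iterations_until_overflow(8u * 1024u * 1024u, 3)` comes to roughly 58 thousand iterations on a 64-bit target, counted across all executions of the loop body within one invocation of the enclosing function.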

Increasing the stack size is a stopgap that obviously works around the issue here. I think this requires the same approach as block-local variables in C/C++ (say, large structs with loop-body scope). Another solution is to insert these alloca's at the highest level, i.e., right after the descriptors are defined (%32, %102, and %2 above).
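
As a C analogy of the second option, the sketch below creates one stack slot right after the descriptor value is defined and passes its address on every iteration. Everything here is hypothetical: the struct mirrors the descriptor layout from the snippet, `foo` stands in for `@foo`, and `run_hoisted` is an illustrative caller. It is not how the lowering is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the rank-2 descriptor from the snippet. */
typedef struct {
    float  *data;
    int64_t offset;
    int64_t sizes[2];
    int64_t strides[2];
} MemRefDescriptor2D;

/* Bookkeeping so the effect is observable: how many distinct slots
   the callee was handed. */
static const MemRefDescriptor2D *last_slot = NULL;
static size_t distinct_slots = 0;

/* Hypothetical stand-in for @foo: receives the descriptor by pointer. */
static void foo(const MemRefDescriptor2D *desc, int64_t iv) {
    assert(desc != NULL);
    (void)iv;
    if (desc != last_slot) {
        ++distinct_slots;
        last_slot = desc;
    }
}

/* The descriptor value is loop-invariant, so the slot is created and
   filled once, before the loop; each iteration only passes its address. */
static size_t run_hoisted(MemRefDescriptor2D value, int64_t lb, int64_t ub) {
    MemRefDescriptor2D slot = value; /* one "alloca" plus one store */
    distinct_slots = 0;
    last_slot = NULL;
    for (int64_t i = lb; i < ub; ++i)
        foo(&slot, i);
    return distinct_slots;
}
```

However many iterations run, `run_hoisted` hands the callee a single slot, so stack use no longer grows with the trip count. Reusing the slot like this assumes the callee does not write through the pointer or keep it past the call.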

On a separate note, hoisting such alloca's out is valid here; however, LICM won't do it since alloca's have side effects. Moreover, even with a utility to hoist alloca's, it can't be done without knowing what's inside @foo. In a way, the special property of these alloca'ed descriptors is hard to recover later if you don't exploit it at the time you generate them.

ftynse · 2020-02-11T12:57:24Z

This should be addressed by llvm/llvm-project@5a17780
