Memory issue? #878

Open
pdhirajkumarprasad opened this issue Nov 5, 2024 · 2 comments
@pdhirajkumarprasad

For the attached IR, iree-compile crashes with the following assertion failure:

iree-compile: /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:2868: SmallVector<mlir::Value> mlir::TypeConverter::materializeTargetConversion(mlir::OpBuilder &, mlir::Location, mlir::TypeRange, mlir::ValueRange, mlir::Type) const: Assertion `TypeRange(ValueRange(result)) == resultTypes && "callback produced incorrect number of values or values with " "incorrect types"' failed.
Please report issues to https://github.com/iree-org/iree/issues and include the crash backtrace.
Stack dump:
0.	Program arguments: iree-compile --iree-hal-target-backends=llvm-cpu model.torch_onnx.mlir -o abc.vmfb
 #0 0x00007f46bb61a9b7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x00007f46bb618bf0 llvm::sys::RunSignalHandlers() /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/lib/Support/Signals.cpp:106:18
 #2 0x00007f46bb61b07a SignalHandler(int) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x00007f46b5855520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007f46b58a99fc __pthread_kill_implementation ./nptl/./nptl/pthread_kill.c:44:76
 #5 0x00007f46b58a99fc __pthread_kill_internal ./nptl/./nptl/pthread_kill.c:78:10
 #6 0x00007f46b58a99fc pthread_kill ./nptl/./nptl/pthread_kill.c:89:10
 #7 0x00007f46b5855476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007f46b583b7f3 abort ./stdlib/./stdlib/abort.c:81:7
 #9 0x00007f46b583b71b _nl_load_domain ./intl/./intl/loadmsgcat.c:1177:9
#10 0x00007f46b584ce96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#11 0x00007f46bfb1978f (/proj/xhdhdstaff6/dhirajp/localBuild/iree-build/lib/libIREECompiler.so+0x9eb278f)
#12 0x00007f46bfb19596 llvm::SmallVectorBase<unsigned int>::empty() const /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:81:46
#13 0x00007f46bfb19596 mlir::TypeConverter::materializeTargetConversion(mlir::OpBuilder&, mlir::Location, mlir::Type, mlir::ValueRange, mlir::Type) const /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:2851:14
#14 0x00007f46bfb16d6d legalizeUnresolvedMaterialization(mlir::RewriterBase&, (anonymous namespace)::UnresolvedMaterializationRewrite*) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:0:0
#15 0x00007f46bfb16d6d mlir::OperationConverter::convertOperations(llvm::ArrayRef<mlir::Operation*>) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:2528:18
#16 0x00007f46bfb1ca5b mlir::applyPartialConversion(llvm::ArrayRef<mlir::Operation*>, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, mlir::ConversionConfig) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:3258:22
#17 0x00007f46bfb1ca5b mlir::applyPartialConversion(mlir::Operation*, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, mlir::ConversionConfig) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:3264:10
#18 0x00007f46bd033bb1 mlir::iree_compiler::IREE::VM::ConversionPass::runOnOperation() /proj/xhdhdstaff6/dhirajp/localBuild/iree/compiler/src/iree/compiler/Dialect/VM/Transforms/Conversion.cpp:168:16
#19 0x00007f46bb80a835 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7::operator()() const /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:0:17
#20 0x00007f46bb80a835 void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_7>(long) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#21 0x00007f46bb80a835 llvm::function_ref<void ()>::operator()() const /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#22 0x00007f46bb80a835 void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/include/mlir/IR/MLIRContext.h:280:7
#23 0x00007f46bb80a835 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:520:21
#24 0x00007f46bb80afa8 llvm::LogicalResult::failed() const /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/include/llvm/Support/LogicalResult.h:43:43
#25 0x00007f46bb80afa8 llvm::failed(llvm::LogicalResult) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/include/llvm/Support/LogicalResult.h:71:58
#26 0x00007f46bb80afa8 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:592:9
#27 0x00007f46bb80d2f9 mlir::PassManager::run(mlir::Operation*) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:0:0
#28 0x00007f46bb56cd60 llvm::LogicalResult::failed() const /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/include/llvm/Support/LogicalResult.h:43:43
#29 0x00007f46bb56cd60 llvm::failed(llvm::LogicalResult) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/include/llvm/Support/LogicalResult.h:71:58
#30 0x00007f46bb56cd60 mlir::iree_compiler::embed::(anonymous namespace)::Invocation::runPipeline(iree_compiler_pipeline_t) /proj/xhdhdstaff6/dhirajp/localBuild/iree/compiler/src/iree/compiler/API/Internal/CompilerDriver.cpp:1008:7
#31 0x00007f46bb56cd60 ireeCompilerInvocationPipeline /proj/xhdhdstaff6/dhirajp/localBuild/iree/compiler/src/iree/compiler/API/Internal/CompilerDriver.cpp:1447:23
#32 0x00007f46bb796528 mlir::iree_compiler::runIreecMain(int, char**)::$_2::operator()(iree_compiler_source_t*) const /proj/xhdhdstaff6/dhirajp/localBuild/iree/compiler/src/iree/compiler/Tools/iree_compile_lib.cc:254:11
#33 0x00007f46bb795d61 mlir::iree_compiler::runIreecMain(int, char**) /proj/xhdhdstaff6/dhirajp/localBuild/iree/compiler/src/iree/compiler/Tools/iree_compile_lib.cc:0:10
#34 0x00007f46b583cd90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#35 0x00007f46b583ce40 call_init ./csu/../csu/libc-start.c:128:20
#36 0x00007f46b583ce40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#37 0x0000557b30e026d5 _start (/proj/xhdhdstaff6/dhirajp/localBuild/iree-build/tools/iree-compile+0x16d5)
Abort (core dumped)

1> The crash is intermittent: across 10 runs, it works fine 2/3 times, and the rest of the time it produces the stack trace above.
2> The IR contains a few commented-out lines that are not needed. If I delete them, compilation works fine most of the time.

command: iree-compile --iree-hal-target-backends=llvm-cpu model.torch_onnx.mlir -o abc.vmfb
tt.mlir.txt
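
Because the failure is intermittent, it may help to measure the failure rate over a fixed number of runs. The following is a minimal sketch, not part of IREE: the helper name `count_failures` is hypothetical, and it assumes `iree-compile` is on `PATH` and the input file is in the current directory.

```python
import subprocess


def count_failures(cmd, runs=10):
    """Run `cmd` `runs` times and return how many runs exited non-zero.

    A crash such as an abort from a failed assertion shows up as a
    non-zero return code (negative on POSIX when killed by a signal).
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Example usage with the command from this issue (adjust paths as needed):
#
#   cmd = ["iree-compile", "--iree-hal-target-backends=llvm-cpu",
#          "model.torch_onnx.mlir", "-o", "abc.vmfb"]
#   print(count_failures(cmd, runs=10), "of 10 runs failed")
```

Reporting the count alongside the build revision makes it easier to tell whether a later change actually removes the nondeterminism or only makes it rarer.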

@AmosLewis
Contributor

AmosLewis commented Nov 5, 2024

iree-compile --iree-hal-target-backends=llvm-cpu model.mlir -o model.vmfb --dump-compilation-phases-to=./tmp/
In the dumped phases, the HAL output is generated. The error happens in the phase that lowers HAL to VM:

module {
^
<unknown>:0: error: failed to legalize unresolved materialization from ('i64') to ('index') that remained live after conversion
<unknown>:0: note: see current operation: %18 = "builtin.unrealized_conversion_cast"(%17) : (i64) -> index
model.mlir:865:12: note: see existing live user here: %x, %y, %z = flow.dispatch.workgroup_count_from_dag_root %19, %0, %1
    %867 = torch.operator "onnx.Add"(%866, %813) : (!torch.vtensor<[?,256,768],f32>, !torch.vtensor<[1,256,768],f32>) -> !torch.vtensor<[?,256,768],f32> 
           ^
model.mlir:1:1: error: conversion to vm.module failed

The VM module is created successfully if the code after the following line is deleted:
%866 = torch.operator "onnx.Add"(%838, %865) : (!torch.vtensor<[?,256,768],f32>, !torch.vtensor<[?,256,768],f32>) -> !torch.vtensor<[?,256,768],f32>
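
When comparing passing and failing runs, it can be useful to pull the offending type pair out of the compiler's stderr automatically. This is a rough sketch that matches the diagnostic wording quoted above; the function name `find_unresolved_materializations` is hypothetical, and the message text may differ in other MLIR versions.

```python
import re

# Matches the diagnostic text quoted in this thread, e.g.
#   failed to legalize unresolved materialization from ('i64') to ('index')
# The exact wording is an assumption based on the output shown here.
MATERIALIZATION_RE = re.compile(
    r"failed to legalize unresolved materialization "
    r"from \((?P<src>.*?)\) to \((?P<dst>.*?)\)"
)


def find_unresolved_materializations(stderr_text):
    """Return (source_types, target_types) pairs found in compiler output."""
    pairs = []
    for match in MATERIALIZATION_RE.finditer(stderr_text):
        # Drop the single quotes that surround each type name.
        src = match.group("src").replace("'", "")
        dst = match.group("dst").replace("'", "")
        pairs.append((src, dst))
    return pairs
```

For the output above, this returns one pair, `("i64", "index")`, which points at the `builtin.unrealized_conversion_cast` from `i64` to `index` that stayed live after conversion.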

@AmosLewis
Contributor

Smallest reproducer so far:
iree-compile --iree-hal-target-backends=llvm-cpu model.mlir -o model.vmfb --dump-compilation-phases-to=./tmp/

module {
  func.func @tf2onnx(%arg0: !torch.vtensor<[?,768],f32>, %arg1: !torch.vtensor<[3],si64>, %arg2: !torch.vtensor<[?,256,768],f32>) -> ( !torch.vtensor<[?,256,768],f32>) attributes {torch.onnx_meta.ir_version = 7 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "tf2onnx", torch.onnx_meta.producer_version = "1.5.2"} {
    %reshape = torch.operator "onnx.Reshape"(%arg0, %arg1) : (!torch.vtensor<[?,768],f32>, !torch.vtensor<[3],si64>) -> !torch.vtensor<[?,256,768],f32> 
    %866 = torch.operator "onnx.Add"(%reshape, %arg2) : (!torch.vtensor<[?,256,768],f32>, !torch.vtensor<[?,256,768],f32>) -> !torch.vtensor<[?,256,768],f32> 
    return %866 :  !torch.vtensor<[?,256,768],f32>
  }
}
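
To check which phase was written last before the failure, one option is to list the dump directory by modification time. This is a sketch only: it assumes the directory passed to `--dump-compilation-phases-to` (`./tmp/` in the commands above), makes no assumption about the file names IREE writes there, and `list_phase_dumps` is a hypothetical helper name.

```python
from pathlib import Path


def list_phase_dumps(dump_dir):
    """Return the regular files in `dump_dir`, oldest modification first."""
    files = [p for p in Path(dump_dir).iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime)


# Example usage (directory taken from the command in this thread):
#
#   for path in list_phase_dumps("./tmp/"):
#       print(path.name)
```

The last entry is the most recently written dump; if no VM-level file appears after the HAL one, that is consistent with the failure occurring during the HAL to VM conversion.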
