dill build fails on crusher@olcf w/ clang-14.0.0 during ADIOS2 build #27
Comments
Actually, yes, modules and compiler flags would be useful. Not sure we can fix this because it looks like a compiler problem, but I wasn’t able to reproduce this on first try.
|
Here is a build script. I had to add |
So, here's what I've found out. First, this looks to be a compiler bug. LLVM seems to be faulting in llvm::BasicBlock::isEntryBlock(). While it does give us an LLVM stack dump, there's no real clue as to what in the source code might be triggering the bug. Second, I have modified your script to build dill outside of ADIOS (just substituting the dill GitHub repo for the ADIOS spec), and I don't see the failure in that circumstance. So, while the issue seems to be inside the compiler, the ADIOS configuration may be adding flags or something else that affects whether or not it's triggered with this source. This obviously complicates things somewhat.

This is all a bit odd. Dill does dynamic code generation, so it needs to use asm() specs and other odd things that might confuse an optimizer. But that generally happens in the machine-specific source files, not dill_util.c, which contains relatively generic code.

I think there are a couple of possible paths forward. Probably one needs to be submitting a bug report to LLVM. For that, it would probably be helpful to sort out more details, narrowing down to the problematic part of dill_util.c and/or figuring out why this happens during the ADIOS build and not outside of it. The second path might be to sort out a workaround that would let you build ADIOS on crusher. The same experimenting required to narrow down the circumstances of the failure might also lend a clue as to how it might be avoided.

I'm both the maintainer of Dill and involved in ADIOS, so I'm positioned to work on this, but I've also got a stack of other things to work on, so progress might be slow. To the extent that you can help with the experimentation necessary to see why this is failing sometimes and not others, that would help speed resolution. |
Is there an update on this? I ran into the same issue, and I found that updating the ADIOS2/thirdparty/dill/CMakeLists.txt and forcing
|
update: I changed the ADIOS2 cmake configuration to not force |
I tend to agree with that diagnosis... If anyone can identify a workaround, I'm happy to push it into dill and then ADIOS. But unfortunately I don't have cycles for it myself at the moment. |
It looks like the "-O3" optimization level is causing the problem. With the RelWithDebInfo or Debug build type in CMake, I can compile. |
Thanks Jong, that's helpful. I can maybe get on crusher, replicate the problem and see if there's a way to rearrange the code so that it doesn't trigger this compiler bug. Also, a quick Google search doesn't show any reports of clang segfaulting in isEntryBlock(), so this may be an unreported problem. If I can narrow down the problem I can also likely get a test case to submit with a bug report. (Of course, finding a code-based workaround is probably the best short-term solution. Bug reporting is a long play.) |
OK, I have a code workaround: +__attribute__((optnone)) That is, just add "__attribute__((optnone))" to the init_code_block() function in dill_util.c. (The problem seems to arise when clang tries to inline that function in places where it's called. Adding optnone disables that.) I need to see how I can put this in the source so that it doesn't break on non-clang compilers, but that might not happen today as I have other things on my plate. But in the meantime, you have a workaround to compile ADIOS. |
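One way the attribute could be guarded so that non-clang compilers never see it is sketched below. This is only an illustration, not the change that went into dill: the macro name DILL_NO_OPT and the function init_block_demo() are hypothetical stand-ins, and the real init_code_block() has a different signature. The nested #if is deliberate, since a compiler without __has_attribute would fail to parse a combined "defined(...) && __has_attribute(...)" expression.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro (not from dill): expands to optnone only when the
 * compiler reports support for it, and to nothing everywhere else. */
#if defined(__has_attribute)
#  if __has_attribute(optnone)
#    define DILL_NO_OPT __attribute__((optnone))
#  endif
#endif
#ifndef DILL_NO_OPT
#  define DILL_NO_OPT
#endif

/* Hypothetical stand-in for init_code_block(): zeroes a buffer and
 * returns its size. With clang, optnone keeps this function from being
 * optimized or inlined into its callers. */
DILL_NO_OPT size_t init_block_demo(unsigned char *buf, size_t size)
{
    memset(buf, 0, size);
    return size;
}
```

With GCC, __has_attribute(optnone) evaluates to 0, so the macro expands to nothing and the function compiles as usual.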
@eisenhauer you might be able to wrap it with |
The recommendation seems to be like: ... I'll give that a go and see how far it gets me... |
FWIW I can easily reproduce this issue on my local workstation. It fails with both 5.0.2 and 5.1.2 of the ROCm stack, both based on LLVM 14, but succeeds with 4.5.2, based on LLVM 12. The following patch resolved the issue from the build system end:
I'm more partial to the build solution over an in-source patch since my gut tells me to not trust that opt level for the rest of dill, even if the compiler doesn't segfault. Up to @eisenhauer though as to which approach he prefers. |
Building ADIOS2 2.8.0 on crusher (https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html) fails during step
I cannot draw any useful information from the error output. It is attached together with all output obtained during both the configure and build phases.
configure.out.txt
configure.err.txt
build.out.txt
build.err.txt
Disabling SST functionality of ADIOS2 by using option
-DADIOS2_USE_SST=OFF
when building ADIOS2 works around the issue for me (as I do not want to use SST anyway). I am happy to provide more information, e.g. regarding the modules and compiler flags used, if needed.