GPU Sanitizer trap updates#12
Closed
EthanLuisMcDonough wants to merge 7 commits intojdoerfert:pr/offload_sanitizer_trapsfrom
Closed
GPU Sanitizer trap updates#12EthanLuisMcDonough wants to merge 7 commits intojdoerfert:pr/offload_sanitizer_trapsfrom
EthanLuisMcDonough wants to merge 7 commits intojdoerfert:pr/offload_sanitizer_trapsfrom
Conversation
This is the first commit for a new "OffloadSanitizer" that is designed to work well on GPUs. To keep the commit small, only traps are sanitized and we only report information about the encountering thread. It is also restricted to AMD GPUs for now, though that is not a conceptual requirement. The communication between the instrumented device code and the runtime is performed via host initialized pinned memory. If an error is detected, one encountering thread will setup this sanitizer environment and a hardware trap is executed to end the kernel. The host trap handler can check the sanitizer environment to determine if the trap was issued by the sanitizer code or not. If so, we report the reason (for now only that a trap was encountered), the encountering thread id, and the PC.
548d26d to
494e271
Compare
jdoerfert
pushed a commit
that referenced
this pull request
Jan 8, 2025
…ns (llvm#120564) This patch makes sure that `BinaryContext::printInstruction` prints the preferred disassembly. Preferred disassembly only gets printed when there are no annotations on the MCInst. Therefore, this patch temporarily removes the annotations before printing it. A few examples of before and after on AArch64 instructions are as follows: ``` BEFORE AFTER (preferred disassembly) ret x30 ret orr x30, xzr, x0 mov x30, x0 hint llvm#29 autiasp hint #12 autia1716 ``` Clearly, the preferred disassembly is easier for developers to read, and is the disassembly that tools should be printing. This patch is motivated as part of future work on the llvm-bolt-binary-analysis tool, making sure that the reports it prints do use preferred disassembly. This patch was cherry-picked from https://github.com/kbeyls/llvm-project/tree/bolt-gadget-scanner-prototype. In this current patch, this only affects existing RISCV test cases. This patch also does improve test cases in future patches that will introduce a binary analysis for llvm-bolt-binary-analysis that checks for correct application of pac-ret (pointer authentication on return addresses).
jdoerfert
pushed a commit
that referenced
this pull request
May 9, 2025
The mcmodel=tiny memory model is only valid on ARM targets. While trying this on X86 compiler throws an internal error along with stack dump. llvm#125641 This patch resolves the issue. Reduced test case: ``` #include <stdio.h> int main( void ) { printf( "Hello, World!\n" ); return 0; } ``` ``` 0. Program arguments: /opt/compiler-explorer/clang-trunk/bin/clang++ -gdwarf-4 -g -o /app/output.s -fno-verbose-asm -S --gcc-toolchain=/opt/compiler-explorer/gcc-snapshot -fcolor-diagnostics -fno-crash-diagnostics -mcmodel=tiny <source> 1. <eof> parser at end of file #0 0x0000000003b10218 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b10218) #1 0x0000000003b0e35c llvm::sys::CleanupOnSignal(unsigned long) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b0e35c) #2 0x0000000003a5dbc3 llvm::CrashRecoveryContext::HandleExit(int) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a5dbc3) #3 0x0000000003b05cfe llvm::sys::Process::Exit(int, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b05cfe) #4 0x0000000000d4e3eb LLVMErrorHandler(void*, char const*, bool) cc1_main.cpp:0:0 #5 0x0000000003a67c93 llvm::report_fatal_error(llvm::Twine const&, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a67c93) #6 0x0000000003a67df8 (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a67df8) #7 0x0000000002549148 llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x2549148) #8 0x00000000025491fc llvm::RegisterTargetMachine<llvm::X86TargetMachine>::Allocator(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x25491fc) #9 0x0000000003db74cc clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3db74cc) #10 0x0000000004460d95 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4460d95) #11 0x00000000060005ec clang::ParseAST(clang::Sema&, bool, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x60005ec) #12 0x00000000044614b5 clang::CodeGenAction::ExecuteAction() (/opt/compiler-explorer/clang-trunk/bin/clang+++0x44614b5) #13 0x0000000004737121 clang::FrontendAction::Execute() (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4737121) #14 0x00000000046b777b clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x46b777b) #15 0x00000000048229e3 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x48229e3) #16 0x0000000000d50621 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd50621) #17 0x0000000000d48e2d ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0 #18 0x00000000044acc99 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) Job.cpp:0:0 #19 0x0000000003a5dac3 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a5dac3) #20 0x00000000044aceb9 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (.part.0) Job.cpp:0:0 #21 0x00000000044710dd clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/opt/compiler-explorer/clang-trunk/bin/clang+++0x44710dd) #22 0x0000000004472071 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4472071) #23 0x000000000447c3fc clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x447c3fc) llvm#24 0x0000000000d4d2b1 clang_main(int, char**, llvm::ToolContext const&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd4d2b1) llvm#25 0x0000000000c12464 main (/opt/compiler-explorer/clang-trunk/bin/clang+++0xc12464) llvm#26 0x00007ae43b029d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90) llvm#27 0x00007ae43b029e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40) llvm#28 0x0000000000d488c5 _start (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd488c5) ``` --------- Co-authored-by: Shashwathi N <nshashwa@pe31.hpc.amslabs.hpecorp.net>
jdoerfert
pushed a commit
that referenced
this pull request
Feb 25, 2026
…lvm#178069) Kernel panic is a special case, and there is no signal or exception for that so we need to rely on special workaround called `dumptid`. FreeBSDKernel plugin is supposed to find this thread and set it manually through `SetStopInfo()` in `CalculateStopInfo()` like Mach core plugin does. Before (We had to find and select crashed thread list otherwise thread 1 was selected by default): ``` ➜ sudo lldb /boot/panic/kernel -c /var/crash/vmcore.last (lldb) target create "/boot/panic/kernel" --core "/var/crash/vmcore.last" Core file '/var/crash/vmcore.last' (x86_64) was loaded. (lldb) bt * thread #1, name = '(pid 12991) dtrace' * frame #0: 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff8015882f780, flags=259) at sched_ule.c:2448:26 frame #1: 0xffffffff80bd38d2 kernel`mi_switch(flags=259) at kern_synch.c:530:2 frame #2: 0xffffffff80c29799 kernel`sleepq_switch(wchan=0xfffff8014edff300, pri=0) at subr_sleepqueue.c:608:2 frame #3: 0xffffffff80c29b76 kernel`sleepq_catch_signals(wchan=0xfffff8014edff300, pri=0) at subr_sleepqueue.c:523:3 frame #4: 0xffffffff80c29d32 kernel`sleepq_timedwait_sig(wchan=<unavailable>, pri=<unavailable>) at subr_sleepqueue.c:704:11 frame #5: 0xffffffff80bd2e2d kernel`_sleep(ident=0xfffff8014edff300, lock=0xffffffff81df2880, priority=768, wmesg="uwait", sbt=2573804118162, pr=0, flags=512) at kern_synch.c:215:10 frame #6: 0xffffffff80be8622 kernel`umtxq_sleep(uq=0xfffff8014edff300, wmesg="uwait", timo=0xfffffe0279cb3d20) at kern_umtx.c:843:11 frame #7: 0xffffffff80bef87a kernel`do_wait(td=0xfffff8015882f780, addr=<unavailable>, id=0, timeout=0xfffffe0279cb3d90, compat32=1, is_private=1) at kern_umtx.c:1316:12 frame #8: 0xffffffff80bed264 kernel`__umtx_op_wait_uint_private(td=0xfffff8015882f780, uap=0xfffffe0279cb3dd8, ops=<unavailable>) at kern_umtx.c:3990:10 frame #9: 0xffffffff80beaabe kernel`sys__umtx_op [inlined] kern__umtx_op(td=<unavailable>, obj=<unavailable>, op=<unavailable>, val=<unavailable>, uaddr1=<unavailable>, uaddr2=<unavailable>, ops=<unavailable>) at kern_umtx.c:4999:10 frame #10: 0xffffffff80beaa89 kernel`sys__umtx_op(td=<unavailable>, uap=<unavailable>) at kern_umtx.c:5024:10 frame #11: 0xffffffff81122cd1 kernel`amd64_syscall [inlined] syscallenter(td=0xfffff8015882f780) at subr_syscall.c:165:11 frame #12: 0xffffffff81122c19 kernel`amd64_syscall(td=0xfffff8015882f780, traced=0) at trap.c:1208:2 frame #13: 0xffffffff810f1dbb kernel`fast_syscall_common at exception.S:570 ``` After: ``` ➜ sudo ./build/bin/lldb /boot/panic/kernel -c /var/crash/vmcore.last (lldb) target create "/boot/panic/kernel" --core "/var/crash/vmcore.last" Core file '/var/crash/vmcore.last' (x86_64) was loaded. (lldb) bt * thread #18, name = '(pid 5409) powerd (crashed)', stop reason = kernel panic * frame #0: 0xffffffff80bc6c91 kernel`__curthread at pcpu_aux.h:57:2 [inlined] frame #1: 0xffffffff80bc6c91 kernel`doadump(textdump=0) at kern_shutdown.c:399:2 frame #2: 0xffffffff804b3b7a kernel`db_dump(dummy=<unavailable>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at db_command.c:596:10 frame #3: 0xffffffff804b396d kernel`db_command(last_cmdp=<unavailable>, cmd_table=<unavailable>, dopager=true) at db_command.c:508:3 frame #4: 0xffffffff804b362d kernel`db_command_loop at db_command.c:555:3 frame #5: 0xffffffff804b7026 kernel`db_trap(type=<unavailable>, code=<unavailable>) at db_main.c:267:3 frame #6: 0xffffffff80c16aaf kernel`kdb_trap(type=3, code=0, tf=0xfffffe01b605b930) at subr_kdb.c:790:13 frame #7: 0xffffffff8112154e kernel`trap(frame=<unavailable>) at trap.c:614:8 frame #8: 0xffffffff810f14c8 kernel`calltrap at exception.S:285 frame #9: 0xffffffff81da2290 kernel`cn_devtab + 64 frame #10: 0xfffffe01b605b8b0 frame #11: 0xffffffff84001c43 dtrace.ko`dtrace_panic(format=<unavailable>) at dtrace.c:652:2 frame #12: 0xffffffff84005524 dtrace.ko`dtrace_action_panic(ecb=0xfffff80539cad580) at dtrace.c:7022:2 [inlined] frame #13: 0xffffffff840054de dtrace.ko`dtrace_probe(id=88998, arg0=14343377283488, arg1=<unavailable>, arg2=<unavailable>, arg3=<unavailable>, arg4=<unavailable>) at dtrace.c:7665:6 frame #14: 0xffffffff83e5213d systrace.ko`systrace_probe(sa=<unavailable>, type=<unavailable>, retval=<unavailable>) at systrace.c:226:2 frame #15: 0xffffffff8112318d kernel`syscallenter(td=0xfffff801318d5780) at subr_syscall.c:160:4 [inlined] frame #16: 0xffffffff81123112 kernel`amd64_syscall(td=0xfffff801318d5780, traced=0) at trap.c:1208:2 frame #17: 0xffffffff810f1dbb kernel`fast_syscall_common at exception.S:570 ```
jdoerfert
pushed a commit
that referenced
this pull request
Feb 25, 2026
…n splitBlockBefore() (llvm#178895) (llvm#179392) llvm#178895 caused a clang crash(see https://lab.llvm.org/buildbot/#/builders/210/builds/8229), reverted in 6d52d26. The crash is assertion `DT && "DT should be available to update LoopInfo!"' failed. https://github.com/llvm/llvm-project/blob/ad8d5349d46734826aaeae4a2ebdc6f427a5bad8/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp#L1106 ``` #7 0x00007f5a380254e3 __assert_perror_fail (/usr/lib/libc.so.6+0x254e3) #8 0x0000563df5d8fde1 UpdateAnalysisInformation(llvm::BasicBlock*, llvm::BasicBlock*, llvm::ArrayRef<llvm::BasicBlock*>, llvm::DomTreeUpdater*, llvm::DominatorTree*, llvm::LoopInfo*, llvm::MemorySSAUpdater*, bool, bool&) BasicBlockUtils.cpp:0:0 #9 0x0000563df5d8f3bb llvm::splitBlockBefore(llvm::BasicBlock*, llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void, true, llvm::BasicBlock>, false, false>, llvm::DomTreeUpdater*, llvm::LoopInfo*, llvm::MemorySSAUpdater*, llvm::Twine const&) #10 0x0000563df5d8cb08 llvm::SplitEdge(llvm::BasicBlock*, llvm::BasicBlock*, llvm::DominatorTree*, llvm::LoopInfo*, llvm::MemorySSAUpdater*, llvm::Twine const&) #11 0x0000563df4ff5b59 (anonymous namespace)::CodeGenPrepare::splitLargeGEPOffsets()::$_1::operator()(long, llvm::Value*, llvm::GetElementPtrInst*) const CodeGenPrepare.cpp:0:0 #12 0x0000563df4fc0ec8 (anonymous namespace)::CodeGenPrepare::_run(llvm::Function&) CodeGenPrepare.cpp:0:0 #13 0x0000563df4fbb36c (anonymous namespace)::CodeGenPrepareLegacyPass::runOnFunction(llvm::Function&) CodeGenPrepare.cpp:0:0 ``` I think this happened when we get DominatorTree with `DT.get()` in `splitLargeGEPOffsets()` but `DT.reset()` already setting it to nullptr in https://github.com/llvm/llvm-project/blob/ad8d5349d46734826aaeae4a2ebdc6f427a5bad8/llvm/lib/CodeGen/CodeGenPrepare.cpp#L660. To fix this assertion failure, use `getDT()` for `splitLargeGEPOffsets()` to build the DominatorTree if it is set to nullptr by `DT.reset()`. I don't have a RSIC-V environment, so no reproducer. Checked that the crash is fixed by rerunning buildbot with this patch https://lab.llvm.org/buildbot/#/builders/210/builds/8248
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.