[AMDGPU] error running program when compiled with asan #127241
Could you please try 19, 20 release candidate or
@zyx-billy Can you dump the e_flags from the code object you are producing? What GPU and code object version are you targeting? What kind of GPU do you have on the system? You can see how to decode the flags at https://llvm.org/docs/AMDGPUUsage.html#header.
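For anyone following along, here is a minimal sketch of pulling `e_flags` out of a code object and splitting it into its fields. It assumes a 64-bit little-endian ELF and the code object V4+ bit layout as I read it from the AMDGPUUsage header table (machine in the low byte, then the xnack and sramecc selectors); the function names are my own, and the numeric machine value still has to be looked up in that table.

```python
import struct

# Bit masks for code object V4 and later, per the AMDGPUUsage ELF header table.
EF_AMDGPU_MACH = 0x0FF
EF_AMDGPU_FEATURE_XNACK_V4 = 0x300
EF_AMDGPU_FEATURE_SRAMECC_V4 = 0xC00

XNACK = {0x000: "unsupported", 0x100: "any", 0x200: "off", 0x300: "on"}
SRAMECC = {0x000: "unsupported", 0x400: "any", 0x800: "off", 0xC00: "on"}


def e_flags_from_header(header: bytes) -> int:
    """Extract e_flags from the first 52 bytes of an ELF64 little-endian file."""
    if len(header) < 52 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 2 or header[5] != 1:
        raise ValueError("expected ELF64, little-endian")
    # In the ELF64 header, e_flags is a 4-byte field at offset 48.
    return struct.unpack_from("<I", header, 48)[0]


def read_e_flags(path: str) -> int:
    """Hypothetical helper: read e_flags from a code object on disk."""
    with open(path, "rb") as f:
        return e_flags_from_header(f.read(52))


def decode_e_flags(e_flags: int) -> dict:
    """Split e_flags into machine id, xnack setting and sramecc setting."""
    return {
        "mach": e_flags & EF_AMDGPU_MACH,  # look up the name in AMDGPUUsage
        "xnack": XNACK[e_flags & EF_AMDGPU_FEATURE_XNACK_V4],
        "sramecc": SRAMECC[e_flags & EF_AMDGPU_FEATURE_SRAMECC_V4],
    }
```

Usage would be something like `decode_e_flags(read_e_flags("kernel.hsaco"))`, where the file name is only a placeholder.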
Thanks for the response. Right now we're using a pretty up to date version of main (ea6827c as of 4 days ago). Here's the gpu on our system from
Our code is targeting gfx942 with code object version 5. This is the top of our amdgcn:
And the e_flags from our code object is
BTW, in case One random thought, is it possible the
A couple thoughts about what it might be:
Are you JIT compiling this? Do you have a .bc for the actual kernel and can you try merging/linking those bc files and see if you get a clash from llvm-as?
Oh yes I got the
I'm compiling this ahead of time. Basically I have the in-memory LLVM IR of our kernel, and I used the exact same linking logic from the triton impl here on my IR module. Then I ran it through llvm optimizations and backend lowering passes to get the object file.
Can you see any more information about the failure when setting environment AMD_LOG_LEVEL=2? Is the LLVM version you're using to create code objects 18 or something newer? Can you move to ROCm 6.3.2?
Can you also double check the fields in the attributes of the .ll IR file? You should see something like:
hmm, the only additional output I get with AMD_LOG_LEVEL=2 is
And yes I see these attributes on our kernel in the combined IR:
I tried linking in the .bc files after optimization passes instead, but it didn't make a difference. The LLVM I'm using is very recent (< 1 week old on main). I'll retry with the latest ROCm release.
The closer you can get the device library to the compiler you're using the better. @CRobeck and I have seen this before elsewhere, but I'm not clear on exactly what cleared it up then.
Unfortunately I get the same error with 6.3.2 (and there's also no additional output under AMD_LOG_LEVEL=2 anymore). Though it looks like the updated asanrtl.bc library is also created with clang 18 (the contents of the library do differ).
Does environment LOADER_ENABLE_LOGGING=1 give any additional output?
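A small sketch of running the failing program with both logging variables from this thread set at once. Only `AMD_LOG_LEVEL=2` and `LOADER_ENABLE_LOGGING=1` come from the discussion; the helper names and the binary path in the usage comment are hypothetical.

```python
import os
import subprocess


def logging_env(base=None):
    """Return a copy of the environment with the two logging variables set."""
    env = dict(os.environ if base is None else base)
    env["AMD_LOG_LEVEL"] = "2"          # runtime logging, as suggested above
    env["LOADER_ENABLE_LOGGING"] = "1"  # code object loader logging
    return env


def run_with_logging(cmd):
    """Run cmd (a list of arguments) and capture stdout/stderr as text."""
    return subprocess.run(cmd, env=logging_env(), capture_output=True, text=True)


# Example (placeholder binary name):
#   result = run_with_logging(["./my_asan_program"])
#   print(result.stderr)
```

Capturing stderr separately makes it easier to diff the output with and without each variable.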
oh amazing! Just what I was looking for. It gives:
And indeed all I see is this in our combined IR:
The same goes for
I looked around and found that these values need to be set onto the IR. When I manually added them by linking to the relevant .bc files that came with the install (e.g.
Testing with a correct program, it runs to completion without errors. Testing with an out-of-bounds array access, I get an asan report correctly (with debuginfo interpreted correctly).
Thank you for all of your help 🙏 ! This has been immensely helpful.
Oh and btw, was able to confirm this works on 6.2.0 too. Closing the issue then. Thank you! 🙏
I'm working on adding asan when emitting for amd gpus using the upstream llvm backend. Right now I'm doing these things:
And when running the program, I explicitly link the asan libraries, and enable xnack:
But I run into this error when trying to invoke a kernel:
Removing the asan llvm pass makes the program run fine (but of course, it won't detect any errors), indicating that everything else seems to work. I also tried linking asanrtl.bc, ocml.bc, & ockl.bc into the IR before running llvm passes (following the impl here), but got the same error.
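To make the bitcode-linking step concrete, here is a rough sketch that only assembles an `llvm-link` command line for merging the kernel bitcode with the three device libraries named above. The file paths are placeholders, the helper name is hypothetical, and using `--only-needed` is my assumption about how one would keep the merged module small; it is not necessarily what was done in this setup.

```python
import shlex


def llvm_link_command(kernel_bc, device_libs, output_bc,
                      llvm_link="llvm-link", only_needed=True):
    """Build an llvm-link invocation that merges device libs into a kernel module."""
    cmd = [llvm_link, kernel_bc]
    if only_needed:
        # Assumption: pull in only the symbols the kernel module references.
        cmd.append("--only-needed")
    cmd.extend(device_libs)
    cmd.extend(["-o", output_bc])
    return cmd


# Placeholder paths; the real locations depend on the ROCm install.
cmd = llvm_link_command(
    "kernel.bc",
    ["asanrtl.bc", "ocml.bc", "ockl.bc"],
    "kernel.linked.bc",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to see whether the link itself reports a conflict.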
My questions are:
Happy to provide more context. Thank you!