-
Notifications
You must be signed in to change notification settings - Fork 11.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NVPTX] Only run LowerUnreachable when necessary #109868
base: main
Are you sure you want to change the base?
[NVPTX] Only run LowerUnreachable when necessary #109868
Conversation
@llvm/pr-subscribers-backend-nvptx Author: Justin Fargnoli (justinfargnoli) ChangesNVPTX: Lower unreachable to exit to allow ptxas to accurately reconstruct the CFG. added the LowerUnreachable pass to NVPTX. Based on the PR description: > Finally, although I expect this to fix most of this pass is only necessary when targeting Pascal or earlier via ptxas from CUDA 11.4 or earlier. This PR updates NVPTXTargetMachine.cpp to reflect that. CC @maleadt Full diff: https://github.com/llvm/llvm-project/pull/109868.diff 3 Files Affected:
diff --git a/llvm/lib/Target/NVPTX/NVPTXSubtarget.h b/llvm/lib/Target/NVPTX/NVPTXSubtarget.h
index 8b9059bd60cbd4..e2ce088cacdf53 100644
--- a/llvm/lib/Target/NVPTX/NVPTXSubtarget.h
+++ b/llvm/lib/Target/NVPTX/NVPTXSubtarget.h
@@ -95,6 +95,9 @@ class NVPTXSubtarget : public NVPTXGenSubtargetInfo {
bool hasDotInstructions() const {
return SmVersion >= 61 && PTXVersion >= 50;
}
+ bool hasPTXASUnreachableBug() const {
+ return SmVersion < 70 && PTXVersion <= 74;
+ }
bool hasCvtaParam() const { return SmVersion >= 70 && PTXVersion >= 77; }
unsigned int getFullSmVersion() const { return FullSmVersion; }
unsigned int getSmVersion() const { return getFullSmVersion() / 10; }
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
index 57b7fa783c14a7..b79b4ff93efe49 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
@@ -368,9 +368,13 @@ void NVPTXPassConfig::addIRPasses() {
addPass(createSROAPass());
}
- const auto &Options = getNVPTXTargetMachine().Options;
- addPass(createNVPTXLowerUnreachablePass(Options.TrapUnreachable,
- Options.NoTrapAfterNoreturn));
+ if (ST.hasPTXASUnreachableBug()) {
+ // Run LowerUnreachable to WAR a ptxas bug. See the commit description of
+ // 1ee4d880e8760256c606fe55b7af85a4f70d006d for more details.
+ const auto &Options = getNVPTXTargetMachine().Options;
+ addPass(createNVPTXLowerUnreachablePass(Options.TrapUnreachable,
+ Options.NoTrapAfterNoreturn));
+ }
}
bool NVPTXPassConfig::addInstSelector() {
diff --git a/llvm/test/CodeGen/NVPTX/unreachable.ll b/llvm/test/CodeGen/NVPTX/unreachable.ll
index 011497c4e23401..4219f0f3b47fc9 100644
--- a/llvm/test/CodeGen/NVPTX/unreachable.ll
+++ b/llvm/test/CodeGen/NVPTX/unreachable.ll
@@ -1,11 +1,15 @@
; RUN: llc < %s -march=nvptx -mcpu=sm_20 -verify-machineinstrs \
-; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-NOTRAP
+; RUN: | FileCheck %s --check-prefixes=CHECK,CHECK-NOTRAP
; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs \
-; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-NOTRAP
+; RUN: | FileCheck %s --check-prefixes=CHECK,CHECK-NOTRAP
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs -mattr=+ptx75 \
+; RUN: | FileCheck %s --check-prefixes=CHECK-BUG-FIXED
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_70 -verify-machineinstrs \
+; RUN: | FileCheck %s --check-prefixes=CHECK-BUG-FIXED
; RUN: llc < %s -march=nvptx -mcpu=sm_20 -verify-machineinstrs -trap-unreachable \
-; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-TRAP
+; RUN: | FileCheck %s --check-prefixes=CHECK,CHECK-TRAP
; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs -trap-unreachable \
-; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-TRAP
+; RUN: | FileCheck %s --check-prefixes=CHECK,CHECK-TRAP
; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -march=nvptx -mcpu=sm_20 -verify-machineinstrs | %ptxas-verify %}
; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | %ptxas-verify %}
@@ -21,12 +25,11 @@ define void @kernel_func() {
; CHECK-TRAP: trap;
; CHECK-NOTRAP-NOT: trap;
; CHECK: exit;
+; CHECK-BUG-FIXED-NOT: exit;
unreachable
}
attributes #0 = { noreturn }
-
!nvvm.annotations = !{!1}
-
!1 = !{ptr @kernel_func, !"kernel", i32 1}
|
const auto &Options = getNVPTXTargetMachine().Options; | ||
addPass(createNVPTXLowerUnreachablePass(Options.TrapUnreachable, | ||
Options.NoTrapAfterNoreturn)); | ||
if (ST.hasPTXASUnreachableBug()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Topic for discussion: should this be a CLI option?
I added the check for PTX Version <= 7.4 because it's the best proxy for querying whether ptxas
comes from CUDA 11.4 or earlier I could find. (The highest version CUDA 11.4 ptxas
supports is 7.4.)
Alternatively, we could put the onus on the user to set a flag indicating that they're using a copy of ptxas
from CUDA 11.4 or prior.
CC @Artem-B for comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For PTX > 8.3 it's not needed.
For PTX versions older than that, we should assume that it may be given to the ptxas with a bug. The problem is that we make the decision on how to generate code at the build time, but we may not know which ptxas will be used until the run time, in general case. We should conservatively assume that we do need to produce trap; exit;
in that case.
If we eventually find a legitimate use case for such an option, we can add it then. For now, I do not see much point.
I'm confused. The comment you quoted in the description only says
The way I read it, it says that 11.4 and older ptxas with older GPUs may have more bugs that are not fixed by this patch, but I do not think it implies that the newer versions of ptxas or the newer GPUs do not need this pass. What am I missing? |
Yeah, that was what I meant. Although we only know about issues on Pascal and earlier because
We could avoid lowering |
If we know which ptxas version no longer needs
With that in mind, this condition should be changed appropriately. Assuming that it's the latest public CUDA release that has the bug fixed, we should only mark PTX ISA 8.5 or newer as safe. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes |
That would be a change in the instruction selection backend though, not disabling this pass altogether, since that would mean there's still a need for Alternatively, one could just do what WebAssembly and PS4+PS5 do and force these TargetOptions to be enabled. That would get rid of this pass completely. The question of "should TargetOptions be load-bearing" is a good one, but I feel like that bridge has been crossed already and there are many, many other places in which these things are inaccessible or user-provided command line options are just outright ignored. |
Yes, this is my bad. I should have quoted places where @maleadt said things along the lines of:
Thank you for the bug number! I'm following up internally to confirm what release the fix was included in.
I thought this pass added the
This pass doesn't touch the trap instruction, so it should be okay to disable it, right? |
Sorry, I probably added some confusion with the way I worded things. What I was saying is that, if the reason this pass exists is to avoid bogus CFG edges (which makes sense, especially if there are downstream optimizations happening) then it either:
What I meant by "change in instruction selection" was avoiding emitting The introduction of the pass happened to be due to a miscompile because of bogus CFG edges, but even if that's been fixed it's still probably a useful hint to any compilers taking this output as input like @maleadt said. |
To me it looks like there are two things we need to decide here:
The downside of this approach is that it is an extra instruction and, I believe, there were some reported use cases where it resulted in performance regressions in user code. Ideally, it would be nice if ptxas would provide a NOP-like equivalent of
|
6bc6174
to
b88976b
Compare
I followed up internally and confirmed that
I've confirmed that this was fixed in CUDA 12.3, corresponding to PTX version 8.3.
Assuming that we do want to emit the unreachable hint (I don't think anyone has advocated against this), this PR should be ready for another round of review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a test nit.
const auto &Options = getNVPTXTargetMachine().Options; | ||
addPass(createNVPTXLowerUnreachablePass(Options.TrapUnreachable, | ||
Options.NoTrapAfterNoreturn)); | ||
if (ST.hasPTXASUnreachableBug()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For PTX > 8.3 it's not needed.
For PTX versions older than that, we should assume that it may be given to the ptxas with a bug. The problem is that we make the decision on how to generate code at the build time, but we may not know which ptxas will be used until the run time, in general case. We should conservatively assume that we do need to produce trap; exit;
in that case.
If we eventually find a legitimate use case for such an option, we can add it then. For now, I do not see much point.
NVPTX: Lower unreachable to exit to allow ptxas to accurately reconstruct the CFG. added the LowerUnreachable pass to NVPTX. Based on the PR description:
this pass is only necessary when targeting Pascal or earlier via ptxas from CUDA 11.4 or earlier. This PR updates NVPTXTargetMachine.cpp to reflect that.
CC @maleadt