-
Notifications
You must be signed in to change notification settings - Fork 11.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WebAssembly] Don't fold non-nuw add/sub in FastISel #111278
Conversation
We should not fold one of add/sub operands into a load/store's offset when `nuw` (no unsigned wrap) is not present, because the address calculation, which adds the offset with the operand, does not wrap. This is handled correctly in the normal ISel: https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp#L328-L332 but not in FastISel. This positivity check in FastISel is not sufficient to avoid this case fully: https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp#L348-L352 because 1. Even if RHS is within signed int range, depending on the value of the LHS, the resulting value can exceed uint32 max. 2. When one of the operands is a label, `Address` can contain a `GlobalValue` and a `Reg` at the same time, so the `GlobalValue` becomes incorrectly an offset: https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp#L53-L69 https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp#L409-L417 Both cases are in the newly added test. This PR bails out of FastISel when there is an add/sub without a `nuw`. We should handle `SUB` too because `SUB` is the same as `ADD` when RHS's sign changes. I checked why our current normal ISel only handles `ADD`, and the reason it's OK for the normal ISel to handle only `ADD` seems that DAGCombiner replaces `SUB` with `ADD` here: https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L3904-L3907 Fixes llvm#111018.
@llvm/pr-subscribers-backend-webassembly Author: Heejin Ahn (aheejin) ChangesWe should not fold one of add/sub operands into a load/store's offset when This is handled correctly in the normal ISel: llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp Lines 328 to 332 in 6de5305
This positivity check in FastISel is not sufficient to avoid this case fully: llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp Lines 348 to 352 in 6de5305
because
Both cases are in the newly added test. This PR bails out of FastISel when there is an add/sub without a We should handle Fixes #111018. Full diff: https://github.com/llvm/llvm-project/pull/111278.diff 3 Files Affected:
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
index 317c6463985dcb..b004192eed6dcc 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
@@ -337,6 +337,12 @@ bool WebAssemblyFastISel::computeAddress(const Value *Obj, Address &Addr) {
break;
}
case Instruction::Add: {
+ // We should not fold operands into an offset when 'nuw' (no unsigned wrap)
+ // is not present, because the address calculation does not wrap.
+ if (auto *OFBinOp = dyn_cast<OverflowingBinaryOperator>(U))
+ if (!OFBinOp->hasNoUnsignedWrap())
+ return false;
+
// Adds of constants are common and easy enough.
const Value *LHS = U->getOperand(0);
const Value *RHS = U->getOperand(1);
@@ -360,6 +366,12 @@ bool WebAssemblyFastISel::computeAddress(const Value *Obj, Address &Addr) {
break;
}
case Instruction::Sub: {
+ // We should not fold operands into an offset when 'nuw' (no unsigned wrap)
+ // is not present, because the address calculation does not wrap.
+ if (auto *OFBinOp = dyn_cast<OverflowingBinaryOperator>(U))
+ if (!OFBinOp->hasNoUnsignedWrap())
+ return false;
+
// Subs of constants are common and easy enough.
const Value *LHS = U->getOperand(0);
const Value *RHS = U->getOperand(1);
diff --git a/llvm/test/CodeGen/WebAssembly/fast-isel-bailout.ll b/llvm/test/CodeGen/WebAssembly/fast-isel-bailout.ll
new file mode 100644
index 00000000000000..80c5c31718e744
--- /dev/null
+++ b/llvm/test/CodeGen/WebAssembly/fast-isel-bailout.ll
@@ -0,0 +1,106 @@
+; RUN: llc < %s -asm-verbose=false -fast-isel -verify-machineinstrs | FileCheck %s
+
+target triple = "wasm32-unknown-unknown"
+
+; FastISel should not fold one of the add/sub operands into a load/store's
+; offset when 'nuw' (no unsigned wrap) is not present, because the address
+; calculation does not wrap. When there is an add/sub and nuw is not present, we
+; bail out of FastISel.
+
+@mylabel = external global ptr
+
+; CHECK-LABEL: dont_fold_non_nuw_add_load:
+; CHECK: local.get 0
+; CHECK-NEXT: i32.const 2147483644
+; CHECK-NEXT: i32.add
+; CHECK-NEXT: i32.load 0
+define i32 @dont_fold_non_nuw_add_load(ptr %p) {
+ %q = ptrtoint ptr %p to i32
+ %r = add i32 %q, 2147483644
+ %s = inttoptr i32 %r to ptr
+ %t = load i32, ptr %s
+ ret i32 %t
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_add_store:
+; CHECK: local.get 0
+; CHECK-NEXT: i32.const 2147483644
+; CHECK-NEXT: i32.add
+; CHECK-NEXT: i32.const 5
+; CHECK-NEXT: i32.store 0
+define void @dont_fold_non_nuw_add_store(ptr %p) {
+ %q = ptrtoint ptr %p to i32
+ %r = add i32 %q, 2147483644
+ %s = inttoptr i32 %r to ptr
+ store i32 5, ptr %s
+ ret void
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_add_load_2:
+; CHECK: i32.const mylabel
+; CHECK-NEXT: i32.const -4
+; CHECK-NEXT: i32.add
+; CHECK-NEXT: i32.load 0
+define i32 @dont_fold_non_nuw_add_load_2() {
+ %t = load i32, ptr inttoptr (i32 add (i32 ptrtoint (ptr @mylabel to i32), i32 -4) to ptr), align 4
+ ret i32 %t
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_add_store_2:
+; CHECK: i32.const mylabel
+; CHECK-NEXT: i32.const -4
+; CHECK-NEXT: i32.add
+; CHECK-NEXT: i32.const 5
+; CHECK-NEXT: i32.store 0
+define void @dont_fold_non_nuw_add_store_2() {
+ store i32 5, ptr inttoptr (i32 add (i32 ptrtoint (ptr @mylabel to i32), i32 -4) to ptr), align 4
+ ret void
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_sub_load:
+; CHECK: local.get 0
+; CHECK-NEXT: i32.const 2147483644
+; CHECK-NEXT: i32.add
+; CHECK-NEXT: i32.load 0
+define i32 @dont_fold_non_nuw_sub_load(ptr %p) {
+ %q = ptrtoint ptr %p to i32
+ %r = sub i32 %q, -2147483644
+ %s = inttoptr i32 %r to ptr
+ %t = load i32, ptr %s
+ ret i32 %t
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_sub_store:
+; CHECK: local.get 0
+; CHECK-NEXT: i32.const 2147483644
+; CHECK-NEXT: i32.add
+; CHECK-NEXT: i32.const 5
+; CHECK-NEXT: i32.store 0
+define void @dont_fold_non_nuw_sub_store(ptr %p) {
+ %q = ptrtoint ptr %p to i32
+ %r = sub i32 %q, -2147483644
+ %s = inttoptr i32 %r to ptr
+ store i32 5, ptr %s
+ ret void
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_sub_load_2:
+; CHECK: i32.const mylabel
+; CHECK-NEXT: i32.const -4
+; CHECK-NEXT: i32.add
+; CHECK-NEXT: i32.load 0
+define i32 @dont_fold_non_nuw_sub_load_2() {
+ %t = load i32, ptr inttoptr (i32 sub (i32 ptrtoint (ptr @mylabel to i32), i32 4) to ptr), align 4
+ ret i32 %t
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_sub_store_2:
+; CHECK: i32.const mylabel
+; CHECK-NEXT: i32.const -4
+; CHECK-NEXT: i32.add
+; CHECK-NEXT: i32.const 5
+; CHECK-NEXT: i32.store 0
+define void @dont_fold_non_nuw_sub_store_2() {
+ store i32 5, ptr inttoptr (i32 sub (i32 ptrtoint (ptr @mylabel to i32), i32 4) to ptr), align 4
+ ret void
+}
diff --git a/llvm/test/CodeGen/WebAssembly/fast-isel-pr47040.ll b/llvm/test/CodeGen/WebAssembly/fast-isel-pr47040.ll
index 6a1304cb9a93a8..75cb5b66b3ebee 100644
--- a/llvm/test/CodeGen/WebAssembly/fast-isel-pr47040.ll
+++ b/llvm/test/CodeGen/WebAssembly/fast-isel-pr47040.ll
@@ -14,7 +14,7 @@ target triple = "wasm32-unknown-unknown"
define i32 @foo() {
%stack_addr = alloca i32
%stack_i = ptrtoint ptr %stack_addr to i32
- %added = add i32 %stack_i, undef
+ %added = add nuw i32 %stack_i, undef
%added_addr = inttoptr i32 %added to ptr
%ret = load i32, ptr %added_addr
ret i32 %ret
|
@@ -14,7 +14,7 @@ target triple = "wasm32-unknown-unknown" | |||
define i32 @foo() { | |||
%stack_addr = alloca i32 | |||
%stack_i = ptrtoint ptr %stack_addr to i32 | |||
%added = add i32 %stack_i, undef | |||
%added = add nuw i32 %stack_i, undef |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is supposed to be folded. So I added nuw
; CHECK-LABEL: dont_fold_non_nuw_sub_load_2: | ||
; CHECK: i32.const mylabel | ||
; CHECK-NEXT: i32.const -4 | ||
; CHECK-NEXT: i32.add | ||
; CHECK-NEXT: i32.load 0 | ||
define i32 @dont_fold_non_nuw_sub_load_2() { | ||
%t = load i32, ptr inttoptr (i32 sub (i32 ptrtoint (ptr @mylabel to i32), i32 4) to ptr), align 4 | ||
ret i32 %t | ||
} | ||
|
||
; CHECK-LABEL: dont_fold_non_nuw_sub_store_2: | ||
; CHECK: i32.const mylabel | ||
; CHECK-NEXT: i32.const -4 | ||
; CHECK-NEXT: i32.add | ||
; CHECK-NEXT: i32.const 5 | ||
; CHECK-NEXT: i32.store 0 | ||
define void @dont_fold_non_nuw_sub_store_2() { | ||
store i32 5, ptr inttoptr (i32 sub (i32 ptrtoint (ptr @mylabel to i32), i32 4) to ptr), align 4 | ||
ret void | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually these two are correctly compiled even in the current compiler, because in ISD::ADD
, even if RHS is negative, we try to compute each of LHS and RHS separately to see if they can be combined:
llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
Lines 355 to 358 in 6de5305
Address Backup = Addr; | |
if (computeAddress(LHS, Addr) && computeAddress(RHS, Addr)) | |
return true; | |
Addr = Backup; |
But ISD::SUB
handling doesn't have this part. (Because offset calculation is addition)
But I added these tests for symmetry and to ensure if these invariants are well maintained in future.
// We should not fold operands into an offset when 'nuw' (no unsigned wrap) | ||
// is not present, because the address calculation does not wrap. | ||
if (auto *OFBinOp = dyn_cast<OverflowingBinaryOperator>(U)) | ||
if (!OFBinOp->hasNoUnsignedWrap()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any sense of how often this happens, i.e. how often these address-calculation adds lack nuw
? It would be a bummer if we bailed out of fast-isel for the majority of address-calculations (since bailing out forces the whole block to be re-selected, it could potentially regress compile time noticeably).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would assume this will be frontend-dependent. For C/C++ input, it should be rare because address arithmetic is all inbounds
GEPs. For the input we have, it will be very common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this pattern of code assert(false)
in FastISel and ran emscripten core0
tests with it, and the only case this pattern was generated is in asan tests.
This part in the normal ISel is hit a lot more because many other instructions, including getelementptr
s, are turned into ISD:ADD
in ISel. But this is in the normal ISel and there's no "bailout". (What I told you offline was actually hitting this cases in the normal ISel. I added assert(false)
in both here and the newly added FastISel checks. So this one is not very relevant anyway.)
llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp
Lines 328 to 332 in 6568827
// WebAssembly constant offsets are performed as unsigned with infinite | |
// precision, so we need to check for NoUnsignedWrap so that we don't fold an | |
// offset for an add that needs wrapping. | |
if (N.getOpcode() == ISD::ADD && !N.getNode()->getFlags().hasNoUnsignedWrap()) | |
return false; |
So I guess, as @SingleAccretion pointed out, we don't need to worry too much about -O0 compilation time at least in C/C++. But this may regress -O0 compilation time in C# (or other languages).
Currently our FastISel's selectLoad
and selectStore
only handles the cases where its address can be represented within a single Address
class:
llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
Lines 53 to 110 in db4874c
class Address { | |
public: | |
using BaseKind = enum { RegBase, FrameIndexBase }; | |
private: | |
BaseKind Kind = RegBase; | |
union { | |
unsigned Reg; | |
int FI; | |
} Base; | |
// Whether the base has been determined yet | |
bool IsBaseSet = false; | |
int64_t Offset = 0; | |
const GlobalValue *GV = nullptr; | |
public: | |
// Innocuous defaults for our address. | |
Address() { Base.Reg = 0; } | |
void setKind(BaseKind K) { | |
assert(!isSet() && "Can't change kind with non-zero base"); | |
Kind = K; | |
} | |
BaseKind getKind() const { return Kind; } | |
bool isRegBase() const { return Kind == RegBase; } | |
bool isFIBase() const { return Kind == FrameIndexBase; } | |
void setReg(unsigned Reg) { | |
assert(isRegBase() && "Invalid base register access!"); | |
assert(!IsBaseSet && "Base cannot be reset"); | |
Base.Reg = Reg; | |
IsBaseSet = true; | |
} | |
unsigned getReg() const { | |
assert(isRegBase() && "Invalid base register access!"); | |
return Base.Reg; | |
} | |
void setFI(unsigned FI) { | |
assert(isFIBase() && "Invalid base frame index access!"); | |
assert(!IsBaseSet && "Base cannot be reset"); | |
Base.FI = FI; | |
IsBaseSet = true; | |
} | |
unsigned getFI() const { | |
assert(isFIBase() && "Invalid base frame index access!"); | |
return Base.FI; | |
} | |
void setOffset(int64_t NewOffset) { | |
assert(NewOffset >= 0 && "Offsets must be non-negative"); | |
Offset = NewOffset; | |
} | |
int64_t getOffset() const { return Offset; } | |
void setGlobalValue(const GlobalValue *G) { GV = G; } | |
const GlobalValue *getGlobalValue() const { return GV; } | |
bool isSet() const { return IsBaseSet; } | |
}; |
And the addressing mode is assumed to be either reg-based or frameindex-based. So we can handle cases like reg + label but not two different regs, which we need to handle to fix this bug within FastISel.
While it is not impossible to extend FastISel to handle the two-register cases, I guess that means we can handle a lot more cases that we are currently rejecting, which means even bigger overhaul. It's possible that we only additionally handle the cases in this bug, but we don't really know how frequent these cases are compared to other cases we are already rejecting. Also given that this is a "Fast" ISel after all, I'm not sure whether a bigger extension of the pass is worthwhile.
How about landing this for now, because this is a correctness issue, and see if this is really regressing -O0 compilation times a lot? For that we'd appreciate feedback from C# devs (@SingleAccretion and @yowl).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and see if this is really regressing -O0 compilation times a lot
I can get some PIN (binary instrumentation) data on this change with our input - it should not take a lot of time (mainly building clang itself).
However, before I do that, I have a more basic question: why return false
and not break
? I don't think there is a need to support complex addressing modes with two registers, we can presumably just use the non-nuw
add
as the one and only base register (with a zero offset), which, if I am reading the code correctly, is what break
ing will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! I was misunderstanding the "one reg" rule in FastISel. If FastISel cannot figure out how to make use of the offset, it assigns the whole expression to a single register, which then generates more instructions to make it happen (by adding more registers for its child expressions):
Register Reg = getRegForValue(Obj); |
Anyway, changed them to break
s, and renamed the test file and added -fast-isel-abort=1
because we don't bail out of FastISel anymore. Thanks!
Thanks for picking this up, btw! |
The CI failures don't seem relevant. Merging. |
We should not fold one of add/sub operands into a load/store's offset when
nuw
(no unsigned wrap) is not present, because the address calculation, which adds the offset with the operand, does not wrap.This is handled correctly in the normal ISel:
llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp
Lines 328 to 332 in 6de5305
This positivity check in FastISel is not sufficient to avoid this case fully:
llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
Lines 348 to 352 in 6de5305
because
Address
can contain aGlobalValue
and aReg
at the same time, so theGlobalValue
becomes incorrectly an offset:llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
Lines 53 to 69 in 6de5305
llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
Lines 409 to 417 in 6de5305
Both cases are in the newly added test.
We should handle
SUB
too becauseSUB
is the same asADD
when RHS's sign changes. I checked why our current normal ISel only handlesADD
, and the reason it's OK for the normal ISel to handle onlyADD
seems that DAGCombiner replacesSUB
withADD
here: https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L3904-L3907Fixes #111018.