Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WebAssembly] Don't fold non-nuw add/sub in FastISel #111278

Merged
merged 4 commits into from
Oct 9, 2024

Conversation

aheejin
Copy link
Member

@aheejin aheejin commented Oct 6, 2024

We should not fold one of add/sub operands into a load/store's offset when nuw (no unsigned wrap) is not present, because the address calculation, which adds the offset with the operand, does not wrap.

This is handled correctly in the normal ISel:

// WebAssembly constant offsets are performed as unsigned with infinite
// precision, so we need to check for NoUnsignedWrap so that we don't fold an
// offset for an add that needs wrapping.
if (N.getOpcode() == ISD::ADD && !N.getNode()->getFlags().hasNoUnsignedWrap())
return false;
but not in FastISel.

This positivity check in FastISel is not sufficient to avoid this case fully:

uint64_t TmpOffset = Addr.getOffset() + CI->getSExtValue();
if (int64_t(TmpOffset) >= 0) {
Addr.setOffset(TmpOffset);
return computeAddress(LHS, Addr);
}

because

  1. Even if RHS is within signed int range, depending on the value of the LHS, the resulting value can exceed uint32 max.
  2. When one of the operands is a label, Address can contain a GlobalValue and a Reg at the same time, so the GlobalValue becomes incorrectly an offset:
    class Address {
    public:
    using BaseKind = enum { RegBase, FrameIndexBase };
    private:
    BaseKind Kind = RegBase;
    union {
    unsigned Reg;
    int FI;
    } Base;
    // Whether the base has been determined yet
    bool IsBaseSet = false;
    int64_t Offset = 0;
    const GlobalValue *GV = nullptr;
    if (const GlobalValue *GV = Addr.getGlobalValue())
    MIB.addGlobalAddress(GV, Addr.getOffset());
    else
    MIB.addImm(Addr.getOffset());
    if (Addr.isRegBase())
    MIB.addReg(Addr.getReg());
    else
    MIB.addFrameIndex(Addr.getFI());

Both cases are in the newly added test.

We should handle SUB too because SUB is the same as ADD when RHS's sign changes. I checked why our current normal ISel only handles ADD, and the reason it's OK for the normal ISel to handle only ADD seems that DAGCombiner replaces SUB with ADD here: https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L3904-L3907

Fixes #111018.

We should not fold one of add/sub operands into a load/store's offset
when `nuw` (no unsigned wrap) is not present, because the address
calculation, which adds the offset with the operand, does not wrap.

This is handled correctly in the normal ISel:
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp#L328-L332
but not in FastISel.

This positivity check in FastISel is not sufficient to avoid this case
fully:
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp#L348-L352

because
1. Even if RHS is within signed int range, depending on the value of the
   LHS, the resulting value can exceed uint32 max.
2. When one of the operands is a label, `Address` can contain a
   `GlobalValue` and a `Reg` at the same time, so the `GlobalValue`
   becomes incorrectly an offset:
  https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp#L53-L69
  https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp#L409-L417

Both cases are in the newly added test.

This PR bails out of FastISel when there is an add/sub without a `nuw`.

We should handle `SUB` too because `SUB` is the same  as `ADD` when
RHS's sign changes. I checked why our current normal ISel only handles
`ADD`, and the reason it's OK for the normal ISel to handle only `ADD`
seems that DAGCombiner replaces `SUB` with `ADD` here:
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L3904-L3907

Fixes llvm#111018.
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 6, 2024

@llvm/pr-subscribers-backend-webassembly

Author: Heejin Ahn (aheejin)

Changes

We should not fold one of add/sub operands into a load/store's offset when nuw (no unsigned wrap) is not present, because the address calculation, which adds the offset with the operand, does not wrap.

This is handled correctly in the normal ISel:

// WebAssembly constant offsets are performed as unsigned with infinite
// precision, so we need to check for NoUnsignedWrap so that we don't fold an
// offset for an add that needs wrapping.
if (N.getOpcode() == ISD::ADD && !N.getNode()->getFlags().hasNoUnsignedWrap())
return false;
but not in FastISel.

This positivity check in FastISel is not sufficient to avoid this case fully:

uint64_t TmpOffset = Addr.getOffset() + CI->getSExtValue();
if (int64_t(TmpOffset) >= 0) {
Addr.setOffset(TmpOffset);
return computeAddress(LHS, Addr);
}

because

  1. Even if RHS is within signed int range, depending on the value of the LHS, the resulting value can exceed uint32 max.
  2. When one of the operands is a label, Address can contain a GlobalValue and a Reg at the same time, so the GlobalValue becomes incorrectly an offset:
    class Address {
    public:
    using BaseKind = enum { RegBase, FrameIndexBase };
    private:
    BaseKind Kind = RegBase;
    union {
    unsigned Reg;
    int FI;
    } Base;
    // Whether the base has been determined yet
    bool IsBaseSet = false;
    int64_t Offset = 0;
    const GlobalValue *GV = nullptr;
    if (const GlobalValue *GV = Addr.getGlobalValue())
    MIB.addGlobalAddress(GV, Addr.getOffset());
    else
    MIB.addImm(Addr.getOffset());
    if (Addr.isRegBase())
    MIB.addReg(Addr.getReg());
    else
    MIB.addFrameIndex(Addr.getFI());

Both cases are in the newly added test.

This PR bails out of FastISel when there is an add/sub without a nuw.

We should handle SUB too because SUB is the same as ADD when RHS's sign changes. I checked why our current normal ISel only handles ADD, and the reason it's OK for the normal ISel to handle only ADD seems that DAGCombiner replaces SUB with ADD here: https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L3904-L3907

Fixes #111018.


Full diff: https://github.com/llvm/llvm-project/pull/111278.diff

3 Files Affected:

  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp (+12)
  • (added) llvm/test/CodeGen/WebAssembly/fast-isel-bailout.ll (+106)
  • (modified) llvm/test/CodeGen/WebAssembly/fast-isel-pr47040.ll (+1-1)
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
index 317c6463985dcb..b004192eed6dcc 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp
@@ -337,6 +337,12 @@ bool WebAssemblyFastISel::computeAddress(const Value *Obj, Address &Addr) {
     break;
   }
   case Instruction::Add: {
+    // We should not fold operands into an offset when 'nuw' (no unsigned wrap)
+    // is not present, because the address calculation does not wrap.
+    if (auto *OFBinOp = dyn_cast<OverflowingBinaryOperator>(U))
+      if (!OFBinOp->hasNoUnsignedWrap())
+        return false;
+
     // Adds of constants are common and easy enough.
     const Value *LHS = U->getOperand(0);
     const Value *RHS = U->getOperand(1);
@@ -360,6 +366,12 @@ bool WebAssemblyFastISel::computeAddress(const Value *Obj, Address &Addr) {
     break;
   }
   case Instruction::Sub: {
+    // We should not fold operands into an offset when 'nuw' (no unsigned wrap)
+    // is not present, because the address calculation does not wrap.
+    if (auto *OFBinOp = dyn_cast<OverflowingBinaryOperator>(U))
+      if (!OFBinOp->hasNoUnsignedWrap())
+        return false;
+
     // Subs of constants are common and easy enough.
     const Value *LHS = U->getOperand(0);
     const Value *RHS = U->getOperand(1);
diff --git a/llvm/test/CodeGen/WebAssembly/fast-isel-bailout.ll b/llvm/test/CodeGen/WebAssembly/fast-isel-bailout.ll
new file mode 100644
index 00000000000000..80c5c31718e744
--- /dev/null
+++ b/llvm/test/CodeGen/WebAssembly/fast-isel-bailout.ll
@@ -0,0 +1,106 @@
+; RUN: llc < %s -asm-verbose=false -fast-isel -verify-machineinstrs | FileCheck %s
+
+target triple = "wasm32-unknown-unknown"
+
+; FastISel should not fold one of the add/sub operands into a load/store's
+; offset when 'nuw' (no unsigned wrap) is not present, because the address
+; calculation does not wrap. When there is an add/sub and nuw is not present, we
+; bail out of FastISel.
+
+@mylabel = external global ptr
+
+; CHECK-LABEL: dont_fold_non_nuw_add_load:
+; CHECK:       local.get  0
+; CHECK-NEXT:  i32.const  2147483644
+; CHECK-NEXT:  i32.add
+; CHECK-NEXT:  i32.load  0
+define i32 @dont_fold_non_nuw_add_load(ptr %p) {
+  %q = ptrtoint ptr %p to i32
+  %r = add i32 %q, 2147483644
+  %s = inttoptr i32 %r to ptr
+  %t = load i32, ptr %s
+  ret i32 %t
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_add_store:
+; CHECK:       local.get  0
+; CHECK-NEXT:  i32.const  2147483644
+; CHECK-NEXT:  i32.add
+; CHECK-NEXT:  i32.const  5
+; CHECK-NEXT:  i32.store  0
+define void @dont_fold_non_nuw_add_store(ptr %p) {
+  %q = ptrtoint ptr %p to i32
+  %r = add i32 %q, 2147483644
+  %s = inttoptr i32 %r to ptr
+  store i32 5, ptr %s
+  ret void
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_add_load_2:
+; CHECK:       i32.const  mylabel
+; CHECK-NEXT:  i32.const  -4
+; CHECK-NEXT:  i32.add
+; CHECK-NEXT:  i32.load  0
+define i32 @dont_fold_non_nuw_add_load_2() {
+  %t = load i32, ptr inttoptr (i32 add (i32 ptrtoint (ptr @mylabel to i32), i32 -4) to ptr), align 4
+  ret i32 %t
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_add_store_2:
+; CHECK:       i32.const  mylabel
+; CHECK-NEXT:  i32.const  -4
+; CHECK-NEXT:  i32.add
+; CHECK-NEXT:  i32.const  5
+; CHECK-NEXT:  i32.store  0
+define void @dont_fold_non_nuw_add_store_2() {
+  store i32 5, ptr inttoptr (i32 add (i32 ptrtoint (ptr @mylabel to i32), i32 -4) to ptr), align 4
+  ret void
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_sub_load:
+; CHECK:       local.get  0
+; CHECK-NEXT:  i32.const  2147483644
+; CHECK-NEXT:  i32.add
+; CHECK-NEXT:  i32.load  0
+define i32 @dont_fold_non_nuw_sub_load(ptr %p) {
+  %q = ptrtoint ptr %p to i32
+  %r = sub i32 %q, -2147483644
+  %s = inttoptr i32 %r to ptr
+  %t = load i32, ptr %s
+  ret i32 %t
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_sub_store:
+; CHECK:       local.get  0
+; CHECK-NEXT:  i32.const  2147483644
+; CHECK-NEXT:  i32.add
+; CHECK-NEXT:  i32.const  5
+; CHECK-NEXT:  i32.store  0
+define void @dont_fold_non_nuw_sub_store(ptr %p) {
+  %q = ptrtoint ptr %p to i32
+  %r = sub i32 %q, -2147483644
+  %s = inttoptr i32 %r to ptr
+  store i32 5, ptr %s
+  ret void
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_sub_load_2:
+; CHECK:       i32.const  mylabel
+; CHECK-NEXT:  i32.const  -4
+; CHECK-NEXT:  i32.add
+; CHECK-NEXT:  i32.load  0
+define i32 @dont_fold_non_nuw_sub_load_2() {
+  %t = load i32, ptr inttoptr (i32 sub (i32 ptrtoint (ptr @mylabel to i32), i32 4) to ptr), align 4
+ ret i32 %t
+}
+
+; CHECK-LABEL: dont_fold_non_nuw_sub_store_2:
+; CHECK:       i32.const  mylabel
+; CHECK-NEXT:  i32.const  -4
+; CHECK-NEXT:  i32.add
+; CHECK-NEXT:  i32.const  5
+; CHECK-NEXT:  i32.store  0
+define void @dont_fold_non_nuw_sub_store_2() {
+  store i32 5, ptr inttoptr (i32 sub (i32 ptrtoint (ptr @mylabel to i32), i32 4) to ptr), align 4
+  ret void
+}
diff --git a/llvm/test/CodeGen/WebAssembly/fast-isel-pr47040.ll b/llvm/test/CodeGen/WebAssembly/fast-isel-pr47040.ll
index 6a1304cb9a93a8..75cb5b66b3ebee 100644
--- a/llvm/test/CodeGen/WebAssembly/fast-isel-pr47040.ll
+++ b/llvm/test/CodeGen/WebAssembly/fast-isel-pr47040.ll
@@ -14,7 +14,7 @@ target triple = "wasm32-unknown-unknown"
 define i32 @foo() {
   %stack_addr = alloca i32
   %stack_i = ptrtoint ptr %stack_addr to i32
-  %added = add i32 %stack_i, undef
+  %added = add nuw i32 %stack_i, undef
   %added_addr = inttoptr i32 %added to ptr
   %ret = load i32, ptr %added_addr
   ret i32 %ret

@@ -14,7 +14,7 @@ target triple = "wasm32-unknown-unknown"
define i32 @foo() {
%stack_addr = alloca i32
%stack_i = ptrtoint ptr %stack_addr to i32
%added = add i32 %stack_i, undef
%added = add nuw i32 %stack_i, undef
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is supposed to be folded. So I added nuw

Comment on lines 87 to 106
; CHECK-LABEL: dont_fold_non_nuw_sub_load_2:
; CHECK: i32.const mylabel
; CHECK-NEXT: i32.const -4
; CHECK-NEXT: i32.add
; CHECK-NEXT: i32.load 0
define i32 @dont_fold_non_nuw_sub_load_2() {
%t = load i32, ptr inttoptr (i32 sub (i32 ptrtoint (ptr @mylabel to i32), i32 4) to ptr), align 4
ret i32 %t
}

; CHECK-LABEL: dont_fold_non_nuw_sub_store_2:
; CHECK: i32.const mylabel
; CHECK-NEXT: i32.const -4
; CHECK-NEXT: i32.add
; CHECK-NEXT: i32.const 5
; CHECK-NEXT: i32.store 0
define void @dont_fold_non_nuw_sub_store_2() {
store i32 5, ptr inttoptr (i32 sub (i32 ptrtoint (ptr @mylabel to i32), i32 4) to ptr), align 4
ret void
}
Copy link
Member Author

@aheejin aheejin Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually these two are correctly compiled even in the current compiler, because in ISD::ADD, even if RHS is negative, we try to compute each of LHS and RHS separately to see if they can be combined:

Address Backup = Addr;
if (computeAddress(LHS, Addr) && computeAddress(RHS, Addr))
return true;
Addr = Backup;

But ISD::SUB handling doesn't have this part. (Because offset calculation is addition)

But I added these tests for symmetry and to ensure if these invariants are well maintained in future.

// We should not fold operands into an offset when 'nuw' (no unsigned wrap)
// is not present, because the address calculation does not wrap.
if (auto *OFBinOp = dyn_cast<OverflowingBinaryOperator>(U))
if (!OFBinOp->hasNoUnsignedWrap())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any sense of how often this happens, i.e. how often these address-calculation adds lack nuw? It would be a bummer if we bailed out of fast-isel for the majority of address-calculations (since bailing out forces the whole block to be re-selected, it could potentially regress compile time noticeably).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume this will be frontend-dependent. For C/C++ input, it should be rare because address arithmetic is all inbounds GEPs. For the input we have, it will be very common.

Copy link
Member Author

@aheejin aheejin Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this pattern of code assert(false) in FastISel and ran emscripten core0 tests with it, and the only case this pattern was generated is in asan tests.

This part in the normal ISel is hit a lot more because many other instructions, including getelementptrs, are turned into ISD:ADD in ISel. But this is in the normal ISel and there's no "bailout". (What I told you offline was actually hitting this cases in the normal ISel. I added assert(false) in both here and the newly added FastISel checks. So this one is not very relevant anyway.)

// WebAssembly constant offsets are performed as unsigned with infinite
// precision, so we need to check for NoUnsignedWrap so that we don't fold an
// offset for an add that needs wrapping.
if (N.getOpcode() == ISD::ADD && !N.getNode()->getFlags().hasNoUnsignedWrap())
return false;

So I guess, as @SingleAccretion pointed out, we don't need to worry too much about -O0 compilation time at least in C/C++. But this may regress -O0 compilation time in C# (or other languages).

Currently our FastISel's selectLoad and selectStore only handles the cases where its address can be represented within a single Address class:

class Address {
public:
using BaseKind = enum { RegBase, FrameIndexBase };
private:
BaseKind Kind = RegBase;
union {
unsigned Reg;
int FI;
} Base;
// Whether the base has been determined yet
bool IsBaseSet = false;
int64_t Offset = 0;
const GlobalValue *GV = nullptr;
public:
// Innocuous defaults for our address.
Address() { Base.Reg = 0; }
void setKind(BaseKind K) {
assert(!isSet() && "Can't change kind with non-zero base");
Kind = K;
}
BaseKind getKind() const { return Kind; }
bool isRegBase() const { return Kind == RegBase; }
bool isFIBase() const { return Kind == FrameIndexBase; }
void setReg(unsigned Reg) {
assert(isRegBase() && "Invalid base register access!");
assert(!IsBaseSet && "Base cannot be reset");
Base.Reg = Reg;
IsBaseSet = true;
}
unsigned getReg() const {
assert(isRegBase() && "Invalid base register access!");
return Base.Reg;
}
void setFI(unsigned FI) {
assert(isFIBase() && "Invalid base frame index access!");
assert(!IsBaseSet && "Base cannot be reset");
Base.FI = FI;
IsBaseSet = true;
}
unsigned getFI() const {
assert(isFIBase() && "Invalid base frame index access!");
return Base.FI;
}
void setOffset(int64_t NewOffset) {
assert(NewOffset >= 0 && "Offsets must be non-negative");
Offset = NewOffset;
}
int64_t getOffset() const { return Offset; }
void setGlobalValue(const GlobalValue *G) { GV = G; }
const GlobalValue *getGlobalValue() const { return GV; }
bool isSet() const { return IsBaseSet; }
};

And the addressing mode is assumed to be either reg-based or frameindex-based. So we can handle cases like reg + label but not two different regs, which we need to handle to fix this bug within FastISel.

While it is not impossible to extend FastISel to handle the two-register cases, I guess that means we can handle a lot more cases that we are currently rejecting, which means even bigger overhaul. It's possible that we only additionally handle the cases in this bug, but we don't really know how frequent these cases are compared to other cases we are already rejecting. Also given that this is a "Fast" ISel after all, I'm not sure whether a bigger extension of the pass is worthwhile.

How about landing this for now, because this is a correctness issue, and see if this is really regressing -O0 compilation times a lot? For that we'd appreciate feedback from C# devs (@SingleAccretion and @yowl).

Copy link
Contributor

@SingleAccretion SingleAccretion Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and see if this is really regressing -O0 compilation times a lot

I can get some PIN (binary instrumentation) data on this change with our input - it should not take a lot of time (mainly building clang itself).

However, before I do that, I have a more basic question: why return false and not break? I don't think there is a need to support complex addressing modes with two registers, we can presumably just use the non-nuw add as the one and only base register (with a zero offset), which, if I am reading the code correctly, is what breaking will do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! I was misunderstanding the "one reg" rule in FastISel. If FastISel cannot figure out how to make use of the offset, it assigns the whole expression to a single register, which then generates more instructions to make it happen (by adding more registers for its child expressions):

Register Reg = getRegForValue(Obj);

Anyway, changed them to breaks, and renamed the test file and added -fast-isel-abort=1 because we don't bail out of FastISel anymore. Thanks!

@dschuff
Copy link
Member

dschuff commented Oct 7, 2024

Thanks for picking this up, btw!

@aheejin
Copy link
Member Author

aheejin commented Oct 9, 2024

The CI failures don't seem relevant. Merging.

@aheejin aheejin merged commit 115cb40 into llvm:main Oct 9, 2024
7 of 9 checks passed
@aheejin aheejin deleted the fast_isel_fix branch October 9, 2024 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[WebAssembly] LLVM IR add does not compile to wrapping Wasm
4 participants