I32.clz_unsafe is really unsafe #2895

fweimer · 2018-10-11T12:20:03Z

Consider this example program:

actor Main
  new create(env: Env) =>
    let array = Array[String]
    array.append(env.args)
    while array.size() < 32 do
      array.push("")
    end
    (let index, let used) = try
         env.args(1)?.read_int[U32]()?
       else
         env.err.print("error: invalid index")
         return
       end
    env.out.print("info: index: " + index.string())
    env.out.print("info: used: " + used.string())
    try
      if array.size() != 32 then
        env.err.print("error: array has wrong size")
        return
      end
      env.out.print(array(magic(index).usize())?)
      env.out.print("info: magic: " + magic(index).string())
      env.out.print("info: no error thrown")
    else
      env.out.print("info: error thrown")
    end      
  fun magic(x: U32): U32 =>
    let y = x.clz_unsafe()
    if y < 32 then
      y
    else
      0
    end

(This can undoubtedly be reduced further, but it will be really brittle.)

When I run it, I get this:

./unsafe 0
Segmentation fault

I believe the reason is this: LLVM treats undefined as really undefined. The bounds check y < 32 is optimized away, and on some CPUs (mine is a Core m7-6Y75), when compiling natively, clz_unsafe is implemented as lzcnt, which yields 32 for a zero input value. The subsequent code gets confused by this because according to LLVM semantics, the value 32 is impossible here. However, this is not an LLVM bug—it does what the Pony compiler instructs it to do.

My compiler:

0.24.4-fff531cf [release]
compiled with: llvm 3.9.1 -- cc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Defaults: pic=true ssl=openssl_1.1.0

The reproducer is probably brittle, so unless you have the same environment, it will not work.

The same issue applies to all unsafe arithmetic operations, with different reproducers. (clz_unsafe was literally the first one I tried.) I think it would be reasonable to simply remove all the unsafe arithmetic methods because the performance benefits will be extremely narrow.

The text was updated successfully, but these errors were encountered:

fweimer · 2018-10-11T12:43:37Z

On a different CPU (Core i7-980X), again with a native build, it's more obvious what is going on. The disassembly looks like this:

   0x0000555555558925 <+773>:   callq  0x55555555c100 <pony_sendv>
   0x000055555555892a <+778>:   cmp    $0x20,%r13
   0x000055555555892e <+782>:   jne    0x555555558aa0 <Main_Dispatch+1152>
   0x0000555555558934 <+788>:   bsr    0xc(%rsp),%ebp
=> 0x0000555555558939 <+793>:   xor    $0x1f,%ebp
   0x000055555555893c <+796>:   mov    0x28(%rsp),%rax
   0x0000555555558941 <+801>:   mov    (%rax,%rbp,8),%r14

This CPU does not update the destination register when bsr is executed with an input operand which is zero. After setting a breakpoint on the indicated instruction, I get:

(gdb) print/x $ebp
$1 = 0xf6ce97a0
(gdb) print/x  *(unsigned *)($rsp+0xc)
$2 = 0x0
(gdb) print $rip
$3 = (void (*)()) 0x555555558939 <Main_Dispatch+793>

So the generated code immediately performs an invalid pointer read.

jemc · 2018-10-11T14:23:39Z

@Praetonus, do you have any thoughts on this?

fweimer · 2018-10-24T06:38:26Z

What's the status here? I'm a bit surprised by the lack of reaction, considering that this is a memory safety issue.

mfelsche · 2018-10-24T07:23:38Z

Unfortunately the last two sync calls had to be cancelled. We usually discuss, update and act on issues like this during sync. Also, during the last months most core contributors have been very busy with other stuff (like day-work) unfortunately, so not only this issue did not receive the treatment it deserves.

I myself am not very familiar with those parts of llvm or the assembly that is generated from it. The reason for having those unsafe operations is indeed a performance advantage over the safe variants afaik. If your reproducer would have shown only some undefined behaviour with some surprising result value, that would have been fine by me, but a segfault is a whole lot of a different story to me. I am not against removing them, but this is a decision that should be made ideally by the full core team.

Praetonus · 2018-10-31T17:26:32Z

Ideally, this shouldn't result in a segmentation fault. I say "ideally" because LLVM currently doesn't provide the framework that we'd need for ideal semantics. See this RFC for the semantics that I'd like to see in the language (note that the user-exposed side of the RFC (i.e. \unsafe-x\) will likely not be part of the final RFC, but the semantics of unsafe as exposed there should be final), and this LLVM email for an example of what could change in LLVM to allow us to provide those semantics. There has been no consensus in the LLVM community on that proposal, and according to more recent information another proposal is in the works.

Now while the current situation isn't ideal, I don't think it is bad. Unsafe operations are designed to be used when you can prove by some other means that your operation is safe. For example, when you have some pre-processing that is guaranteed to produce values in a given range. In your case, you should use the safe clz.

I think it would be reasonable to simply remove all the unsafe arithmetic methods because the performance benefits will be extremely narrow.

I don't think it would be a good solution. If your unsafe operation is in a cold path I agree that the gain will be extremely minor. But if it is in the middle of an extremely hot path (for example, a small loop that gets executed several thousands time per seconds) then you might see some improvements. The workflow when dealing with unsafe operation should be to use safe operations by default, and to only consider using an unsafe operation when your benchmarks show that it would result in a significant performance gain and when you can prove by some other means that the operation is safe.

jemc · 2018-10-31T22:07:18Z

I was waiting for @Praetonus to chime in to see if he had ideas about fixing this.

If this can't be fixed, I would advocate for removing the affected operations from the standard library (and also removing the syntax sugar operators), since memory safety is an important guarantee for Pony. As far as I'm concerned, the only time we're allowed to drop the memory safety guarantee is when the user invokes FFI. Any standard library code that invokes FFI internally should be wrapped in such a way that we can guarantee that no user invocation of those abstractions should trigger memory unsafety problems - the rest of the standard library is set up very carefully to keep this held true.

I don't doubt that these unsafe operations have value for performance optimizations in the hot path, but you can still make those optimizations by invoking the LLVM intrinsics via FFI yourself. For a more convenient approach, my recommendation would be to make a third-party library package available that provides convenient wrappers for the memory-unsafe FFI invocations and gives obvious warnings about what you're expected to enforce in your code.

If we can keep some of the value-unsafe operators that don't suffer memory safety / crash issues, I'm fine with keeping those around in the standard library - I just want to cast out anything that has this issue.

Praetonus · 2018-11-02T16:56:51Z

An implementation as a third-party package won't be possible. The code generation of most unsafe operations requires compiler support since it uses LLVM instructions, not intrinsic functions. If we really want to remove them from the standard library, then we'd need to come up with a solution to allow specific non-builtin packages to use compiler_intrinsic.

I think a good compromise would be to remove the operator sugar and to stress out in the documentation that if you're writing unsafe in your code, you should really know what you're doing.

jemc · 2018-11-02T22:25:05Z

Hmm... I suggested it because I saw clz_unsafe uses an LLVM intrinsic, so I assumed the rest did as well.

Could we maybe do simulated LLVM intrinsics for the other LLVM instructions, in such a way that they are parsed and treated like FFI calls, but we special-case those specific strings during codegen to use the instruction instead of an intrinsic?

Something like:

@"unsafe.add.i8"[I8](a, b)

Interestingly, using faux-FFI in this way would also give us a convention to use to let folks who "know what they're doing" do no-overhead pointer math:

@"unsafe.ptradd"[Pointer[U8]](ptr, offset)

All while still adhering to the existing law that Pony is memory-safe until you start using the FFI syntax in your code. And we can use the existing package-whitelisting features to control which packages are allowed to do these unsafe things.

I might file an RFC about this.

jemc · 2018-11-02T22:32:39Z

It's worth noting that I toyed with these concepts a bit before (using real FFI instead of faux-FFI) - I created an unsafe package to demonstrate (ab)using FFI to arbitrarily change a ref cap:
https://github.com/jemc/pony-unsafe/blob/b940a5475536b814ecb905327d49bfd57982a1ac/lib/pony-unsafe.c#L12-L14
https://github.com/jemc/pony-unsafe/blob/b940a5475536b814ecb905327d49bfd57982a1ac/unsafe/unsafe.pony#L15-L16

The bummer was that this kind of unsafe escape hatch could never be zero-cost in terms of performance without being built into the compiler.

Doing the faux-FFI unsafe.* faux-intrinsics is an interesting solution that opens up those possibilities without breaking the fundamental memory-safety guarantees of the vanilla language.

fweimer · 2018-11-11T19:19:29Z

For some other unsafe LLVM opcodes, it may be possible to get deterministic but unspecified results by emulating what the target does at the LLVM level. For example, if the target reduces 32-bit shifts modulo 32, you can generate code to do that, and LLVM will optimize the masking operation away if this is performed implicitly by the hardware. But for CLZ, that option doesn't exist, as far as I can tell.

There are probably some unsafe operations where there is a performance impact from safe behavior, given the current state of LLVM (conversions from floating point to integers come to my mind). But for CLZ, I find that hard to believe because the precondition is easy to check, and in many cases, the compiler can determine the result anyway. On x86, performance of the instruction does not seem to be great anyway and a dynamic check (essentially a constant load plus CMOV) would mostly contribute to icache footprint.

jemc added the needs discussion during sync label Oct 11, 2018

jemc removed the needs discussion during sync label Nov 6, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

I32.clz_unsafe is really unsafe #2895

I32.clz_unsafe is really unsafe #2895

fweimer commented Oct 11, 2018 •

edited

Loading

fweimer commented Oct 11, 2018

jemc commented Oct 11, 2018

fweimer commented Oct 24, 2018

mfelsche commented Oct 24, 2018

Praetonus commented Oct 31, 2018

jemc commented Oct 31, 2018 •

edited

Loading

Praetonus commented Nov 2, 2018

jemc commented Nov 2, 2018

jemc commented Nov 2, 2018 •

edited

Loading

fweimer commented Nov 11, 2018

I32.clz_unsafe is *really* unsafe #2895

I32.clz_unsafe is *really* unsafe #2895

Comments

fweimer commented Oct 11, 2018 • edited Loading

fweimer commented Oct 11, 2018

jemc commented Oct 11, 2018

fweimer commented Oct 24, 2018

mfelsche commented Oct 24, 2018

Praetonus commented Oct 31, 2018

jemc commented Oct 31, 2018 • edited Loading

Praetonus commented Nov 2, 2018

jemc commented Nov 2, 2018

jemc commented Nov 2, 2018 • edited Loading

fweimer commented Nov 11, 2018

I32.clz_unsafe is really unsafe #2895

I32.clz_unsafe is really unsafe #2895

fweimer commented Oct 11, 2018 •

edited

Loading

jemc commented Oct 31, 2018 •

edited

Loading

jemc commented Nov 2, 2018 •

edited

Loading