Help wanted: beam_ssa_recv failed assertion #8733

Closed
drathier opened this issue Aug 17, 2024 · 6 comments · Fixed by #8824
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM

Comments

@drathier

drathier commented Aug 17, 2024

Describe the bug
I'd like help debugging this; all pointers are welcome! I think the crash is triggered by Elixir tooling, but the stack trace and the crash itself are in the Erlang compiler. I'm using Elixir's iex to run the code, but the file that triggers this is written in Erlang.

I've copied the otp supervisor.erl module to make some trial modifications. The file lives in ./erlang/supervisor.erl.

I've managed to sometimes fail an assertion in beam_ssa_recv. It seems to be this line:

true = beam_ssa:no_side_effect(I), %Assertion.

To Reproduce
I don't yet know how to reproduce it, or where it should be fixed.

  • Building the erl file with erlc works fine
  • The first compilation of the file using iex works fine
  • Recompiling the file using iex triggers this error:
Compiling 1 file (.erl)
erlang/supervisor2.erl: internal error in pass beam_ssa_recv:
exception error: no match of right hand side value false
  in function  beam_ssa_recv:pu_is_ref_used_is/2 (beam_ssa_recv.erl, line 632)
  in call from beam_ssa_recv:pu_is_ref_used_in_1/3 (beam_ssa_recv.erl, line 596)
  in call from beam_ssa_recv:pu_ref_used_in/2 (beam_ssa_recv.erl, line 580)
  in call from beam_ssa_recv:pu_is_ref_used/4 (beam_ssa_recv.erl, line 556)
  in call from lists:search_1/2 (lists.erl, line 1776)
  in call from beam_ssa_recv:plan_uses_1/4 (beam_ssa_recv.erl, line 520)
  in call from beam_ssa_recv:'-plan_uses/3-anonymous-0-'/5 (beam_ssa_recv.erl, line 511)
  in call from maps:fold_1/4 (maps.erl, line 416)
  in call from beam_ssa_recv:plan/1 (beam_ssa_recv.erl, line 430)
  in call from beam_ssa_recv:module/2 (beam_ssa_recv.erl, line 136)
  in call from compile:'-select_passes/2-anonymous-0-'/3 (compile.erl, line 683)
  in call from compile:fold_comp/4 (compile.erl, line 410)
  in call from compile:internal_comp/5 (compile.erl, line 394)
  in call from compile:'-internal_fun/2-anonymous-0-'/2 (compile.erl, line 227)
  in call from compile:'-do_compile/2-anonymous-0-'/1 (compile.erl, line 217)

which seems to be this line:

true = beam_ssa:no_side_effect(I), %Assertion.

  • I'm not including the file itself, because I cannot reproduce the issue reliably yet. I'm not fully convinced that the problem is in this file at all. Compiling the file by itself seems to work without problems, and I can't upload the entire project. All pointers are welcome!

Expected behavior
I don't expect the assertion to fail.

Affected versions
Erlang/OTP 26 [erts-14.2.5] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit] [dtrace], seemingly 26.2.5?


@drathier drathier added the bug Issue is reported as a bug label Aug 17, 2024
@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Aug 19, 2024
@jhogberg
Contributor

jhogberg commented Aug 19, 2024

Thanks for your report! This is definitely a bug. One thing you can do is add a try/catch in beam_ssa_recv:module/2 that dumps the #b_module{} record, that'll give us the offending file as seen by the pass.
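
A minimal sketch of that suggestion, assuming a locally modified copy of beam_ssa_recv.erl. The helper module dump_on_crash, its function run/3, and the dump file name are hypothetical names invented for illustration; none of them exist in OTP. The helper calls a fun on some input and, if the call raises, writes the input to a file before re-raising the original exception:

```erlang
-module(dump_on_crash).
-export([run/3]).

%% Hypothetical helper, not part of OTP: call Fun(Input) and, if it raises,
%% write Input to DumpFile as a printed Erlang term before re-raising.
run(Fun, Input, DumpFile) when is_function(Fun, 1) ->
    try
        Fun(Input)
    catch
        Class:Reason:Stacktrace ->
            %% ~tp prints the whole term without truncation. The trailing
            %% "." lets file:consult/1 read the dump back, as long as the
            %% term contains no funs, pids, or references.
            Text = io_lib:format("~tp.~n", [Input]),
            ok = file:write_file(DumpFile, unicode:characters_to_binary(Text)),
            %% Re-raise so the caller still sees the original error.
            erlang:raise(Class, Reason, Stacktrace)
    end.
```

In the modified compiler module, the existing body of module/2 could be moved to a function such as module_1/2 (again a made-up name), with module/2 reduced to dump_on_crash:run(fun(M) -> module_1(M, Opts) end, Module, "b_module.dump"). The helper would have to be on the code path of the node that runs the compiler.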

@drathier
Author

Status update. The crash happens when the pass checks pu_is_ref_used for the Mon in {'DOWN', Mon, process, Pid, Reason0} -> on line 5 of this snippet (marked with "here"):

shutdown(#child{pid=Pid, shutdown=Time} = Child) ->
    Mon = monitor(process, Pid),
    exit(Pid, shutdown),
    receive
        {'DOWN', Mon, process, Pid, Reason0} ->                      %%% <--- here
            case unlink_flush(Pid, Reason0) of
                shutdown ->
                    ok;
                {shutdown, _} when not (?is_permanent(Child)) ->
                    ok;
                normal when not (?is_permanent(Child)) ->
                    ok;
                Reason ->
                    {error, Reason}
            end
    after Time ->
        exit(Pid, kill),
        receive
            {'DOWN', Mon, process, Pid, Reason0} ->
                case unlink_flush(Pid, Reason0) of
                    shutdown ->
                        ok;
                    {shutdown, _} when not (?is_permanent(Child)) ->
                        ok;
                    normal when not (?is_permanent(Child)) ->
                        ok;
                    Reason ->
                        {error, Reason}
                end
        end
    end.

This code is copied verbatim from https://github.com/erlang/otp/blob/master/lib/stdlib/src/supervisor.erl#L1495-L1525 . The probability of nobody finding this before me in such a core module seems incredibly low, so I bet something else is up.

Repro steps

Reproduction steps, on my machine only, where asdf and qwer are placeholder names:

/opt/homebrew/Cellar/erlang/26.2.5/lib/erlang/lib/compiler-8.4.3/ebin$ /opt/homebrew/Cellar/erlang/26.2.1/lib/erlang/bin/erlc ../src/beam_ssa_recv.erl; erlc ../src/beam_ssa_recv.erl

~/asdf/code/qwer$ PURERLEX_VERBOSE=1 iex --dbg pry -S mix phx.server

rm -r /Users/drathier/asdf/code/qwer/_build/dev/lib/asdf

`recompile` in iex

hopefully trigger bug, otherwise redo all previous steps

Small repro

Here's a smaller version that's enough to trigger the bug:

-module(supervisor2).

asdf(Pid) ->
    Mon = monitor(process, Pid),
    receive
        {'DOWN', Mon, process, Pid, _} ->
            ok
    end.
  • {'DOWN', Mon, process, Pid, _} -> does not compile, reproduces the above error
  • {'DOWN', Mon2, process, Pid, _} -> compiles ok
  • {'DOWN', Mon, process, Pid2, _} -> compiles ok
  • {'DOWN', Mon2, process, Pid2, _} -> compiles ok

Questions and Next steps

  • Q1: Does this compiler pass skip some modules by name, to avoid known issues in very rarely written code? I can reproduce this bug with this exact version https://raw.githubusercontent.com/erlang/otp/OTP-26.2.5.3/lib/stdlib/src/supervisor.erl with just the -module(supervisor2). line changed. This is the only explanation I can think of right now for why I would be the first one to trigger this bug.
  • Q2: Is this worth investigating further? Is there anything you'd like me to dig into further? I don't know what the next steps debugging this would be.

@jhogberg
Contributor

Thanks, that is very odd. Am I right in assuming that this runs on an ARM Mac? Can you reproduce this if you build OTP without the JIT? (./otp_build --disable-jit)

@jhogberg
Contributor

Also, if you could post the raw #b_module{} somewhere that'd be great too. I've been unable to reproduce this on the machines we have, and would like to know whether the #b_module{} looks like it should at this point, whether the problem is in the true = beam_ssa:no_side_effect(I) assertion itself, whether beam_ssa:no_side_effect/1 is broken, etc.

@drathier
Author

drathier commented Sep 18, 2024

  • I'm using an M2 ARM Mac, yes.
  • I just realised that ERL_COMPILER_OPTIONS="[no_bool_opt]" was set in an .env file, and that it seems to be required to trigger the small repro version of this bug.

supervisor2.erl had the contents of the small repro posted above, and I got these outputs:
https://gist.github.com/drathier/cc2770e83416d3e435683264b574c7ca
EDIT: I modified the code to grab the stack traces too, but didn't re-run the tests after adding them. Lmk if you want the stack traces too :)

I hope this is enough for you to reproduce this bug, or to dismiss it as wontfix. I don't know if you officially support disabling compilation steps. I know I've hit compiler bug(s) before from skipping compiler optimization passes, possibly this same one, don't remember. Thanks for rubber ducking this with me, I'm now unblocked again :)

  • This feels like enough of a finding to possibly reproduce it on your machine, so I'll wait and see how it does before I dig any further.
  • I haven't tested the large file
  • I haven't tried without jit
  • I had disabled some compiler passes to speed up compilation sometime earlier (actually ERL_COMPILER_OPTIONS="[no_bool_opt,no_stack_trimming]") and left it in an .env file and forgotten about it :/
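
The no_bool_opt finding can also be checked from an Erlang shell by passing the option straight to compile:file/2. This is a minimal sketch: the module name repro_check and the function try_compile/1 are made up for illustration, and it assumes that passing no_bool_opt as an explicit option has the same effect as supplying it through ERL_COMPILER_OPTIONS. It should be run with ERL_COMPILER_OPTIONS unset, since otherwise both compilations pick up the environment options.

```erlang
-module(repro_check).
-export([try_compile/1]).

%% Hypothetical helper: compile File twice, once with default options and
%% once with the bool optimization pass disabled, and return both results.
try_compile(File) ->
    %% 'binary' keeps the result in memory instead of writing a .beam file;
    %% 'return' makes compile:file/2 return errors and warnings as data.
    Base = [binary, return],
    Default = compile:file(File, Base),
    NoBoolOpt = compile:file(File, [no_bool_opt | Base]),
    {Default, NoBoolOpt}.
```

Calling repro_check:try_compile("supervisor2.erl") on an affected OTP version would be expected to return an ok tuple for the first element and an error tuple describing the beam_ssa_recv crash for the second, though that outcome is not verified here.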

@jhogberg
Contributor

Thanks! I can reproduce it just fine now, it's a benign bug in code generation: it crashes on a succeeded:body check that should've been generated as a succeeded:guard. It should have no ill effects other than suppressing some optimizations or crashing on assertions like this.
