Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal compiler error in ssa_opt_type_continue #7509

Open
frej opened this issue Jul 24, 2023 · 9 comments
Open

Internal compiler error in ssa_opt_type_continue #7509

frej opened this issue Jul 24, 2023 · 9 comments
Assignees
Labels
bug Issue is reported as a bug stalled waiting for input by the Erlang/OTP team team:VM Assigned to OTP team VM

Comments

@frej
Copy link
Contributor

frej commented Jul 24, 2023

Describe the bug

Erlc can sometimes crash with a back-trace like this:

Sub pass ssa_opt_type_continue
Function: '-f3/1-lc$^0/1-3-'/1
test362604.erl: internal error in pass beam_ssa_opt:
exception error: no function clause matching beam_types:join([]) 
  in function  beam_ssa_type:join_arg_types/3 (beam_ssa_type.erl, line 449)
  in call from beam_ssa_type:opt_continue/4 (beam_ssa_type.erl, line 432)
  in call from beam_ssa_opt:ssa_opt_type_continue/1 (beam_ssa_opt.erl, line 449)
  in call from compile:run_sub_passes_1/3 (compile.erl, line 424)
  in call from beam_ssa_opt:phase/4 (beam_ssa_opt.erl, line 116)
  in call from beam_ssa_opt:fixpoint/6 (beam_ssa_opt.erl, line 99)
  in call from beam_ssa_opt:run_phases/3 (beam_ssa_opt.erl, line 85)

To Reproduce
Unless the patch below is applied, the crash appears to be dependent on the phase of the moon :) With the patch applied, the reproducer (found with Erlfuzz, thanks @RobinMorisset ) triggers the error reliably.

diff --git a/lib/compiler/src/beam_ssa_type.erl b/lib/compiler/src/beam_ssa_type.erl
index d492ed4b50..9d8b965bd3 100644
--- a/lib/compiler/src/beam_ssa_type.erl
+++ b/lib/compiler/src/beam_ssa_type.erl
@@ -379,7 +379,7 @@ init_sig_st(StMap, FuncDb) ->
              wl=wl_defer_list(Roots, wl_new()) }.
 
 init_sig_roots(FuncDb) ->
-    [Id || Id := #func_info{exported=true} <- FuncDb].
+    [Id || Id := #func_info{exported=true} <- maps:iterator(FuncDb, ordered)].
 
 init_sig_args([Root | Roots], StMap, Acc) ->
     #opt_st{args=Args0} = map_get(Root, StMap),

Affected versions
At least maint (fe0571e)

Additional context

As far as I can tell, beam_ssa_type:signatures_1/2 can produce a different #sig_st{} and FuncDb depending on the order of the initial worklist in the #sig_st{}. This, in turn, leads to number of list-comprehension functions being eliminated from optimization when the compilation succeeds, but kept around when the crash occurs.

@frej frej added the bug Issue is reported as a bug label Jul 24, 2023
@RobinMorisset
Copy link
Contributor

Thanks for the mention!
This looks like it might be a duplicate of #7478 ?

@jhogberg jhogberg self-assigned this Jul 24, 2023
@jhogberg jhogberg added the team:VM Assigned to OTP team VM label Jul 24, 2023
@frej
Copy link
Contributor Author

frej commented Jul 25, 2023

This looks like it might be a duplicate of #7478 ?

Looks very likely, I wonder what I did wrong when I searched for beam_types:join among the issues yesterday and didn't find this one?

Permuting the order of functions returned by init_sig_roots(FuncDb) does not fix the problem in #7478 , probably because we have only one exported function which calls the list comprehension.

@jhogberg
Copy link
Contributor

Crashing on beam_ssa_type:join_arg_types/3 in this manner means that previously unreachable code has suddenly become reachable, which is not supposed to happen (other way around is fine). The cause is nearly always some silly small thing that accidentally widens a type.

In the absence of bugs like these, the result will be equivalent regardless of work order, so I prefer to leave it undefined to help shake more bugs out. I’ll look deeper into it today. :)

@jhogberg
Copy link
Contributor

It looks like this has the same root cause as #7342 / #6599 :(

@jhogberg jhogberg added the stalled waiting for input by the Erlang/OTP team label Jul 25, 2023
frej added a commit to frej/otp that referenced this issue Jul 26, 2023
The beam_ssa_type:opt_continue/4 pass requires that type information
is available for all arguments or else it will crash with a back trace
similar to this:

Sub pass ssa_opt_type_continue
Function: '-f3/1-lc$^0/1-3-'/1
test362604.erl: internal error in pass beam_ssa_opt:
exception error: no function clause matching beam_types:join([])
  in function  beam_ssa_type:join_arg_types/3 (beam_ssa_type.erl, line 449)
  in call from beam_ssa_type:opt_continue/4 (beam_ssa_type.erl, line 432)
  in call from beam_ssa_opt:ssa_opt_type_continue/1 (beam_ssa_opt.erl, line 449)
  in call from compile:run_sub_passes_1/3 (compile.erl, line 424)
  in call from beam_ssa_opt:phase/4 (beam_ssa_opt.erl, line 116)
  in call from beam_ssa_opt:fixpoint/6 (beam_ssa_opt.erl, line 99)
  in call from beam_ssa_opt:run_phases/3 (beam_ssa_opt.erl, line 85)

If type analysis during beam_ssa_type:opt_start/2 concludes that a
function is never called, type information for the callee will never
be recorded, thus leaving no type information for the arguments in the
function's entry in the function database, leading to the crash above.

Although naively unreachable functions are pruned when the list of
functions is first traversed in beam_ssa_type:opt_start/2, it doesn't
handle they case when type inference concludes that a function is
never called. This patch extends beam_ssa_type:opt_start/2 to, when
all functions have been processed, prune non-exported functions for
which #func_info.arg_types is just a list of empty dictionaries.  That
a function of zero arity is always considered reachable, is harmless
as it won't crash opt_continue/4.

Closes erlang#7509
Closes erlang#7478
frej added a commit to frej/otp that referenced this issue Jul 26, 2023
The beam_ssa_type:opt_continue/4 pass requires that type information
is available for all arguments or else it will crash with a back trace
similar to this:

Sub pass ssa_opt_type_continue
Function: '-f3/1-lc$^0/1-3-'/1
test362604.erl: internal error in pass beam_ssa_opt:
exception error: no function clause matching beam_types:join([])
  in function  beam_ssa_type:join_arg_types/3 (beam_ssa_type.erl, line 449)
  in call from beam_ssa_type:opt_continue/4 (beam_ssa_type.erl, line 432)
  in call from beam_ssa_opt:ssa_opt_type_continue/1 (beam_ssa_opt.erl, line 449)
  in call from compile:run_sub_passes_1/3 (compile.erl, line 424)
  in call from beam_ssa_opt:phase/4 (beam_ssa_opt.erl, line 116)
  in call from beam_ssa_opt:fixpoint/6 (beam_ssa_opt.erl, line 99)
  in call from beam_ssa_opt:run_phases/3 (beam_ssa_opt.erl, line 85)

If type analysis during beam_ssa_type:opt_start/2 concludes that a
function is never called, type information for the callee will never
be recorded, thus leaving no type information for the arguments in the
function's entry in the function database, leading to the crash above.

Although naively unreachable functions are pruned when the list of
functions is first traversed in beam_ssa_type:opt_start/2, it doesn't
handle they case when type inference concludes that a function is
never called. This patch extends beam_ssa_type:opt_start/2 to, when
all functions have been processed, prune non-exported functions for
which #func_info.arg_types is just a list of empty dictionaries.  That
a function of zero arity is always considered reachable, is harmless
as it won't crash opt_continue/4.

Closes erlang#7509
Closes erlang#7478
@frej
Copy link
Contributor Author

frej commented Jul 26, 2023

I couldn't keep away from scratching this itch, #7512 fixes this crash and the crashes in #7478.

I took a quick look at the reproducers in #7342 and #6599 but for those, #7512 does nothing. So I don't think I agree that this is the same root case as the latter issues :)

frej added a commit to frej/otp that referenced this issue Jul 26, 2023
The beam_ssa_type:opt_continue/4 pass requires that type information
is available for all arguments or else it will crash with a back trace
similar to this:

Sub pass ssa_opt_type_continue
Function: '-f3/1-lc$^0/1-3-'/1
test362604.erl: internal error in pass beam_ssa_opt:
exception error: no function clause matching beam_types:join([])
  in function  beam_ssa_type:join_arg_types/3 (beam_ssa_type.erl, line 449)
  in call from beam_ssa_type:opt_continue/4 (beam_ssa_type.erl, line 432)
  in call from beam_ssa_opt:ssa_opt_type_continue/1 (beam_ssa_opt.erl, line 449)
  in call from compile:run_sub_passes_1/3 (compile.erl, line 424)
  in call from beam_ssa_opt:phase/4 (beam_ssa_opt.erl, line 116)
  in call from beam_ssa_opt:fixpoint/6 (beam_ssa_opt.erl, line 99)
  in call from beam_ssa_opt:run_phases/3 (beam_ssa_opt.erl, line 85)

If type analysis during beam_ssa_type:opt_start/2 concludes that a
function is never called, type information for the callee will never
be recorded, thus leaving no type information for the arguments in the
function's entry in the function database, leading to the crash above.

Although naively unreachable functions are pruned when the list of
functions is first traversed in beam_ssa_type:opt_start/2, it doesn't
handle they case when type inference concludes that a function is
never called. This patch extends beam_ssa_type:opt_start/2 to, when
all functions have been processed, prune non-exported functions for
which #func_info.arg_types is just a list of empty dictionaries.  That
a function of zero arity is always considered reachable, is harmless
as it won't crash opt_continue/4.

Closes erlang#7509
Closes erlang#7478
@jhogberg
Copy link
Contributor

jhogberg commented Jul 26, 2023

Like I said before, this kind of crash happens when another bug suddenly makes previously unreachable code reachable. That bug is #7342/#6599 in this case (failure to propagate type information between tests on different variables relating in some way to the same value).

#7512 fixes the crash by sweeping the bug under the rug, which I don't want to do in the general case because the code being unreachable at first could have been an error. It happens to be safe in this particular case, but we can't say that it will always be so.

@frej
Copy link
Contributor Author

frej commented Jul 26, 2023

#7512 fixes the crash by sweeping the bug under the rug, which I don't want to do in the general case because the code being unreachable at first could have been an error. It happens to be safe in this particular case, but we can't say that it will always be so.

My take is the exact opposite. The crashes in the reproducers of #7478 and #7512 are because an unreachable function is, on entry to beam_ssa_type:opt_start/2, considered reachable, is still considered reachable by beam_ssa_type:opt_continue/4 but lacks the meta data a reachable function should have in #func_info.arg_types in the function info database (as that is only updated when a call is encountered).

My analysis is that on entry to opt_start the function we'll crash in (call it f), is considered reachable . Running signatures/2 doesn't change that. opt_start_1 traverses all functions in the module and calls opt_function which will use the type knowledge we have to traverse the function's CFG. If we reach a local call, opt_local_call() will update the argument types for the callee in the function info database. In the reproducers, opt_local_call() is never called with f as a callee and hence arg_types for it will never be updated.

Unless we prune functions which become unreachable when we use type knowledge beam_ssa_type:opt_continue/4 will crash as soon as it calls beam_ssa_type:join_arg_types/3. From trace printouts I can see that at no point in time f has anything but a list of empty dictionaries in #func_info.arg_types, so this is clearly not a case of previously unreachable code becoming reachable.

The only bug I can see being swept under the rug with #7512, is if something is wrong with the type analysis so that it considers reachable code unreachable. But in that case, what stops us from crashing when it is right and a function which looks reachable turns out not to be, during beam_ssa_type:opt_start/2?

@jhogberg
Copy link
Contributor

You're looking at the wrong part of the code, the initial problem occurs much earlier during signatures/2 where a function that used to be reachable under a narrow argument type ("argument 2 only be 'false'") suddenly becomes unreachable with a wider argument type ("argument 2 can be 'false' | 'ok'") in the middle of the analysis. Note that at this stage the invariant is inverted: previously reachable code cannot be made unreachable.

This confuses the heck out of opt_start/2 which treats it as reachable during pruning and unreachable during its analysis (which adds the argument types used by opt_continue/4).

Pruning the functions once more as in #7512 hides the bug in signatures/2.

@frej
Copy link
Contributor Author

frej commented Jul 27, 2023

I see, that makes sense. I'll withdraw #7512.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug stalled waiting for input by the Erlang/OTP team team:VM Assigned to OTP team VM
Projects
None yet
Development

No branches or pull requests

3 participants