stdlib: create an init function for records with complex default values #9373

Open
wants to merge 2 commits into
base: master

frazze-jobb
Contributor

Records whose field default values contain "free" variables were unsafe to use in functions that have variables with the same name. This commit creates an init function for such records to protect the variables in the default values.

e.g.
-record(r, {f = fun(X)->case X of {y, Y} -> Y; _ -> X end end, g=..., h=abc}).
foo(X)->#r{}. --> foo(X)->(r_init())#r{}.

r_init() only initializes fields that will not be updated, e.g.
foo(X)->#r{f=X}. --> foo(X)->(r_init_f())#r{f=X}.
r_init_f() only initializes g and h with their default values; f is initialized to undefined.

r_init() functions are not generated if the user initializes every field of the record whose default value contains "free variables".
e.g.
foo(X)->#r{f=X,g=X}. --> foo(X)->{r,X,X,abc}.
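
The scoping idea behind the init function can be sketched by hand. This is only an illustration, not the code the PR generates: the module name, the two-field record, and r_init/0 are hypothetical stand-ins, and the defaults are written inside the helper instead of in the record declaration.

```erlang
-module(rec_init_sketch).
-export([foo/1]).

%% Hypothetical record for illustration; the defaults live in r_init/0.
-record(r, {f, h}).

%% Hand-written stand-in for a generated init function. X and Y are
%% local to this function and to the fun inside it.
r_init() ->
    #r{f = fun(X) -> case X of {y, Y} -> Y; _ -> X end end,
       h = abc}.

%% foo/1 has its own Y. Because the default value is built in a separate
%% function, the Y inside the fun cannot be confused with this one.
foo(Y) ->
    R = r_init(),
    F = R#r.f,
    {Y, F({y, 1}), R#r.h}.
```

Calling rec_init_sketch:foo(2) returns {2, 1, abc}: the caller's Y stays 2 while the fun binds its own Y to 1.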

closes #9317

@frazze-jobb frazze-jobb self-assigned this Feb 3, 2025
Contributor

github-actions bot commented Feb 3, 2025

CT Test Results

    2 files     97 suites   1h 9m 4s ⏱️
2 190 tests 2 140 ✅ 47 💤 3 ❌
2 556 runs  2 504 ✅ 49 💤 3 ❌

For more details on these failures, see this check.

Results for commit c22b3fa.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@frazze-jobb frazze-jobb added the team:VM and fix labels Feb 3, 2025
@frazze-jobb frazze-jobb force-pushed the frazze/stdlib/erl_expand_records_create_init_function/OTP-19464 branch from d544c98 to e4b004f Compare February 3, 2025 11:31
St);

IsUndefined = [{RF, AnnoRF, Field, {atom, AnnoRF, 'undefined'}} || {record_field=RF, AnnoRF, Field, _} <- Is],
Fields = lists:flatten(lists:sort([atom_to_list(FieldAtom) || {record_field, _, {atom, _, FieldAtom}, _} <- Is])),

In general, prefer binaries over lists, and to sink expressions as far down as possible: the compiler has to maintain observability with regards to tracing, and cannot sink this expression down to the true branch below as that would result in a different trace.
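
A minimal sketch of what "sinking" an expression means, using a hypothetical key/1 helper and module name that are not part of the PR. Both functions return the same values; the difference is only where the call is written.

```erlang
-module(sink_sketch).
-export([eager/2, sunk/2]).

%% Hypothetical helper: builds a binary key from a list of field atoms.
key(Fields) ->
    iolist_to_binary([atom_to_binary(F, utf8) || F <- lists:sort(Fields)]).

%% The key is computed before the case, even when the false branch is taken.
eager(NeedsKey, Fields) ->
    Key = key(Fields),
    case NeedsKey of
        true -> {key, Key};
        false -> no_key
    end.

%% The call is written inside the only branch that uses it.
sunk(NeedsKey, Fields) ->
    case NeedsKey of
        true -> {key, key(Fields)};
        false -> no_key
    end.
```

In sunk/2 the helper is never called when NeedsKey is false, which is the effect the reviewer asks the author to write out by hand.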

@ilya-klyuchnikov
Contributor

Now, as "free vars" are allowed in record fields, these free vars can affect each other.

-module(z).

-export([mk1/0]).

-record(a, {
  a = X = id(1),
  b = X = id(2)
}).

id(X) -> X.

mk1() ->
  #a{}.
1> z:mk1().
** exception error: no match of right hand side value 2
     in function  z:'rec_init$^0'/0 (z.erl, line 13)

@frazze-jobb
Contributor Author

Now, as "free vars" are allowed in record fields, these free vars can affect each other.

Thanks, I'll get to it soon!

@frazze-jobb
Contributor Author

Now, as "free vars" are allowed in record fields, these free vars can affect each other.

Thanks, I'll get to it soon!

Actually, I spoke too soon. This is as intended. It matches what happens if you try to update a record like this: #r0{a=X=id(1),b=X=id(2)}. That would not work either.

But I will update the linter so that it warns for this.

@frazze-jobb frazze-jobb force-pushed the frazze/stdlib/erl_expand_records_create_init_function/OTP-19464 branch from 4ed5f0c to fc096c6 Compare February 4, 2025 13:11
@elbrujohalcon
Contributor

elbrujohalcon commented Feb 4, 2025

Any chance of not including the new function in the error description?

So, from the other comment…

1> z:mk1().
** exception error: no match of right hand side value 2
     in function  z:'rec_init$^0'/0 (z.erl, line 13)

I would've preferred to see something like…

1> z:mk1().
** exception error: no match of right hand side value 2
     in function  z:mk1/0 (z.erl, line 13)

@frazze-jobb
Contributor Author

Any chance of not including the new function in the error description?

Tail calls, as in this case, make the caller invisible in the stacktrace.
Instead I suggest:
in record default value (z.erl, line 13)

Would that work for you?

@michalmuskala
Contributor

Another option would be to explicitly prevent the compiler from tail calling into this helper function, and always emit a full call that builds a stack frame.

traverse_af(AF, Fun) ->
traverse_af(AF, Fun, []).
traverse_af(AF, Fun, Acc) when is_list(AF) ->
[ traverse_af(Ast, Fun, Fun(Ast,Acc)) || Ast <- AF];

Suggested change
[ traverse_af(Ast, Fun, Fun(Ast,Acc)) || Ast <- AF];
[traverse_af(Ast, Fun, Fun(Ast,Acc)) || Ast <- AF];

@ilya-klyuchnikov
Contributor

With the current implementation it's possible to crash erlc:

-module(z).

-export([mk1/0]).

-record(a, {
  a = X = 1,
  b = X
}).

mk1() ->
  #a{a = 2}.
erlc z.erl

Function: 'rec_init$^0'/0
Sub pass ssa_opt_type_start
z.erl: internal error in pass beam_ssa_opt:
exception error: no match of right hand side value #{}
  in function  beam_ssa_type:concrete_type/2 (beam_ssa_type.erl, line 2522)
  in call from beam_ssa_type:simplify_arg/3 (beam_ssa_type.erl, line 1890)
  in call from beam_ssa_type:'-simplify_args/3-lc$^0/1-0-'/3 (beam_ssa_type.erl, line 1883)
  in call from beam_ssa_type:'-simplify_args/3-lc$^0/1-0-'/3 (beam_ssa_type.erl, line 1883)
  in call from beam_ssa_type:simplify/5 (beam_ssa_type.erl, line 1101)
  in call from beam_ssa_type:sig_is/7 (beam_ssa_type.erl, line 309)
  in call from beam_ssa_type:sig_bs/8 (beam_ssa_type.erl, line 252)
  in call from beam_ssa_type:sig_function_1/4 (beam_ssa_type.erl, line 221)

The code after expansion is:

-file("z.erl", 1).

-module(z).

-export([mk1/0]).

-record(a,{a = X = 1, b = X}).

mk1() ->
    begin
        REC0 = 'rec_init$^0'(),
        case REC0 of
            {a, _, _} ->
                setelement(2, REC0, 2);
            _ ->
                error({badrecord, REC0})
        end
    end.



'rec_init$^0'() ->
    {a, undefined, X}.

@bjorng
Contributor

bjorng commented Feb 12, 2025

The test case erl_expand_records_SUITE:init/1 fails.

@frazze-jobb frazze-jobb force-pushed the frazze/stdlib/erl_expand_records_create_init_function/OTP-19464 branch from 5e9f9b8 to cf5cbab Compare February 13, 2025 15:49
@frazze-jobb
Contributor Author

I think my attempt at being clever about this failed, and a more naive approach is better.
I am reverting to that.

-module(z).

-export([mk1/0]).

-record(a, {
  a = X = 1,
  b = X,
  c = 3
}).

mk1() ->
  #a{a = 2}.

In the above case, you first apply the initializing values (a=2 in this case), giving #a{2,X,3}, and then check whether any variables are left in the expression.
If there are, you run the rec_init() function to apply the default values first, followed by a record update, i.e. (rec_init())#a{a=2}.

mk1() ->
  #a{a = 2, b = 1}.

In this case, after the initializing values are applied, you have #a{2,1,3}. There are no variables, so it's not necessary to run the rec_init() function.

If you have side effects (which you shouldn't) in your record's default values, think twice about whether that really is what you want, since they might run when you do not expect it.
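
The side-effect caveat can be illustrated with a hand-written sketch, assuming the rewrite to (rec_init())#a{a=2} described above. The module name, the defaults, and the message sent from rec_init/0 are hypothetical; the real init function would be generated by the compiler.

```erlang
-module(update_sketch).
-export([mk/0]).

-record(a, {a, b, c}).

%% Hand-written stand-in for a generated init function. The message
%% send marks the point where the default for field a is evaluated.
rec_init() ->
    self() ! default_a_evaluated,
    #a{a = 1, b = 2, c = 3}.

%% Corresponds to #a{a = 2} after the rewrite: all defaults are
%% evaluated first, then field a is overwritten by a record update.
mk() ->
    (rec_init())#a{a = 2}.
```

Here mk/0 returns {a, 2, 2, 3}, yet the default for field a was still evaluated, so its side effect lands in the caller's mailbox even though the value is discarded.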

Contributor

@bjorng bjorng left a comment

So far I've looked at erl_error and erl_expand_records.

My comments are very nit-picky. We care very much about a consistent code-style in the compiler (and compiler-adjacent modules in STDLIB).

Comment on lines 549 to 552
no -> case is_rec_init(F) of
true -> <<"in record">>;
_ -> <<"in function ">>
end

I recommend not using _ to match a single known value. The _ can hide bugs if is_rec_init/1 is changed in the future to return more values. Also, if true is misspelled as ture, that bug is hidden too. Furthermore, the '_' can result in worse types in both Dialyzer and the compiler. Sometimes those worse types can result in worse code because the compiler is unable to see that some optimization is safe.

In this particular case, the compiler will "see" that if the value is not true, it must be false. In other words, the compiler will do the optimization for you.

Suggested change
no -> case is_rec_init(F) of
true -> <<"in record">>;
_ -> <<"in function ">>
end
no ->
case is_rec_init(F) of
true -> <<"in record">>;
false -> <<"in function ">>
end
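
The point about `_` hiding bugs can be shown with a small sketch. The module and function names are hypothetical and unrelated to erl_error; the misspelled atom ture is taken from the comment above.

```erlang
-module(match_sketch).
-export([loose/1, strict/1]).

%% The catch-all clause accepts any value, including a misspelled atom.
loose(Flag) ->
    case Flag of
        true -> yes;
        _ -> no
    end.

%% Only the two expected values are accepted; anything else raises a
%% case_clause error at the point of the mistake.
strict(Flag) ->
    case Flag of
        true -> yes;
        false -> no
    end.
```

With these definitions, loose(ture) quietly returns no, while strict(ture) fails with {case_clause, ture}.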

Comment on lines 638 to 641
case is_rec_init(F) of
true -> <<"default value">>;
_ -> io_lib:fwrite(<<"~ts/~w">>, [mf_to_string({M, F}, A, Enc), A])
end.

Suggested change
case is_rec_init(F) of
true -> <<"default value">>;
_ -> io_lib:fwrite(<<"~ts/~w">>, [mf_to_string({M, F}, A, Enc), A])
end.
case is_rec_init(F) of
true ->
<<"default value">>;
false ->
io_lib:fwrite(<<"~ts/~w">>,
[mf_to_string({M, F}, A, Enc), A])
end.

@@ -95,6 +97,12 @@ forms([{function,Anno,N,A,Cs0} | Fs0], St0) ->
forms([F | Fs0], St0) ->
{Fs,St} = forms(Fs0, St0),
{[F | Fs], St};
forms([], #exprec{new_forms=FsN}=St) ->
{[{'function', Anno,
maps:get(Def,FsN),

Please follow the existing code style in this module.

Suggested change
maps:get(Def,FsN),
maps:get(Def, FsN),

origin(1, M, F, A) ->
case is_op({M, F}, n_args(A)) of
{yes, F} -> <<"in operator ">>;
no -> <<"in function ">>
no -> case is_rec_init(F) of
true -> <<"in record">>;

Is there any test of this added code?

Comment on lines 273 to 307
traverse_af(AF, Fun) ->
traverse_af(AF, Fun, []).
traverse_af(AF, Fun, Acc) when is_list(AF) ->
[traverse_af(Ast, Fun, Fun(Ast,Acc)) || Ast <- AF];
traverse_af(AF, Fun, Acc) when is_tuple(AF) ->
%% Iterate each tuple element, if the element is an AF, traverse it
[[(fun (List) when is_list(List) ->
traverse_af(List, Fun, Acc);
(Tuple) when is_tuple(Tuple)->
case erl_anno:is_anno(Tuple) of
true -> [];
false -> traverse_af(Tuple, Fun, Fun(Tuple,Acc))
end;
(_) -> []
end)(Term) || Term <- tuple_to_list(AF)],Acc];
traverse_af(_, _, Acc) -> Acc.
save_vars({var, _, Var}, _) -> Var;
save_vars(_, Acc) -> Acc.
free_variables(AF, Acc) ->
try
_=traverse_af(AF, fun free_variables1/2, Acc),
false
catch
throw:{error,unsafe_variable} -> true
end.
free_variables1({'fun',_anno,{clauses, _}}, Acc) ->
{function,Acc}; %% tag that we are in a 'fun' now that can define new variables
free_variables1({clause,_anno,Pattern,_guards,_body}, {function,Acc}) ->
lists:flatten(traverse_af(Pattern, fun save_vars/2, [])++Acc);
free_variables1({var, _, Var}, Acc) ->
case lists:member(Var, Acc) of
true -> Acc;
false -> throw({error, unsafe_variable})
end;
free_variables1(_, Acc) -> Acc.

Suggested change
variables({var,_,'_'}) ->
    [];
variables({var,_,V}) ->
    [V];
variables({'fun',_,Def}) ->
    %% The Def tuple has no annotation. Must handle it specially.
    case Def of
        {clauses,Cs} -> variables(Cs);
        {function,F,A} -> variables([F,A]);
        {function,M,F,A} -> variables([M,F,A])
    end;
variables(Tuple) when is_tuple(Tuple) ->
    [Tag,Anno|T] = tuple_to_list(Tuple),
    true = is_atom(Tag),                        %Assertion.
    true = erl_anno:is_anno(Anno),              %Assertion.
    variables(T);
variables(List) when is_list(List) ->
    foldl(fun(E, Vs0) ->
                  Vs1 = variables(E),
                  ordsets:union(Vs0, Vs1)
          end, [], List);
variables(_) ->
    [].

[traverse_af(Ast, Fun, Fun(Ast,Acc)) || Ast <- AF];
traverse_af(AF, Fun, Acc) when is_tuple(AF) ->
%% Iterate each tuple element, if the element is an AF, traverse it
[[(fun (List) when is_list(List) ->

A tip: there is no need to create and call a fun here. You can just use a case or an if here.
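
A small sketch of the tip, using hypothetical function names and a simplified classification instead of the real traversal: the first version creates a fun and calls it immediately, the second expresses the same dispatch with a plain case.

```erlang
-module(dispatch_sketch).
-export([classify_fun/1, classify_case/1]).

%% Creates a fun only to call it right away.
classify_fun(Term) ->
    (fun(L) when is_list(L) -> list;
        (T) when is_tuple(T) -> tuple;
        (_) -> other
     end)(Term).

%% Same clauses and guards, written as a case expression.
classify_case(Term) ->
    case Term of
        L when is_list(L) -> list;
        T when is_tuple(T) -> tuple;
        _ -> other
    end.
```

Both functions return the same result for every input; the case version simply avoids building a fun object.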

throw:{error,unsafe_variable} -> true
end.
free_variables1({'fun',_anno,{clauses, _}}, Acc) ->
{function,Acc}; %% tag that we are in a 'fun' now that can define new variables
Contributor

@bjorng bjorng Feb 14, 2025

I don't think this is correct. Only variables defined in the function head will be shadowed. Variables defined in the function body will be matched against variables having the same name in the enclosing function.

In my suggested simplified function, I don't try to handle funs; I think they are used too infrequently to be worth the effort.
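
The scoping rule described here can be checked with a short sketch (hypothetical module and function names). A variable in a fun head shadows the outer one, while a variable in the fun body is matched against the outer binding.

```erlang
-module(scope_sketch).
-export([head/0, body/1]).

%% X in the fun head is a fresh variable that shadows the outer X.
%% The compiler warns about the shadowing but accepts the code.
head() ->
    X = 1,
    F = fun(X) -> X * 10 end,
    {X, F(2)}.

%% X in the fun body is the outer X, so X = Y is a match against 1,
%% not a new binding.
body(V) ->
    X = 1,
    F = fun(Y) -> X = Y, matched end,
    F(V).
```

head() returns {1, 20}, body(1) returns matched, and body(2) fails with {badmatch, 2}.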

Comment on lines 382 to 406
%% Initialize the record
R_init = [{atom,Anno,Name} |
record_inits(record_fields(Name, Anno0, St), Is)],
Vars = lists:flatten(traverse_af(Is, fun save_vars/2)),
%% Check if there are variables in the initialized record
%% if there are, we need to initialize the record using a generated function
case free_variables(R_init, Vars) of
true ->
%% Initialize the record with only the default values
R_default_init = [{atom,Anno,Name} |
record_inits(record_fields(Name, Anno, St),[])],
{Def,St1} = expr({tuple,Anno,R_default_init},St),
Map=St1#exprec.new_forms,
{FName,St2} = case maps:get(Def, Map, undefined) of
undefined->
C=St1#exprec.rec_init_count,
NewName=list_to_atom("rec_init$^" ++ integer_to_list(C)),
{NewName, St1#exprec{rec_init_count=C+1, new_forms=Map#{Def=>NewName}}};
OldName -> {OldName,St1}
end,
%% replace the init record expression with a call expression
%% to the newly added function and a record update.
expr({record, Anno0, {call,Anno,{atom,Anno,FName},[]}, Name, Is},St2);
false ->
%% No free variables means that we can just
%% output the record as a tuple.
expr({tuple,Anno,R_init},St)
end;

Here follows my suggested change to use the variables/1 function suggested by me earlier. I've also broken some long lines, renamed variables to follow the conventions we have in compiler modules, and a few other minor improvements.

For the main change below to work, line 180 must be changed as follows:

expr({record,Anno0,Name,Is}, St0) ->

Here follows the main suggestion:

Suggested change
    RInit = [{atom,Anno,Name} |
             record_inits(record_fields(Name, Anno0, St0), Is)],
    Vars = variables(Is),
    %% Check if there are variables in the initialized record. If
    %% there are, we need to initialize the record using a generated
    %% function
    AnyVariables = not ordsets:is_subset(variables(RInit), Vars),
    case AnyVariables of
        true ->
            %% Initialize the record with only the default values.
            RDefInit = [{atom,Anno,Name} |
                        record_inits(record_fields(Name, Anno, St0),[])],
            {Def,St1} = expr({tuple,Anno,RDefInit}, St0),
            Map0 = St1#exprec.new_forms,
            {FName,St2} =
                case Map0 of
                    #{Def := OldName} ->
                        {OldName,St1};
                    #{} ->
                        C = St1#exprec.rec_init_count,
                        NewName = list_to_atom("rec_init$^" ++
                                                   integer_to_list(C)),
                        Map = Map0#{Def => NewName},
                        {NewName,St1#exprec{rec_init_count=C+1,
                                            new_forms=Map}}
                end,
            %% Replace the init record expression with a call expression
            %% to the newly added function followed by a record update.
            expr({record, Anno0, {call,Anno,{atom,Anno,FName},[]}, Name, Is},St2);
        false ->
            %% No free variables means that we can just output the
            %% record as a tuple.
            expr({tuple,Anno,RInit}, St0)
    end;
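
The subset check in this suggestion can be tried in isolation. The sketch below copies the suggested variables/1 into a standalone module and adds two hypothetical helpers, parse/1 and needs_init/2, that are not part of the PR; they use the standard erl_scan:string/1 and erl_parse:parse_exprs/1 to turn source text into abstract forms. It assumes the default integer annotations that erl_scan produces.

```erlang
-module(vars_sketch).
-export([parse/1, variables/1, needs_init/2]).

%% Hypothetical helper: parse a dot-terminated expression string.
parse(Str) ->
    {ok, Tokens, _} = erl_scan:string(Str),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    Exprs.

%% variables/1 as suggested in the review, returning an ordset of names.
variables({var,_,'_'}) ->
    [];
variables({var,_,V}) ->
    [V];
variables({'fun',_,Def}) ->
    %% The Def tuple has no annotation, so it is handled specially.
    case Def of
        {clauses,Cs} -> variables(Cs);
        {function,F,A} -> variables([F,A]);
        {function,M,F,A} -> variables([M,F,A])
    end;
variables(Tuple) when is_tuple(Tuple) ->
    [Tag,Anno|T] = tuple_to_list(Tuple),
    true = is_atom(Tag),
    true = erl_anno:is_anno(Anno),
    variables(T);
variables(List) when is_list(List) ->
    lists:foldl(fun(E, Vs0) -> ordsets:union(Vs0, variables(E)) end,
                [], List);
variables(_) ->
    [].

%% Hypothetical helper mirroring the AnyVariables check: true when the
%% initialized record mentions a variable that the user-supplied field
%% values do not.
needs_init(RInitStr, UserStr) ->
    not ordsets:is_subset(variables(parse(RInitStr)),
                          variables(parse(UserStr))).
```

For example, needs_init("{a, 2, X, 3}.", "2.") is true because X comes from a default value, while needs_init("{r, X, X, abc}.", "X.") is false because every variable was supplied by the user.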

@ilya-klyuchnikov
Contributor

"Bound" variables from previous fields are not propagated correctly to next fields.

This definition is accepted:

-record(r1, {
  a = X = 1,
  c = X
}).

While this definition is rejected:

-record(r1, {
  a = X = 1,
  b,
  c = X
}).
rec.erl:10:7: variable 'X' is unbound
%   10|   c = X

@michalmuskala
Contributor

Thinking about it a bit more, I actually find it very surprising we want to allow variables "leaking" across fields in default values - we don't allow that in regular record creation!

For example this is an error:

-record(foo, {a, b}).

foo() -> #foo{a = X = 1, b = X}.

Why should moving this into the default expression somehow make this compile and OK? This sounds quite inconsistent to me.

@ilya-klyuchnikov
Contributor

I actually find it very surprising we want to allow variables "leaking" across fields in default values

I agree. This introduces one more "rule" about variable scoping, which is already quite complicated and sometimes surprising.

@frazze-jobb
Contributor Author

frazze-jobb commented Feb 14, 2025

Thinking about it a bit more, I actually find it very surprising we want to allow variables "leaking" across fields in default values - we don't allow that in regular record creation!

For example this is an error:

-record(foo, {a, b}).

foo() -> #foo{a = X = 1, b = X}.

Why should moving this into the default expression somehow make this compile and OK? This sounds quite inconsistent to me.

Okay, I must have been confused because I thought this was valid! I'll fix the behavior next week then.

create an init function e.g. rec_init$^0, for each record
with definitions containing variables.

e.g.
-record(r, {f = fun(X)->case X of {y, Y} -> Y; _ -> X end end, g=..., h=abc}).
foo(X)->#r{}. --> foo(X)->(rec_init())#r{}.

rec_init() will initialize all fields with the default values

If one field is set and the omitted field default value has variables, then
a new init function is created that only initializes the omitted fields.

- removes lint error for variables in definitions
- updates erl_lint_SUITE and erl_expand_records_SUITE to work with this new behavior
- adds handling of records that are calling functions to the shell
  - records calling local non exported functions will fail initialization
@frazze-jobb frazze-jobb force-pushed the frazze/stdlib/erl_expand_records_create_init_function/OTP-19464 branch from cf5cbab to f8dfae2 Compare February 17, 2025 00:16
[NE2]=reconstruct1(NE, [],0, []),
ets:insert(FT, [begin {value, Fun, []} = erl_eval:expr({'fun', A, {clauses, F}}, []),
{{function, {shell_default, FunName, 0}}, Fun}
end || {function,_,FunName,0,F}=_F1<-Forms, FunName=/=foo]),

Suggested change
end || {function,_,FunName,0,F}=_F1<-Forms, FunName=/=foo]),
end || {function,_,FunName,0,F}<-Forms, FunName=/=foo]),

Comment on lines 740 to 741
{ok, [{atom, _, Fun1}], _} ->
shell(Fun1)

Suggested change
{ok, [{atom, _, Fun1}], _} ->
shell(Fun1)
{ok, [{atom, _, Fun1}], _} -> shell(Fun1)

Comment on lines +766 to +767
{ok, [{atom, _, Fun1}], _} ->
bif(Fun1)

Suggested change
{ok, [{atom, _, Fun1}], _} ->
bif(Fun1)
{ok, [{atom, _, Fun1}], _} -> bif(Fun1)

true -> "shell_default";
_ -> bif(Fun)
end
shell_default_or_bif(Fun1)

To keep a consistent code style, it makes sense to move this API call to line 753.

Labels
fix team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Variables in default field values of records
7 participants