Describe the bug
A race condition is possible when the process that called ssh:connect() dies for any reason before calling ssh_connection_handler:takeover(). In that case the next operation on the socket (either inet:peername() or inet:sockname()) returns {error, einval}.
The resulting error report is confusing, and it is sometimes hard to tell this race condition from a real problem with the socket.
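The symptom can be illustrated without ssh at all. The sketch below is not taken from the ssh application: it uses plain gen_tcp and assumes that the einval comes from the socket port being closed together with its owning process (the default inet backend is also assumed). The module name einval_demo is hypothetical.

```erlang
-module(einval_demo).
-export([run/0]).

%% Illustrative only: plain gen_tcp stands in for the ssh client socket.
run() ->
    {ok, Listen} = gen_tcp:listen(0, [{active, false}]),
    {ok, Port} = inet:port(Listen),
    Parent = self(),
    %% The "caller" opens the connection and therefore owns the socket.
    {Owner, OwnerRef} = spawn_monitor(fun() ->
        {ok, S0} = gen_tcp:connect({127,0,0,1}, Port, [{active, false}]),
        Parent ! {socket, S0}
        %% The owner exits here without handing the socket over.
    end),
    Sock = receive {socket, S} -> S end,
    receive {'DOWN', OwnerRef, process, Owner, _} -> ok end,
    %% Wait until the port itself is gone, so the result is not timing dependent.
    PortRef = erlang:monitor(port, Sock),
    receive {'DOWN', PortRef, port, _, _} -> ok end,
    %% Any inet call on the dead socket now fails.
    Result = inet:peername(Sock),
    gen_tcp:close(Listen),
    Result.
```

Calling einval_demo:run() is expected to return {error, einval}, which is the same term the connection handler reports when it inspects a socket whose owner has already died.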
To Reproduce
It is hard to reproduce without interfering with process scheduling in the Erlang VM or patching ssh_connection_handler. In our tests the race appeared when multiple worker processes were created, each starting its own SSH connection as a client, and the parent then terminated some of the workers by calling exit(Pid, shutdown) based on timeouts, load, etc. Some of the terminated workers reported {error, einval}.
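A rough harness for the scenario above might look like the following. It is a sketch under assumptions, not the test code from this report: the module name ssh_race_repro, the worker count, and the random kill delay are all hypothetical, and Host and Port must point at an SSH server you are allowed to connect to. Because the race depends on scheduling, a run may well finish without triggering it.

```erlang
-module(ssh_race_repro).
-export([run/3]).

%% Spawn N workers that each open an SSH client connection, then have the
%% parent terminate them with exit(Pid, shutdown) after short random delays.
run(Host, Port, N) ->
    {ok, _} = application:ensure_all_started(ssh),
    Workers = [spawn_monitor(fun() -> worker(Host, Port) end)
               || _ <- lists:seq(1, N)],
    lists:foreach(
      fun({Pid, _Ref}) ->
              %% Hypothetical delay; tune it to land inside connection setup.
              timer:sleep(rand:uniform(20)),
              exit(Pid, shutdown)
      end, Workers),
    %% Collect the exit reason of every worker.
    [receive {'DOWN', Ref, process, Pid, Reason} -> Reason end
     || {Pid, Ref} <- Workers].

worker(Host, Port) ->
    Opts = [{silently_accept_hosts, true}, {user_interaction, false}],
    _ = ssh:connect(Host, Port, Opts),
    %% Stay alive until the parent terminates this worker.
    receive stop -> ok end.
```

The list returned by run/3 holds only the workers' exit reasons; when the race is hit, look for {error, einval} in the error reports logged during the run.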
Expected behavior
Exit without "einval"
Affected versions
OTP-25, OTP-26
Additional context
The following patch for OTP-25 fixed the race in our tests:
diff --git a/lib/ssh/src/ssh_connection_handler.erl b/lib/ssh/src/ssh_connection_handler.erl
index 4ef45516ca..a2a368af9a 100644
--- a/lib/ssh/src/ssh_connection_handler.erl
+++ b/lib/ssh/src/ssh_connection_handler.erl
@@ -401,6 +401,11 @@ alg(ConnectionHandler) ->
%%====================================================================
init([Role, Socket, Opts]) when Role==client ; Role==server ->
+ %% ssh_params will be changed in post_init() to values derived from Opts
+ D = #data{socket = Socket, ssh_params = #ssh{opts = Opts}},
+ {ok, {post_init, Role}, D}.
+
+post_init(Role, #data{socket = Socket, ssh_params = #ssh{opts = Opts}}) ->
case inet:peername(Socket) of
{ok, PeerAddr} ->
try
@@ -414,7 +419,8 @@ init([Role, Socket, Opts]) when Role==client ; Role==server ->
connection_state = init_connection_record(Role, Socket, Opts)
},
process_flag(trap_exit, true),
- {ok, {hello,Role}, D}
+ NextEvent = {next_event, internal, socket_controlled},
+ {next_state, {hello,Role}, D, NextEvent}
catch
_:{error,Error} -> {stop, {error,Error}};
error:Error -> {stop, {error,Error}}
@@ -584,7 +590,11 @@ callback_mode() ->
%%% ######## {hello, client|server} ####
%% The very first event that is sent when the we are set as controlling process of Socket
-handle_event(cast, socket_control, {hello,_}=StateName, #data{ssh_params = Ssh0} = D) ->
+handle_event(cast, socket_control, {post_init, Role}, DIn) ->
+ post_init(Role, DIn);
+
+handle_event(internal, socket_controlled, {hello, _Role} = StateName, D) ->
+ Ssh0 = D#data.ssh_params,
VsnMsg = ssh_transport:hello_version_msg(string_version(Ssh0)),
send_bytes(VsnMsg, D),
case inet:getopts(Socket=D#data.socket, [recbuf]) of
@@ -1364,6 +1374,11 @@ handle_event(Type, Ev, StateName, D0) ->
%% . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
+terminate(_, {post_init, _}, _) ->
+ %% No need to to anything - maybe we have not yet gotten
+ %% control over the socket
+ ok;
+
terminate(normal, _StateName, D) ->
close_transport(D);