If the AM fails, TonyClient hangs for close to a minute retrying the connection to the AM (50 retries at 1000 ms intervals, per the log below) before it finally fails. We should fail faster here.
14-09-2020 15:15:07 PDT mnist-avro-distributed INFO - 2020-09-14 22:15:07 INFO Client:962 - Retrying connect to server: ltx1-hcl3578.grid.linkedin.com/10.150.55.156:31852. Already tried 44 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
14-09-2020 15:15:08 PDT mnist-avro-distributed INFO - 2020-09-14 22:15:08 INFO Client:962 - Retrying connect to server: ltx1-hcl3578.grid.linkedin.com/10.150.55.156:31852. Already tried 45 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
14-09-2020 15:15:09 PDT mnist-avro-distributed INFO - 2020-09-14 22:15:09 INFO Client:962 - Retrying connect to server: ltx1-hcl3578.grid.linkedin.com/10.150.55.156:31852. Already tried 46 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
14-09-2020 15:15:10 PDT mnist-avro-distributed INFO - 2020-09-14 22:15:10 INFO Client:962 - Retrying connect to server: ltx1-hcl3578.grid.linkedin.com/10.150.55.156:31852. Already tried 47 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
14-09-2020 15:15:11 PDT mnist-avro-distributed INFO - 2020-09-14 22:15:11 INFO Client:962 - Retrying connect to server: ltx1-hcl3578.grid.linkedin.com/10.150.55.156:31852. Already tried 48 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - 2020-09-14 22:15:12 INFO Client:962 - Retrying connect to server: ltx1-hcl3578.grid.linkedin.com/10.150.55.156:31852. Already tried 49 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - 2020-09-14 22:15:12 FATAL TonyClient:985 - Failed to run TonyClient
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - java.net.ConnectException: Call From ltx1-hcl6554.grid.linkedin.com/10.150.121.188 to ltx1-hcl3578.grid.linkedin.com:31852 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:754)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1547)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.ipc.Client.call(Client.java:1489)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.ipc.Client.call(Client.java:1388)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at com.sun.proxy.$Proxy20.getTaskInfos(Unknown Source)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at com.linkedin.tony.rpc.impl.pb.client.TensorFlowClusterPBClientImpl.getTaskInfos(TensorFlowClusterPBClientImpl.java:75)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at java.lang.reflect.Method.invoke(Method.java:498)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at com.sun.proxy.$Proxy21.getTaskInfos(Unknown Source)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at com.linkedin.tony.rpc.impl.ApplicationRpcClient.getTaskInfos(ApplicationRpcClient.java:81)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at com.linkedin.tony.TonyClient.updateTaskInfos(TonyClient.java:895)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at com.linkedin.tony.TonyClient.monitorApplication(TonyClient.java:851)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at com.linkedin.tony.TonyClient.run(TonyClient.java:185)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at com.linkedin.tony.TonyClient.start(TonyClient.java:983)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at com.linkedin.tony.TonyClient.main(TonyClient.java:1097)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - Caused by: java.net.ConnectException: Connection refused
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:701)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:808)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:423)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.ipc.Client.getConnection(Client.java:1604)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - at org.apache.hadoop.ipc.Client.call(Client.java:1435)
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - ... 21 more
14-09-2020 15:15:12 PDT mnist-avro-distributed INFO - 2020-09-14 22:15:12 ERROR TonyClient:992 - Application failed to complete successfully
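To put a number on the delay: the log above reports a fixed-sleep policy with maxRetries=50 and sleepTime=1000 ms, so the client can spend roughly 50 seconds sleeping between attempts before giving up. The sketch below is not TonY or Hadoop code; `RetryBudget`, `worstCaseSleep`, and `callWithRetries` are hypothetical names used only to illustrate how a smaller retry budget shortens the time to failure. Whether the values in the log come from Hadoop's `ipc.client.connect.max.retries` and `ipc.client.connect.retry.interval` settings in this deployment is an assumption.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of TonY or Hadoop: illustrates a fixed-sleep
// retry policy and the worst-case time it can spend sleeping.
final class RetryBudget {

    /** Upper bound on time spent sleeping between attempts (excludes connect time). */
    static Duration worstCaseSleep(int maxRetries, Duration sleepTime) {
        return sleepTime.multipliedBy(maxRetries);
    }

    /**
     * Runs the action once, then retries up to maxRetries times with a fixed
     * sleep between attempts. Rethrows the last failure when the budget is spent.
     */
    static <T> T callWithRetries(Callable<T> action, int maxRetries, Duration sleepTime)
            throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(sleepTime.toMillis());
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        // Values taken from the retry policy printed in the log.
        System.out.println("Policy from the log: "
                + worstCaseSleep(50, Duration.ofMillis(1000)).getSeconds() + " s of sleeping");
        // Assumed smaller budget, chosen only for comparison.
        System.out.println("Assumed smaller budget: "
                + worstCaseSleep(5, Duration.ofMillis(1000)).getSeconds() + " s of sleeping");
    }
}
```

A smaller retry count is only one option; the client could also check the YARN application state between attempts and stop retrying once the application has already failed, which this sketch does not cover.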