sock.c:344 UCX ERROR recv(fd=47) failed: Connection reset by peer #8911
Unanswered
smallriver666 asked this question in Q&A
Replies: 1 comment
-
can you pls add |
-
I encountered this problem when submitting tasks across multiple nodes. I have one management node and five compute nodes. Nodes 2 to 5 are the same type of server; node 6 is a different server from the other four. Tasks submit successfully on the four nodes 2, 3, 4, and 5, but the following error occurs as soon as node 6 is added (I am using Open MPI 4.1.0):
![image](https://user-images.githubusercontent.com/104721358/222112043-d1e03ebf-1cd9-4d31-8108-86e5a235125f.png)
![image](https://user-images.githubusercontent.com/104721358/222112429-d9655355-375e-447e-a1da-ee02d5f0545f.png)
This is the script I used to submit the task: