Crew worker terminating itself for unknown reasons #176
Replies: 3 comments 33 replies
-
Interesting. Come to think of it, I remember seeing this in one of my own large pipelines as well a while back. That trace is a helpful clue, and it seems to point to #141 and shikokuchuo/mirai#87 (comment). To solve #141, each worker terminates itself with the call at line 466 in e62928e. @shikokuchuo, is it possible that the …
-
FYI, I implemented both local and worker-level memory logging via the new …
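Independent of the exact interface, worker-level memory logging amounts to periodically recording the worker process's resident memory. A rough, hand-rolled sketch of that idea (Linux-only; `log_rss()` is a hypothetical helper, not part of `crew`):

```r
# Rough stand-in for worker-level memory logging (Linux-only, hypothetical
# helper, not part of crew): append the worker's resident set size to a file.
log_rss <- function(path = "memory.log") {
  status <- readLines("/proc/self/status")      # per-process kernel stats
  rss <- grep("^VmRSS:", status, value = TRUE)  # resident set size line
  cat(format(Sys.time()), Sys.getpid(), rss, "\n",
      file = path, append = TRUE)
}
log_rss()  # call periodically from the worker, e.g. inside long-running targets
```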
-
@multimeric, is it possible that this original issue was an instance of #189?
-
I'm using `crew` + `crew.cluster` via `targets`. In my pipeline, I have a specific target that consistently crashes the worker, although I can't work out why. I would like to resolve this issue so that the `targets` pipeline can finish. It doesn't seem to be memory related, as I've set an `rlimit`/`ulimit` which is never exceeded. For this reason, I also don't think that Slurm, my job scheduler, is killing the job.
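For reference, roughly what that setup looks like (a minimal sketch, not my exact configuration; the controller arguments are illustrative and depend on the `crew.cluster` version):

```r
# _targets.R -- minimal sketch of a crew + crew.cluster + targets setup.
# Argument names and values are illustrative; check your crew.cluster version.
library(targets)
library(crew.cluster)

tar_option_set(
  controller = crew_controller_slurm(
    name = "pipeline",   # label for this controller
    workers = 4L,        # up to 4 transient Slurm workers
    seconds_idle = 60    # idle workers shut down on their own
  )
)

list(
  tar_target(big_input, load_data()),        # load_data() is a placeholder
  tar_target(crashy, heavy_step(big_input))  # the target whose worker keeps dying
)
```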
In `targets`, I just get the normal "the worker has died" message:

In the Slurm stdout/stderr log, the only info I get is:
I ran `strace` on the relevant R process (PID 28977), and I can see that it is actually terminating itself (`si_pid=28977`). Why might `crew` (or some other part of the stack, like `mirai`, `nanonext`, etc.) be doing this?
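For context on how I'm reading the trace: `si_pid` equal to the traced PID means the fatal signal was sent by the process to itself, i.e. something equivalent to the following (purely illustrative, not necessarily what `crew` does):

```r
# Illustration only: an R process sending SIGTERM to its own PID produces a
# signal whose si_pid matches the receiving process, as seen in the strace.
# Do not run this in a session you care about -- it terminates the session.
tools::pskill(Sys.getpid(), tools::SIGTERM)
```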
Here are the last lines from the strace:
Additional diagnostics here: