Looks like upgrading to This might not be a perfect test case of #176 because each target only saves a few KB of data, but maybe a next test could be a version that saves entire |
To better diagnose #176, I revisited the `targets` pipeline at https://github.com/openpharma/brms.mmrm/tree/main/vignettes/sbc, a simulation study to validate a Bayesian model using simulation-based calibration (SBC) checking.

I turned on local memory monitoring using the `log_resources` argument of the controller. Memory usage of the local `targets` process and `mirai` dispatcher process looks fine:

And even after running for 2 days, the dispatcher and pipeline are still running. So that's at least good news.

However, I am seeing a lot of timeouts when the workers are first trying to dial in, and the worker logs look strange. The `targets` pipeline tries to keep 50 workers going at a time, and if I list the number of lines of each worker instance log file, I see:

The logs with ~25000 lines are successfully running Stan models:

But the ones with few lines are having trouble dialing in:
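For anyone who wants to repeat the line-count comparison described above, here is a minimal sketch in Python. It is not the command used for this post. The `*.log` file pattern and the 1000-line cutoff between "healthy" and "short" logs are assumptions made for illustration, and the function names are hypothetical.

```python
from pathlib import Path


def count_log_lines(log_dir, pattern="*.log"):
    """Return {file name: line count} for every matching log file in log_dir.

    The "*.log" pattern is an assumption; adjust it to however the
    worker logs are actually named.
    """
    counts = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the count going if a log has odd bytes.
        with path.open("r", errors="replace") as handle:
            counts[path.name] = sum(1 for _ in handle)
    return counts


def split_by_length(counts, threshold=1000):
    """Split logs into (long, short) groups around an assumed line threshold."""
    long_logs = {name: n for name, n in counts.items() if n >= threshold}
    short_logs = {name: n for name, n in counts.items() if n < threshold}
    return long_logs, short_logs
```

Calling `split_by_length(count_log_lines("path/to/worker/logs"))` returns two dictionaries. With logs like the ones described here, the short group would be the candidates to inspect for dial-in timeouts.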
I do need to implement #178, but I find it hard to believe this would be a memory issue. Maybe it's because I'm running `nanonext_1.1.1.9015` and `mirai_1.1.1.9012`? I will upgrade and try again. Hopefully the resources on the cluster will still be available.

Session info:
FYI @shikokuchuo, @multimeric