Hi,
I would like to link io-watchdog to my SLURM installation so that hanging jobs
can be stopped.
Currently, I'm doing some tests with an LLNL resilience library called SCR. I've
set up four nodes and installed all the software needed to run MPI
jobs using SLURM. One of these nodes acts as the SLURM controller and the rest
as compute nodes. This system is working properly at the moment.
OS: Debian 3.2.51-1 (Squeeze)
SLURM Version: 2.6.4
SLURM Spank Plugins version: 0.23
io-watchdog version: 0.8
To link io-watchdog with SLURM, I need to install and configure SPANK so that it
dynamically loads the library when the user calls srun, as the io-watchdog
documentation says. After a successful installation, I added the io-watchdog
dynamic library path to /etc/ld.so.conf.d/ and included the following lines in
plugstack.conf:
required /usr/local/lib/io-watchdog/io-watchdog.so
required /usr/local/lib/io-watchdog-interposer.so
(I checked that those paths are right)
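For reference, the path check can be scripted. This is only a minimal sketch: it assumes the usual `required|optional <plugin> [args]` layout of plugstack.conf, skips any other directive, and treats each plugin path exactly as written. The function names `parse_plugstack` and `missing_plugins` are hypothetical helpers, not part of SLURM.

```python
import os


def parse_plugstack(text):
    """Return a list of (flag, plugin_path, args) for each plugin line."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        # Only "required"/"optional" plugin lines are handled in this sketch;
        # anything else (for example an include directive) is skipped.
        if fields[0] not in ("required", "optional") or len(fields) < 2:
            continue
        entries.append((fields[0], fields[1], fields[2:]))
    return entries


def missing_plugins(entries, exists=os.path.exists):
    """Return the plugin paths for which the exists() check fails."""
    return [path for _, path, _ in entries if not exists(path)]


# The two lines from the plugstack.conf shown above.
sample_conf = """\
required /usr/local/lib/io-watchdog/io-watchdog.so
required /usr/local/lib/io-watchdog-interposer.so
"""

entries = parse_plugstack(sample_conf)
for path in missing_plugins(entries):
    print("not found:", path)
```

Run on a compute node, this prints one line per plugin file that does not exist there, and nothing if both files are present.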
However, the following appears in the logs when starting the SLURM daemon:
# tail -n14 slurmd.log
[2013-11-22T09:21:55.719] debug: spank: opening plugin stack /usr/local/etc/plugstack.conf
[2013-11-22T09:21:55.723] debug3: Couldn't find sym 'slurm_spank_init' in the plugin
[2013-11-22T09:21:55.728] debug3: Couldn't find sym 'slurm_spank_slurmd_init' in the plugin
[2013-11-22T09:21:55.732] debug3: Couldn't find sym 'slurm_spank_job_prolog' in the plugin
[2013-11-22T09:21:55.736] debug3: Couldn't find sym 'slurm_spank_init_post_opt' in the plugin
[2013-11-22T09:21:55.740] debug3: Couldn't find sym 'slurm_spank_local_user_init' in the plugin
[2013-11-22T09:21:55.744] debug3: Couldn't find sym 'slurm_spank_user_init' in the plugin
[2013-11-22T09:21:55.750] debug3: Couldn't find sym 'slurm_spank_task_init_privileged' in the plugin
[2013-11-22T09:21:55.754] debug3: Couldn't find sym 'slurm_spank_task_post_fork' in the plugin
[2013-11-22T09:21:55.760] debug3: Couldn't find sym 'slurm_spank_task_exit' in the plugin
[2013-11-22T09:21:55.764] debug3: Couldn't find sym 'slurm_spank_job_epilog' in the plugin
[2013-11-22T09:21:55.769] debug3: Couldn't find sym 'slurm_spank_slurmd_exit' in the plugin
[2013-11-22T09:21:55.773] debug3: Couldn't find sym 'slurm_spank_exit' in the plugin
[2013-11-22T09:21:55.778] debug2: spank: /usr/local/lib/io-watchdog/io-watchdog.so: no callbacks in this context
Note: Full log file is attached to this message.
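To make the log excerpt easier to scan, here is a small sketch that pulls out the callback names the loader could not find and the plugin reported as having no callbacks. It relies only on the message wording shown in the excerpt above; the function name `summarize_spank_log` is a hypothetical helper, and other SLURM versions may word these messages differently.

```python
import re

# The loader's "symbol not found" debug3 messages.
MISSING_SYM = re.compile(r"Couldn't find sym\s+'(\w+)'")
# The debug2 summary naming a plugin with no usable callbacks.
NO_CALLBACKS = re.compile(r"spank:\s+(\S+):\s+no callbacks in this context")


def summarize_spank_log(text):
    """Return (missing_symbols, plugins_without_callbacks) found in log text.

    Matching runs over the whole text rather than line by line, so entries
    that were wrapped across lines are still recognized.
    """
    missing = MISSING_SYM.findall(text)
    plugins = NO_CALLBACKS.findall(text)
    return missing, plugins


# A few entries copied from the excerpt, including wrapped ones.
sample_log = """\
[2013-11-22T09:21:55.723] debug3: Couldn't find sym 'slurm_spank_init' in the
plugin
[2013-11-22T09:21:55.740] debug3: Couldn't find sym
'slurm_spank_local_user_init' in the plugin
[2013-11-22T09:21:55.778] debug2: spank:
/usr/local/lib/io-watchdog/io-watchdog.so: no callbacks in this context
"""

missing, plugins = summarize_spank_log(sample_log)
print("missing callbacks:", missing)
print("plugins without callbacks:", plugins)
```

Passing the contents of the full slurmd.log instead of `sample_log` gives the same summary for the whole file.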
Could a version incompatibility be the reason why SPANK doesn't use the
io-watchdog library?
Thanks in advance,
Jorge
Original issue reported on code.google.com by jbellonc...@gmail.com on 22 Nov 2013 at 9:33