It would be nice to have some means of monitoring the resources used by the dispatched jobs.
For example, I use the following script to dispatch UPPAAL-specific (single-threaded with lots of memory) jobs:
#!/usr/bin/env bash
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --sockets-per-node 1
#SBATCH --cores-per-socket 1
#SBATCH --mail-type=END
#SBATCH --mail-user=......
#SBATCH --error=slurm-%j.err
#SBATCH --output=slurm-%j.log
# # S B A T C H --time=24:00:00
# # S B A T C H --partition=rome
set -e

# defaults: sampling period in seconds and minimum available memory in kB
PERIOD=10
MEMAVAIL=100000

while getopts "hm:p:" option ; do
    case $option in
    h)
        echo "$0 script launches a command and monitors the resources."
        echo "$0 exits if the process exits."
        echo "$0 kills the process if the machine runs out of available memory."
        echo "Synopsis: $0 [-h] [-p N] [-m N] command with arguments"
        echo "  -h    prints this help screen"
        echo "  -p N  samples every N seconds"
        echo "  -m N  kills if available memory gets below N kB"
        exit
        ;;
    m) MEMAVAIL="$OPTARG" ;;
    p) PERIOD="$OPTARG" ;;
    ?)
        echo "Invalid option $option, consult -h"
        exit 2
        ;;
    esac
done
shift $(($OPTIND-1))

COMMAND="$@"
if [ "$#" -lt 1 ]; then
    echo -e "Error: no arguments, expecting command and its arguments."
    echo -e "Usage:\n\t$0 your_command your_arguments"
    exit 1
fi

# launch the command in the background and hand its PID over to hogwatch
"$@" &
pid=$!
# echo "Process statistics is in process-$pid-stats.txt"
exec hogwatch -p $PERIOD -m $MEMAVAIL $pid
where hogwatch is a script that monitors the given processes and logs their resource usage:
#!/usr/bin/env bash
set -e

# defaults: sampling period in seconds and minimum available memory in kB
PERIOD=10
MEMAVAIL=100000

while getopts "hm:p:" option ; do
    case $option in
    h)
        echo "$0 script monitors processes by their PIDs and logs statistics into process-PID-stats.txt."
        echo "$0 exits if all monitored processes exit or the machine runs out of available memory."
        echo "Synopsis: $0 [-h] [-p N] [-m N] PID*"
        echo "  -h    prints this help screen"
        echo "  -p N  samples every N seconds"
        echo "  -m N  kills watched PIDs if available memory gets below N kB"
        exit
        ;;
    m) MEMAVAIL="$OPTARG" ;;
    p) PERIOD="$OPTARG" ;;
    ?)
        echo "Invalid option $option, consult -h"
        exit 2
        ;;
    esac
done
shift $(($OPTIND-1))

PIDS="$@"

function proc_status() {
    pid=$1
    f=process-$pid-stats.txt
    if [ ! -e $f ]; then
        # print the whole command line as the first line:
        ps -o pid,args -p $pid | tail -n1 > $f
        # print the table header:
        echo -ne "DATE " >> $f
        ps -o pcpu,pmem,cputime,etime,vsize,rss -p $pid | head -n1 >> $f
    fi
    # print date-timestamp:
    echo -ne "$(date +%s) " >> $f
    # print process resources:
    ps -o pcpu,pmem,cputime,etime,vsize,rss -p $pid | tail -n1 >> $f
}

# monitor the available memory ("available" is column 7 of the Mem line of free):
mem_avail=$(free | grep Mem | gawk '{ print $7 }')
while [ $mem_avail -gt $MEMAVAIL ] ; do
    list=""
    for pid in $PIDS ; do
        if [ -e "/proc/$pid" ]; then
            proc_status $pid
            list="$list $pid"
        fi
    done
    # keep only the PIDs that are still alive
    PIDS=$list
    if [ -z "$PIDS" ]; then
        exit 0
    fi
    sleep $PERIOD
    # re-read the same "available" column that the loop condition tests
    mem_avail=$(free | grep Mem | gawk '{ print $7 }')
done
echo "hogwatch: machine is out of available memory, thus killing $PIDS"
kill -9 $PIDS
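As a side note on the memory check: the "available" column that free prints corresponds to the MemAvailable field of /proc/meminfo on Linux kernels that provide it. A minimal Python sketch of the same threshold test follows. It is not part of hogwatch, and the function names and sample text are made up for illustration.

```python
def parse_mem_available_kb(meminfo_text):
    """Return the MemAvailable value in kB from /proc/meminfo-style text.

    Returns None if the field is missing (e.g. on very old kernels).
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line looks like: "MemAvailable:    90000 kB"
            return int(line.split()[1])
    return None


def is_low_on_memory(meminfo_text, threshold_kb=100000):
    """Mirror the hogwatch loop condition: keep running only while
    available memory is strictly greater than the threshold."""
    available = parse_mem_available_kb(meminfo_text)
    return available is not None and available <= threshold_kb


# Illustrative input only; real values come from reading /proc/meminfo.
SAMPLE_MEMINFO = (
    "MemTotal:       16000000 kB\n"
    "MemFree:          500000 kB\n"
    "MemAvailable:      90000 kB\n"
)
print(is_low_on_memory(SAMPLE_MEMINFO))
```

On a real machine the text would come from `open("/proc/meminfo").read()`, and the default threshold of 100000 kB matches the default MEMAVAIL above.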
Then I have the following Python script to show the memory and CPU consumption:
Currently Python cannot open a window with interactive zoom widgets, because windowing toolkit libraries (such as tk, qt, gtk, etc.) are not installed (and are not available in virtual Python environments), so the script dumps a PNG image and then launches display to show it.
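The plotting script itself is not reproduced in this text, so the following is only a rough sketch of how such a script might work, not the original. It assumes the process-PID-stats.txt layout written by hogwatch (command line, header line, then whitespace-separated samples), assumes matplotlib is installed, and uses its non-interactive Agg backend so that no windowing toolkit is needed. The names parse_stats and plot_to_png and the sample data are hypothetical.

```python
def parse_stats(text):
    """Parse the contents of a process-PID-stats.txt file.

    Assumed layout: line 1 is "PID command args", line 2 is the column
    header, and every further line holds seven whitespace-separated fields:
    timestamp, %CPU, %MEM, CPU time, elapsed time, VSZ (kB), RSS (kB).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    command = lines[0].strip()
    samples = []
    for ln in lines[2:]:
        fields = ln.split()
        if len(fields) != 7:
            continue  # skip truncated or malformed samples
        ts, pcpu, pmem, _cputime, _etime, vsz, rss = fields
        samples.append({
            "time": int(ts),
            "cpu_percent": float(pcpu),
            "mem_percent": float(pmem),
            "vsz_kb": int(vsz),
            "rss_kb": int(rss),
        })
    return command, samples


def plot_to_png(command, samples, png_path):
    """Write CPU and memory curves to a PNG file without opening a window."""
    import matplotlib
    matplotlib.use("Agg")  # file-only backend, needs no tk/qt/gtk
    import matplotlib.pyplot as plt

    t0 = samples[0]["time"]
    xs = [s["time"] - t0 for s in samples]
    fig, (ax_cpu, ax_mem) = plt.subplots(2, 1, sharex=True)
    ax_cpu.plot(xs, [s["cpu_percent"] for s in samples])
    ax_cpu.set_ylabel("CPU [%]")
    ax_mem.plot(xs, [s["rss_kb"] / 1024 for s in samples], label="RSS")
    ax_mem.plot(xs, [s["vsz_kb"] / 1024 for s in samples], label="VSZ")
    ax_mem.set_ylabel("memory [MiB]")
    ax_mem.set_xlabel("time since first sample [s]")
    ax_mem.legend()
    fig.suptitle(command)
    fig.savefig(png_path)
    plt.close(fig)


# Made-up sample in the assumed layout, for illustration only.
SAMPLE_STATS = """\
  4242 your_command your_arguments
DATE %CPU %MEM     TIME     ELAPSED    VSZ   RSS
1700000000 99.5  1.2 00:00:10       00:10 204800 102400
1700000010 98.0  2.4 00:00:20       00:20 409600 204800
"""
command, samples = parse_stats(SAMPLE_STATS)
print(command, len(samples))
```

Calling `plot_to_png(command, samples, "stats.png")` would then write the image, which can be opened with any image viewer.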
Example result: