Mauro Bianco edited this page Jul 22, 2014 · 4 revisions


MSW: Mauro's Slurm Wrappers

Introduction

I use a set of scripts to launch and monitor jobs on the machines at CSCS. I used to have different scripts for different machines, but I am now unifying them into a single script for submitting and monitoring jobs. There are two kinds of scripts: one takes a single command line you want to execute, the other takes a script to be embedded in a job file.

Example

   sbatch `create_job <nnodes> <ppernodes> <nthreads> <progname> [<arg1> [... <argn>]]`; monitorjob; cat `last`

Environment setup

For the unification to work, a few environment variables need to be set. For this reason I placed the following entries in .bashrc. The comments in the sample explain the meaning of the individual lines.

  # Settings used on castor
  if echo ${HOSTNAME} | grep -q castor; then
    export DEFAULT_PARTITION=express                 # default Slurm partition for jobs
    export LAUNCH_COMMAND="mpiexec.hydra -rmk slurm" # command used to launch the program
    export USE_GPU=fermi                             # GPU type to use
    export CONSTRAINT_GPU=fermi                      # value for the Slurm --constraint option
    export USE_GPU_NUM=2                             # GPUs per node
    export MSW_VAR_USE_CUDA="MV2_USE_CUDA=1"         # extra job env var: MVAPICH2 CUDA support
    export MSW_VAR_AFFIN="MV2_ENABLE_AFFINITY=0"     # extra job env var: disable MVAPICH2 affinity
    module load cmake
    module load gcc/4.6.3
    module load mvapich2
    module load cuda
  fi

Since bash cannot export arrays, the following is a little cumbersome. Additional environment variables for your job can be set through numbered variables:

  MSW_VAR_1="MYVAR1=MYVAL1"
  MSW_VAR_2="MYVAR2=MYVAL2"
  MSW_VAR_3="MYVAR3=MYVAL3"
  etc.

The scripts described below will add and export these variables before executing your program.
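As an illustration, a job script could collect and export every MSW_VAR_* variable with a small bash function like the following. This is a minimal sketch of the idea, not the actual code used by the scripts; compgen -v lists the shell variables with a given prefix, and ${!var} is bash indirect expansion.

```shell
#!/bin/bash
# Sketch: export the value of every MSW_VAR_* variable into the
# environment of the job, as the MSW scripts are described to do.
export_msw_vars() {
  local var
  for var in $(compgen -v MSW_VAR_); do
    export "${!var}"   # e.g. MSW_VAR_1="MYVAR1=MYVAL1" -> export MYVAR1=MYVAL1
  done
}
```

After export_msw_vars runs, MYVAR1 and friends are visible to the launched program.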

Advanced Setup

Through environment variables you can also specify expressions that are evaluated when the script is invoked. These expressions can combine environment variables and script-internal variables. The following is an example I use on dom.

  # Settings used on dom: the total rank count is computed from the
  # script-internal variables $nodes and $ppn (escaped so they are expanded
  # by the job maker, not when .bashrc is sourced)
  if echo ${HOSTNAME} | grep -q dom; then
    export DEFAULT_PARTITION=normal
    export LAUNCH_COMMAND="mpiexec.hydra -np \$((\$nodes*\$ppn)) -ppn \$ppn"
    export CONSTRAINT_GPU=k20c
  fi

As you can see, the script gives you access to $nodes and $ppn, which you can use to customize the launch command. In this case that is needed because a problem in the Slurm configuration made the -rmk slurm option ineffective. This way we can circumvent the problem and get the desired behavior without specializing the script.
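The mechanism can be pictured roughly as follows. This is a sketch of the idea, not the actual implementation: the environment variable holds the escaped expression as literal text, and the job maker evaluates it once $nodes and $ppn are known.

```shell
#!/bin/bash
# Sketch: LAUNCH_COMMAND holds the literal text '$(($nodes*$ppn))'
# (single quotes prevent expansion) until eval expands it.
LAUNCH_COMMAND='mpiexec.hydra -np $(($nodes*$ppn)) -ppn $ppn'
nodes=3
ppn=4
eval "cmd=\"$LAUNCH_COMMAND\""   # the arithmetic is evaluated here
echo "$cmd"                      # -> mpiexec.hydra -np 12 -ppn 4
```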

Launch script For Single Line Commands

Slurm provides a way to launch a command directly from the command line. I don't use it, since the syntax can be tedious; my script lets you store your settings in environment variables or in the job maker itself.

A script create_job takes as arguments the number of nodes, the number of processes per node, the number of threads, the executable path, and a list of arguments to the executable, and produces a job file. The script prints the name of the job file it created. The job file name is formatted as

  job_<number of nodes>_<procs per node>_<threads_per_node>_<program_name>_<first_argument>

By running

  create_job 3 3 2 /builb/progname 12 4 5

The file job_3_3_2_progname_12 is created and its name is printed as the output of the script. A convenient way to run it is to pass the output to sbatch:

  sbatch `create_job 3 3 2 /builb/progname 12 4 5`
  sbatch `create_job 3 3 2 "cuda-memcheck --force-blocking /builb/progname 12 4 5"`

Another way to use create_job is to generate a template job file to be filled in with other requirements, but once you get used to the MSW variables there is little need for that.
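For reference, a generated job file might look roughly like this. This is a hypothetical sketch to show the shape of the output; the actual template produced by create_job will differ.

```shell
#!/bin/bash
#SBATCH --nodes=3              # <nnodes>
#SBATCH --ntasks-per-node=3    # <ppernodes>
#SBATCH --cpus-per-task=2      # <nthreads>
#SBATCH --partition=express    # from $DEFAULT_PARTITION
#SBATCH --output=slurm_3_3_2_progname_12

export MYVAR1=MYVAL1           # added from the MSW_VAR_* variables
mpiexec.hydra -rmk slurm /builb/progname 12 4 5   # $LAUNCH_COMMAND + program + args
```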

At the end a file

  slurm_<number of nodes>_<procs per node>_<threads_per_node>_<program_name>_<first_argument>

will be created with the output from the application. NOTE: running create_job will overwrite a job file with the same name, so when you edit a job file by hand, either rename it or avoid running create_job with the same parameters.

Launch script for Scripts

A script create_job_script accepts script code instead of a simple executable. In the script you pass, you can override the variables defined in the template. This way you can have a single script that runs on different machines, or run multiple executables in a single job and post-process their data. Inside your script some variables are available, such as the actual launch command in $LAUNCH_COMMAND, and the number of nodes, the processes per node, and the threads per process (depth on Cray) in $nodes, $ppn, and $depth, respectively.
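For example, a payload script passed to create_job_script might look like this. This is a hypothetical sketch: the program names are made up, and the variables are stubbed so the sketch can run outside a job (in a real job they are provided by the generated job file).

```shell
#!/bin/bash
# In a real job, these variables come from the generated job file;
# they are stubbed here so the sketch is self-contained.
LAUNCH_COMMAND="echo mpiexec.hydra -rmk slurm"   # stub: print instead of launching
nodes=3; ppn=3; depth=2

echo "running on $nodes nodes, $ppn processes per node, depth $depth"
$LAUNCH_COMMAND ./first_program  input.dat   # first executable of the job
$LAUNCH_COMMAND ./second_program input.dat   # second executable, same allocation
```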

At the end a file

  slurm_<number of nodes>_<procs per node>_<threads_per_node>_<script_name>_<first_argument>

will be created with the output from the application.

Monitoring jobs

Now, a problem with batch systems is knowing when a job is done. For this reason I use a script named monitorjob. This script works fine when the user has a single job running; otherwise the results can be hard to predict (I use it with multiple jobs and still find it useful).

The typical command I use is

  sbatch `create_job 3 3 2 /builb/progname 12 4 5`; monitorjob

This will report the status of the job; COMPLETED or FAILED signals that the job is finished. If the job has already succeeded, the script exits immediately.
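The core of such a monitor can be sketched as a polling loop. This is an illustration, not the actual monitorjob code; the squeue flags -h (no header), -u (user), and -o %T (print the job state) are standard Slurm.

```shell
#!/bin/bash
# Sketch of a monitorjob-style loop: poll the job state until the job
# leaves the queue. The state query command is passed as arguments so it
# can be substituted, e.g.: monitor_job squeue -h -u "$USER" -o %T
monitor_job() {
  local state
  while true; do
    state=$("$@")                       # current job state, empty if gone
    case "$state" in
      ""|COMPLETED) echo "COMPLETED"; return 0 ;;
      FAILED*)      echo "FAILED";    return 1 ;;
      *)            echo "$state"; sleep 5 ;;
    esac
  done
}
```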

Printing the output

A very simple script named last prints the name of the last file written in the folder where it is launched. Now, this script shadows the Unix last command, but I bet the last time you ran last was not in the last decade.

  sbatch `create_job 3 3 2 /builb/progname 12 4 5`; monitorjob; cat `last`

If everything goes well, the commands run as if they were interactive.
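For completeness, last can be as small as this. This is a sketch consistent with the description above, not necessarily the author's actual script.

```shell
#!/bin/bash
# Print the name of the most recently modified file in the current directory.
ls -t | head -n 1
```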