Helpful Slurm commands

Slurm is the workload manager used for many of our compute clusters. Details of all Slurm commands can be looked up online, but if you don't know which command you need, it can be hard to work out how to perform a task, or even to know that the task is possible.

Basic day-to-day Slurm operations (submitting jobs, canceling jobs, etc.)

The documentation supplied by each compute cluster is pretty good here (if not, please add to this wiki!).

To cancel all of your jobs, run scancel -u $USER
To submit a job, run sbatch Submit.sh. To be emailed when it finishes or fails, run sbatch --mail-user=$YOUR_EMAIL --mail-type=END,FAIL Submit.sh
If you forget what a long-running job was, run scontrol show jobid $JOB_ID, where $JOB_ID is the ID of the job of interest
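For reference, a Submit.sh might look like the sketch below. This is only a minimal illustration, not any cluster's recommended template: the job name, time limit, task count, and email address are placeholders, and the #SBATCH lines simply move the mail options from the command line into the script.

```shell
#!/bin/bash
# Minimal batch script sketch; every value below is a placeholder to adapt.
# sbatch reads the #SBATCH lines as options, and they must come before the first command.
#SBATCH --job-name=example
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=you@example.org

# Slurm sets SLURM_JOB_ID inside a running job; fall back to "unknown" when run by hand.
msg="Job ${SLURM_JOB_ID:-unknown} started"
echo "$msg"
```

With the options in the script, a plain sbatch Submit.sh is enough; options given on the command line still take precedence over the #SBATCH lines.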

Querying user limits on queues (which Slurm calls 'partitions')

You won't need this every day, but it can be important when debugging Slurm problems.

To see basic information for all partitions:

sinfo -s
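In the summary output, the default partition is marked with a trailing asterisk. As a small sketch, the hypothetical helper below (default_partition is not a Slurm command) reads sinfo -s output on standard input and prints that partition's name, assuming the usual header line followed by one row per partition.

```shell
# Hypothetical helper: print the partition that sinfo marks as default with a trailing "*".
# Expects `sinfo -s` output on standard input (header line, then one row per partition).
default_partition() {
    awk 'NR > 1 && $1 ~ /\*$/ { sub(/\*$/, "", $1); print $1 }'
}

# On a cluster:
#   sinfo -s | default_partition
```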

Each partition has an associated "quality of service" (QOS): a record of the limits that apply to users of that partition (e.g. the maximum run time, number of nodes, number of cores per node, or amount of memory that a user can request). To see the name of the QOS associated with each partition, along with other settings for that partition, use the following command and look for QoS=<name> in the output, where <name> is the name of the QOS:

scontrol show partition
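The output is long, so picking out just the partition names and their QoS= values can help. The sketch below is one way to do that, assuming each record starts with a PartitionName=<name> field and contains a QoS=<name> field, as in typical scontrol output; partition_qos is a hypothetical helper, not a Slurm command.

```shell
# Hypothetical helper: print "<partition> <qos>" for each partition record.
# Expects `scontrol show partition` output on standard input.
partition_qos() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^PartitionName=/) {
                # Remember the partition this record belongs to.
                name = $i
                sub(/^PartitionName=/, "", name)
            } else if ($i ~ /^QoS=/) {
                # AllowQos=... does not match here; only the QoS= field does.
                qos = $i
                sub(/^QoS=/, "", qos)
                print name, qos
            }
        }
    }'
}

# On a cluster:
#   scontrol show partition | partition_qos
```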

To list the limits associated with every QOS, by QOS name, run:

sacctmgr show qos
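The full table is wide, so it can help to request only a few columns and filter to one QOS. The sketch below assumes pipe-delimited output from sacctmgr -P (parsable mode) and uses a hypothetical helper, qos_limits. The format fields shown (Name, MaxWall, MaxTRESPU, MaxJobsPU) are examples only; which ones are worth printing depends on the limits your cluster actually configures.

```shell
# Hypothetical helper: keep the header line plus the row whose first field is the QOS name.
# Expects pipe-delimited `sacctmgr -P show qos ...` output on standard input.
qos_limits() {
    awk -F'|' -v qos="$1" 'NR == 1 || $1 == qos'
}

# On a cluster (the QOS name "normal" is only an example):
#   sacctmgr -P show qos format=Name,MaxWall,MaxTRESPU,MaxJobsPU | qos_limits normal
```

Use the QOS name found in the QoS= field of scontrol show partition as the argument.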

For more details, these commands can be looked up in the Slurm documentation.
