From dfa0082f3a6cfe8c7ec272dc4ab2300fea808764 Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 13:25:14 -0700 Subject: [PATCH 01/19] Added general troubleshooting page --- docs/general/troubleshooting.md | 23 +++++++++++++++++++++++ mkdocs.yml | 1 + 2 files changed, 24 insertions(+) create mode 100644 docs/general/troubleshooting.md diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md new file mode 100644 index 00000000..854ff135 --- /dev/null +++ b/docs/general/troubleshooting.md @@ -0,0 +1,23 @@ +## Common SSH Issues + +Here are some of the most common issues users face when using SSH. + + +### Keys + +The following clusters use SSH keys: Atomate, Farm, Franklin, HPC1, HPC2, Impact, Peloton. + +If you connect to one of these and are asked for a password (as distinct from a passphrase for your key), +your key is not being recognized. This is usually because of permissions or an unexpected filename. +SSH expects your key to be one of a specific set of names. Unless you have specified something other than +the default, this is probably going to be **.ssh/id_rsa**. + +If you specified a different name when generating your key, you can specify it like this: + +```bash +ssh -i .ssh/newkey [USER]@[cluster].hpc.ucdavis.edu +``` + +If you kept the default value, your permissions should be set so that only you can read and write the key (-rw------- or 600). + +If you are trying to use a key to access LSSC0, this will not work. 
diff --git a/mkdocs.yml b/mkdocs.yml index f7b2d40a..bd29a42f 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -20,6 +20,7 @@ nav: - R and RStudio: software/rlang.md #- Developing Software: software/developing.md - Data Transfer: data-transfer.md + - Troubleshooting: general/troubleshooting.md - Clusters: - Farm: - farm/index.md From b4c6c80ba022318054721ee4c1ba607e582cc74d Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 14:10:40 -0700 Subject: [PATCH 02/19] Made suggested changes --- docs/general/troubleshooting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 854ff135..5002d58a 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -10,7 +10,7 @@ The following clusters use SSH keys: Atomate, Farm, Franklin, HPC1, HPC2, Impact If you connect to one of these and are asked for a password (as distinct from a passphrase for your key), your key is not being recognized. This is usually because of permissions or an unexpected filename. SSH expects your key to be one of a specific set of names. Unless you have specified something other than -the default, this is probably going to be **.ssh/id_rsa**. +the default, this is probably going to be **$HOME/.ssh/id_rsa**. If you specified a different name when generating your key, you can specify it like this: From 08df3f49e9a4847138d21209ae68f4de6b08f22a Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 14:16:58 -0700 Subject: [PATCH 03/19] Made suggested changes again --- docs/general/troubleshooting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 5002d58a..326053cc 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -15,7 +15,7 @@ the default, this is probably going to be **$HOME/.ssh/id_rsa**. 
If you specified a different name when generating your key, you can specify it like this: ```bash -ssh -i .ssh/newkey [USER]@[cluster].hpc.ucdavis.edu +ssh -i $HOME/.ssh/newkey [USER]@[cluster].hpc.ucdavis.edu ``` If you kept the default value, your permissions should be set so that only you can read and write the key (-rw------- or 600). From f8167341a04d1f2e0397cec4ff20279fe484497b Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 14:19:13 -0700 Subject: [PATCH 04/19] Made suggested changes again --- docs/general/troubleshooting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 326053cc..30f75ca8 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -10,7 +10,7 @@ The following clusters use SSH keys: Atomate, Farm, Franklin, HPC1, HPC2, Impact If you connect to one of these and are asked for a password (as distinct from a passphrase for your key), your key is not being recognized. This is usually because of permissions or an unexpected filename. SSH expects your key to be one of a specific set of names. Unless you have specified something other than -the default, this is probably going to be **$HOME/.ssh/id_rsa**. +the default, this is probably going to be `$HOME/.ssh/id_rsa`. 
If you specified a different name when generating your key, you can specify it like this: From 68af5e3770aa8679f3bb7fa9d08bdfd7d495899a Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 14:20:16 -0700 Subject: [PATCH 05/19] Made suggested changes once more --- docs/general/troubleshooting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 30f75ca8..6e5fa69f 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -18,6 +18,6 @@ If you specified a different name when generating your key, you can specify it l ssh -i $HOME/.ssh/newkey [USER]@[cluster].hpc.ucdavis.edu ``` -If you kept the default value, your permissions should be set so that only you can read and write the key (-rw------- or 600). +If you kept the default value, your permissions should be set so that only you can read and write the key `(-rw------- or 600)`. If you are trying to use a key to access LSSC0, this will not work. From 66bd9ac20d5ddb30a4c015fb6b35142a6abdf422 Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 15:24:22 -0700 Subject: [PATCH 06/19] Added more details --- docs/general/troubleshooting.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 6e5fa69f..c37dc02e 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -19,5 +19,14 @@ ssh -i $HOME/.ssh/newkey [USER]@[cluster].hpc.ucdavis.edu ``` If you kept the default value, your permissions should be set so that only you can read and write the key `(-rw------- or 600)`. +To ensure this is the case, you can do the following: -If you are trying to use a key to access LSSC0, this will not work. +```bash +chmod 600 $HOME/.ssh/id_rsa +``` + +On HPC2, your public key is kept in `$HOME/.ssh/authorized_keys`. 
Please make sure to not remove your key from this file. +Doing so will cause you to lose access. + +If you are trying to use a key to access LSSC0 or any of the Genome Center login nodes, SSH keys will not work. It is possible +to use `kinit` locally and `GSSAPI` to avoid entering a password on every login. From 646d2fcfe8b304747deadaf4f9c447d4383d7664 Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 15:48:06 -0700 Subject: [PATCH 07/19] Added Kerberos stuff --- docs/general/troubleshooting.md | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index c37dc02e..229dda91 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -28,5 +28,23 @@ chmod 600 $HOME/.ssh/id_rsa On HPC2, your public key is kept in `$HOME/.ssh/authorized_keys`. Please make sure to not remove your key from this file. Doing so will cause you to lose access. -If you are trying to use a key to access LSSC0 or any of the Genome Center login nodes, SSH keys will not work. It is possible -to use `kinit` locally and `GSSAPI` to avoid entering a password on every login. +If you are trying to use a key to access LSSC0 or any of the Genome Center login nodes, SSH keys will not work, but there is +another method. + +To enable logins without a password, you will need to enable GSSAPI, which +some systems enable by default. If not enabled, add the following to your +`.ssh/config` file (create it if it doesn't exist): + + GSSAPIAuthentication yes + GSSAPIDelegateCredentials yes + +The `-K` command line switch to ssh does the same thing on a one-time +basis. + +Once you have `GSSAPI` enabled, you can get a Kerberos ticket using + +```bash +kinit [USER]@GENOMECENTER.UCDAVIS.EDU +``` + +SSH will use that ticket while it's valid. 
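Review note on the key-permission step in this part of the series: the mode the text asks for (`-rw-------`, i.e. 600) is a permission setting, so it can be checked with `ls -l` and set with `chmod`. The following is a minimal sketch and not part of the patch; the helper name `check_key_perms` is hypothetical, and the default path `$HOME/.ssh/id_rsa` is the one the docs above assume.

```bash
# Hypothetical helper (not part of the patch): check that a private key is
# only accessible by its owner, and tighten the mode to 600 if it is not.
check_key_perms() {
    key="${1:-$HOME/.ssh/id_rsa}"   # default key path assumed by the docs
    if [ ! -f "$key" ]; then
        echo "no such key: $key" >&2
        return 1
    fi
    # The first 10 characters of `ls -l` are the mode string, e.g. -rw-------
    mode=$(ls -l "$key" | cut -c1-10)
    case "$mode" in
        -rw-------|-r--------)
            echo "ok: $key ($mode)" ;;
        *)
            chmod 600 "$key" && echo "fixed: $key ($mode -> -rw-------)" ;;
    esac
}
```

Calling `check_key_perms` with no argument inspects the default key; passing a path (for example the `newkey` file used with `ssh -i`) inspects that file instead.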
From d1dd9b482168175e5f31d28490742a2a6b929561 Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Tue, 25 Jun 2024 08:33:53 -0700 Subject: [PATCH 08/19] Consistency tweak --- docs/general/troubleshooting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 229dda91..92ea90b7 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -33,7 +33,7 @@ another method. To enable logins without a password, you will need to enable GSSAPI, which some systems enable by default. If not enabled, add the following to your -`.ssh/config` file (create it if it doesn't exist): +`$HOME/.ssh/config` file (create it if it doesn't exist): GSSAPIAuthentication yes GSSAPIDelegateCredentials yes From bc2dc32e847cf57b72553669b2bddc186fa2f819 Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Tue, 25 Jun 2024 11:52:24 -0700 Subject: [PATCH 09/19] Added Slurm troubleshooting --- docs/general/troubleshooting.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 92ea90b7..288cde87 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -2,7 +2,6 @@ Here are some of the most common issues users face when using SSH. - ### Keys The following clusters use SSH keys: Atomate, Farm, Franklin, HPC1, HPC2, Impact, Peloton. @@ -48,3 +47,19 @@ kinit [USER]@GENOMECENTER.UCDAVIS.EDU ``` SSH will use that ticket while it's valid. + +## Common Slurm Scheduler Issues + +These are the most common issues with job scheduling using Slurm. + +### Using a non-default account + +If you have access to more than one Slurm account and wish to use an account other than your default, +use the `-A` or `--account` flag. + +e.g. 
If your default account is in `foogrp` and you wish to use `bargrp`: +```bash +srun -A bargrp -t 1:00:00 --mem=20GB scriptname.sh +``` + +This also works if you don't have a default account. From a1390f3c83f9cedc462d6aaee917c00020ed4f31 Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 13:25:14 -0700 Subject: [PATCH 10/19] Added general troubleshooting page --- docs/general/troubleshooting.md | 23 +++++++++++++++++++++++ mkdocs.yml | 1 + 2 files changed, 24 insertions(+) create mode 100644 docs/general/troubleshooting.md diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md new file mode 100644 index 00000000..854ff135 --- /dev/null +++ b/docs/general/troubleshooting.md @@ -0,0 +1,23 @@ +## Common SSH Issues + +Here are some of the most common issues users face when using SSH. + + +### Keys + +The following clusters use SSH keys: Atomate, Farm, Franklin, HPC1, HPC2, Impact, Peloton. + +If you connect to one of these and are asked for a password (as distinct from a passphrase for your key), +your key is not being recognized. This is usually because of permissions or an unexpected filename. +SSH expects your key to be one of a specific set of names. Unless you have specified something other than +the default, this is probably going to be **.ssh/id_rsa**. + +If you specified a different name when generating your key, you can specify it like this: + +```bash +ssh -i .ssh/newkey [USER]@[cluster].hpc.ucdavis.edu +``` + +If you kept the default value, your permissions should be set so that only you can read and write the key (-rw------- or 600). + +If you are trying to use a key to access LSSC0, this will not work. 
diff --git a/mkdocs.yml b/mkdocs.yml index b7a5d20c..e3df68b5 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -19,6 +19,7 @@ nav: - R and RStudio: software/rlang.md - Development: software/developing.md - Data Transfer: data-transfer.md + - Troubleshooting: general/troubleshooting.md - Clusters: - Farm: - About: farm/index.md From 0f7ac6ef2c09b3f8ca18eb2d94d1954065a3433d Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 14:10:40 -0700 Subject: [PATCH 11/19] Made suggested changes --- docs/general/troubleshooting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 854ff135..5002d58a 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -10,7 +10,7 @@ The following clusters use SSH keys: Atomate, Farm, Franklin, HPC1, HPC2, Impact If you connect to one of these and are asked for a password (as distinct from a passphrase for your key), your key is not being recognized. This is usually because of permissions or an unexpected filename. SSH expects your key to be one of a specific set of names. Unless you have specified something other than -the default, this is probably going to be **.ssh/id_rsa**. +the default, this is probably going to be **$HOME/.ssh/id_rsa**. If you specified a different name when generating your key, you can specify it like this: From e7c7a54e63c041fb2a2dfdb3ec7c84d89c709028 Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 14:16:58 -0700 Subject: [PATCH 12/19] Made suggested changes again --- docs/general/troubleshooting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 5002d58a..326053cc 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -15,7 +15,7 @@ the default, this is probably going to be **$HOME/.ssh/id_rsa**. 
If you specified a different name when generating your key, you can specify it like this: ```bash -ssh -i .ssh/newkey [USER]@[cluster].hpc.ucdavis.edu +ssh -i $HOME/.ssh/newkey [USER]@[cluster].hpc.ucdavis.edu ``` If you kept the default value, your permissions should be set so that only you can read and write the key (-rw------- or 600). From bae3e44e3facdf1624524bfcd224b3338524f6f3 Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 14:19:13 -0700 Subject: [PATCH 13/19] Made suggested changes again --- docs/general/troubleshooting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 326053cc..30f75ca8 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -10,7 +10,7 @@ The following clusters use SSH keys: Atomate, Farm, Franklin, HPC1, HPC2, Impact If you connect to one of these and are asked for a password (as distinct from a passphrase for your key), your key is not being recognized. This is usually because of permissions or an unexpected filename. SSH expects your key to be one of a specific set of names. Unless you have specified something other than -the default, this is probably going to be **$HOME/.ssh/id_rsa**. +the default, this is probably going to be `$HOME/.ssh/id_rsa`. 
If you specified a different name when generating your key, you can specify it like this: From 3ae02a735b03281a35f838bc91cfedc8ba012a68 Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 14:20:16 -0700 Subject: [PATCH 14/19] Made suggested changes once more --- docs/general/troubleshooting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 30f75ca8..6e5fa69f 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -18,6 +18,6 @@ If you specified a different name when generating your key, you can specify it l ssh -i $HOME/.ssh/newkey [USER]@[cluster].hpc.ucdavis.edu ``` -If you kept the default value, your permissions should be set so that only you can read and write the key (-rw------- or 600). +If you kept the default value, your permissions should be set so that only you can read and write the key `(-rw------- or 600)`. If you are trying to use a key to access LSSC0, this will not work. From 8c82f4b7bc431bbeb74daacb5e951b04b09904db Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 15:24:22 -0700 Subject: [PATCH 15/19] Added more details --- docs/general/troubleshooting.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 6e5fa69f..c37dc02e 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -19,5 +19,14 @@ ssh -i $HOME/.ssh/newkey [USER]@[cluster].hpc.ucdavis.edu ``` If you kept the default value, your permissions should be set so that only you can read and write the key `(-rw------- or 600)`. +To ensure this is the case, you can do the following: -If you are trying to use a key to access LSSC0, this will not work. +```bash +chmod 600 $HOME/.ssh/id_rsa +``` + +On HPC2, your public key is kept in `$HOME/.ssh/authorized_keys`. 
Please make sure to not remove your key from this file. +Doing so will cause you to lose access. + +If you are trying to use a key to access LSSC0 or any of the Genome Center login nodes, SSH keys will not work. It is possible +to use `kinit` locally and `GSSAPI` to avoid entering a password on every login. From 2e664abc72a62ead0c6fbeb9bab6ba479961d15b Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Mon, 24 Jun 2024 15:48:06 -0700 Subject: [PATCH 16/19] Added Kerberos stuff --- docs/general/troubleshooting.md | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index c37dc02e..229dda91 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -28,5 +28,23 @@ chmod 600 $HOME/.ssh/id_rsa On HPC2, your public key is kept in `$HOME/.ssh/authorized_keys`. Please make sure to not remove your key from this file. Doing so will cause you to lose access. -If you are trying to use a key to access LSSC0 or any of the Genome Center login nodes, SSH keys will not work. It is possible -to use `kinit` locally and `GSSAPI` to avoid entering a password on every login. +If you are trying to use a key to access LSSC0 or any of the Genome Center login nodes, SSH keys will not work, but there is +another method. + +To enable logins without a password, you will need to enable GSSAPI, which +some systems enable by default. If not enabled, add the following to your +`.ssh/config` file (create it if it doesn't exist): + + GSSAPIAuthentication yes + GSSAPIDelegateCredentials yes + +The `-K` command line switch to ssh does the same thing on a one-time +basis. + +Once you have `GSSAPI` enabled, you can get a Kerberos ticket using + +```bash +kinit [USER]@GENOMECENTER.UCDAVIS.EDU +``` + +SSH will use that ticket while it's valid. 
From 8cf242b60749574a696f3db9651a3ab03a6df192 Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Tue, 25 Jun 2024 08:33:53 -0700 Subject: [PATCH 17/19] Consistency tweak --- docs/general/troubleshooting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 229dda91..92ea90b7 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -33,7 +33,7 @@ another method. To enable logins without a password, you will need to enable GSSAPI, which some systems enable by default. If not enabled, add the following to your -`.ssh/config` file (create it if it doesn't exist): +`$HOME/.ssh/config` file (create it if it doesn't exist): GSSAPIAuthentication yes GSSAPIDelegateCredentials yes From 9fc6d10e2abd837532cda8c29949c06b8fb428b6 Mon Sep 17 00:00:00 2001 From: John McDonnell Date: Tue, 25 Jun 2024 11:52:24 -0700 Subject: [PATCH 18/19] Added Slurm troubleshooting --- docs/general/troubleshooting.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 92ea90b7..288cde87 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -2,7 +2,6 @@ Here are some of the most common issues users face when using SSH. - ### Keys The following clusters use SSH keys: Atomate, Farm, Franklin, HPC1, HPC2, Impact, Peloton. @@ -48,3 +47,19 @@ kinit [USER]@GENOMECENTER.UCDAVIS.EDU ``` SSH will use that ticket while it's valid. + +## Common Slurm Scheduler Issues + +These are the most common issues with job scheduling using Slurm. + +### Using a non-default account + +If you have access to more than one Slurm account and wish to use an account other than your default, +use the `-A` or `--account` flag. + +e.g. 
If your default account is in `foogrp` and you wish to use `bargrp`: +```bash +srun -A bargrp -t 1:00:00 --mem=20GB scriptname.sh +``` + +This also works if you don't have a default account. From a45ab3cde7a5519c7eaca5a61b8c27c3f9ac5005 Mon Sep 17 00:00:00 2001 From: Camille Scott Date: Wed, 26 Jun 2024 15:56:11 -0700 Subject: [PATCH 19/19] Add information about Partitions and Accounts to Scheduler::Resources and link there from troubleshooting --- docs/general/troubleshooting.md | 12 +++++- docs/scheduler/resources.md | 74 +++++++++++++++++++++++++++++++-- 2 files changed, 81 insertions(+), 5 deletions(-) diff --git a/docs/general/troubleshooting.md b/docs/general/troubleshooting.md index 288cde87..cfc2ef4b 100644 --- a/docs/general/troubleshooting.md +++ b/docs/general/troubleshooting.md @@ -62,4 +62,14 @@ e.g. If your default account is in `foogrp` and you wish to use `bargrp`: srun -A bargrp -t 1:00:00 --mem=20GB scriptname.sh ``` -This also works if you don't have a default account. +### No default account + +Newer Slurm accounts have no default specified, and in this case you might get an error message like: + +``` +sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified +``` + +You will need to specify the account explicitly as explained [above](#using-a-non-default-account). +You can find out how to view your Slurm account information in the [resources +section](../scheduler/resources.md). diff --git a/docs/scheduler/resources.md b/docs/scheduler/resources.md index 4b1ec2ef..6b3e58b8 100644 --- a/docs/scheduler/resources.md +++ b/docs/scheduler/resources.md @@ -1,5 +1,71 @@ # Requesting Resources + +## Partitions + +Each **node** -- a physically distinct machine within the cluster -- will be a member of one or more +**partitions**. A partition consists of a collection of nodes, a policy for job scheduling on that +partition, a policy for conflicts when nodes are members of more than one partition (i.e. 
+preemption), and a policy for managing and restricting resources per user or per group referred to +as Quality of Service. +The Slurm documentation has detailed information on how [preemption](https://slurm.schedmd.com/preempt.html) and [QOS +definitions](https://slurm.schedmd.com/qos.html) are handled; our per-cluster _Resources_ sections +describe how partitions are organized and preemption handled on our clusters. + +## Accounts + +Users are granted access to resources via Slurm **associations**. An association links together a +**user** with an **account** and a QOS definition. **Accounts** most commonly correspond to your +lab, but sometimes exist for graduate groups, departments, or institutes. + +To see your associations, and thus which accounts and partitions you have access to, you can use the +`sacctmgr` command: + +``` console +$ sacctmgr show assoc user=$USER + Cluster Account User Partition Share ... MaxTRESMins QOS Def QOS GrpTRESRunMin +---------- ---------- ---------- ---------- --------- ... ------------- -------------------- --------- ------------- + franklin hpccfgrp camw mmgdept-g+ 1 ... hpccfgrp-mmgdept-gp+ + franklin hpccfgrp camw mmaldogrp+ 1 ... hpccfgrp-mmaldogrp-+ + franklin hpccfgrp camw cashjngrp+ 1 ... hpccfgrp-cashjngrp-+ + franklin hpccfgrp camw jalettsgr+ 1 ... hpccfgrp-jalettsgrp+ + franklin hpccfgrp camw jawdatgrp+ 1 ... hpccfgrp-jawdatgrp-+ + franklin hpccfgrp camw low 1 ... hpccfgrp-low-qos + franklin hpccfgrp camw high 1 ... hpccfgrp-high-qos + franklin jawdatgrp camw low 1 ... mcbdept-low-qos + franklin jawdatgrp camw jawdatgrp+ 1 ... jawdatgrp-jawdatgrp+ + franklin jalettsgrp camw jalettsgr+ 1 ... jalettsgrp-jalettsg+ + franklin jalettsgrp camw low 1 ... 
mcbdept-low-qos +``` + +The output is very wide, so you may want to pipe it through `less` to make it more readable: + +``` console +sacctmgr show assoc user=$USER | less -S +``` + +Or, perhaps preferably, output it in a more compact format: + +``` console +$ sacctmgr show assoc user=camw format="account%20,partition%20,qos%40" + Account Partition QOS +-------------------- -------------------- ---------------------------------------- + hpccfgrp mmgdept-gpu hpccfgrp-mmgdept-gpu-qos + hpccfgrp mmaldogrp-gpu hpccfgrp-mmaldogrp-gpu-qos + hpccfgrp cashjngrp-gpu hpccfgrp-cashjngrp-gpu-qos + hpccfgrp jalettsgrp-gpu hpccfgrp-jalettsgrp-gpu-qos + hpccfgrp jawdatgrp-gpu hpccfgrp-jawdatgrp-gpu-qos + hpccfgrp low hpccfgrp-low-qos + hpccfgrp high hpccfgrp-high-qos + jawdatgrp low mcbdept-low-qos + jawdatgrp jawdatgrp-gpu jawdatgrp-jawdatgrp-gpu-qos + jalettsgrp jalettsgrp-gpu jalettsgrp-jalettsgrp-gpu-qos + jalettsgrp low mcbdept-low-qos +``` + +In the above example, we can see that user `camw` has access to the `high` partition via an +association with `hpccfgrp` and the `jalettsgrp-gpu` partition via the `jalettsgrp` account. + ## Resource Types ### CPUs / cores @@ -14,8 +80,8 @@ Slurm's CPU management methods are complex and can quickly become confusing. For the purposes of this documentation, we will provide a simplified explanation; those with advanced needs should consult [the Slurm documentation](https://slurm.schedmd.com/cpu_management.html). -Slurm follows a distinction between its physically resources -- cluster nodes and CPUs or cores on a node -- and virtual -resources, or **tasks**, which specificy how requested physical resources will be grouped and distributed. +Slurm follows a distinction between its physical resources -- cluster nodes and CPUs or cores on a node -- and virtual +resources, or **tasks**, which specify how requested physical resources will be grouped and distributed. 
By default, Slurm will minimize the number of nodes allocated to a job, and attempt to keep the job's CPU requests localized within a node. **Tasks** group together CPUs (or other resources): CPUs within a task will be kept together on the same node. @@ -116,7 +182,7 @@ In our prior examples, however, we used small resource requests. What happens when we want to distribute jobs across nodes? Slurm uses the [block distribution](https://slurm.schedmd.com/sbatch.html#OPT_block) method by default to distribute -tasks betwee nodes. +tasks between nodes. It will exhaust all the CPUs on a node with task groups before moving to a new node. For these examples, we're going to create a script that reports both the hostname (ie, the node) and the number of CPUs: @@ -210,4 +276,4 @@ srun: launch/slurm: _step_signal: Terminating StepId=706.0 ``` -### GPUs / GRES \ No newline at end of file +### GPUs / GRES
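Review note on the Accounts section added in this patch: the `sacctmgr show assoc` listing can be reduced to just the account names, which is the value `-A`/`--account` expects. The following is a minimal sketch under stated assumptions: it relies on `sacctmgr`'s `-n` (no header) and `-P` (pipe-delimited) options, and `list_accounts` is a hypothetical helper, not something the docs define.

```bash
# Hypothetical helper: print the distinct Slurm accounts found in
# pipe-delimited "account|partition|qos" lines read from stdin.
list_accounts() {
    cut -d'|' -f1 | sort -u
}

# Intended use on a cluster login node (requires sacctmgr to be available):
#   sacctmgr -n -P show assoc user="$USER" format=account,partition,qos | list_accounts
```

Keeping the parsing separate from the `sacctmgr` call means the filter can be tried on saved output first; any name it prints should be usable with `srun -A` or `sbatch -A`, subject to the partition and QOS shown on the same association line.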