diff --git a/README.md b/README.md index 611c54c..bcf28b9 100644 --- a/README.md +++ b/README.md @@ -8,12 +8,13 @@ Sbox is a toolbox for Slurm that provides information about users' accounts and - Facilitate request resources interactively. - Easy ability to start a JupyerLab session. - JupyterLab interface with multiple kernels. -- JupyterLab interface with access to virtual environments for Python libraries such as TensorFlow and PyTorch . -- Easy to set up and configure. It cab be installed in the user level or cluster-wide +- JupyterLab interface with access to premade virtual environments such as TensorFlow and PyTorch. +- JupyterLab interface with access to a local virtual environments. +- Easy to set up and configure. It can be installed in the user level or cluster-wide. - Explanatory help options (`--help`) and reference manuals (`man sbox, man interactive`). - Improving `seff` command by using `top` command for showing the running jobs efficiency. - Managing users ssh-agent to be able to communicate with clients outside (e.g. GitHub) or within the cluster (other nodes) without asking for the passphrase. -- Helping users by showing their fairshares, accounts, quotas, jobs' efficiencies and history, running and pending jobs, as well as the cluster resources. +- Helping users by showing their fairshares, accounts, quotas, jobs' history, running and pending jobs, as well as cluster resources. ## Commands @@ -59,16 +60,16 @@ Jobs histoty: ```bash [user@lewis4-r630-login-node675 ~]$ sbox --hist day --------------------------------------------------------------------------------- Jobs History - Last Day -------------------------------------------------------------------------------- - JobID User Account State Partition QOS NCPU NNod ReqMem Submit Reserved Start Elapsed End NodeList JobName ----------- ------ ------- ---------- --------- ------- ---- ---- ------ ------------------- ---------- ------------------- ---------- ------------------- -------------------- ---------- - 23126125 user general COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:05 00:00:00 2021-07-28T01:25:05 00:00:03 2021-07-28T01:25:08 lewis4-c8k-hpc2-nod+ bash - 23126126 user general COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:13 00:00:00 2021-07-28T01:25:13 00:00:03 2021-07-28T01:25:16 lewis4-c8k-hpc2-nod+ bash - 23126127 user general COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:20 00:00:00 2021-07-28T01:25:20 00:00:08 2021-07-28T01:25:28 lewis4-c8k-hpc2-nod+ bash - 23126128 user genera+ COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:49 00:00:00 2021-07-28T01:25:49 00:00:03 2021-07-28T01:25:52 lewis4-c8k-hpc2-nod+ bash - 23126129 user genera+ COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:26:05 00:00:00 2021-07-28T01:26:05 00:00:06 2021-07-28T01:26:11 lewis4-c8k-hpc2-nod+ bash - 23126130 user genera+ COMPLETED Gpu normal 1 1 2Gn 2021-07-28T01:26:38 00:00:02 2021-07-28T01:26:40 00:00:11 2021-07-28T01:26:51 lewis4-z10pg-gpu3-n+ bash - 23126131 user genera+ CANCELLED+ Gpu normal 1 1 2Gn 2021-07-28T01:27:43 00:00:01 2021-07-28T01:27:44 00:01:03 2021-07-28T01:28:47 lewis4-z10pg-gpu3-n+ jupyter-py +-------------------------------------------------------------------------------- Jobs History - Last Day --------------------------------------------------------------------- + JobID User Account State Partition QOS NCPU NNod ReqMem Submit Reserved Start Elapsed End NodeList +---------- ------ ------- ---------- --------- ------- ---- ---- ------ ------------------- ---------- ------------------- 
---------- ------------------- -------------------- + 23126125 user general COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:05 00:00:00 2021-07-28T01:25:05 00:00:03 2021-07-28T01:25:08 lewis4-c8k-hpc2-nod+ + 23126126 user general COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:13 00:00:00 2021-07-28T01:25:13 00:00:03 2021-07-28T01:25:16 lewis4-c8k-hpc2-nod+ + 23126127 user general COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:20 00:00:00 2021-07-28T01:25:20 00:00:08 2021-07-28T01:25:28 lewis4-c8k-hpc2-nod+ + 23126128 user genera+ COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:49 00:00:00 2021-07-28T01:25:49 00:00:03 2021-07-28T01:25:52 lewis4-c8k-hpc2-nod+ + 23126129 user genera+ COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:26:05 00:00:00 2021-07-28T01:26:05 00:00:06 2021-07-28T01:26:11 lewis4-c8k-hpc2-nod+ + 23126130 user genera+ COMPLETED Gpu normal 1 1 2Gn 2021-07-28T01:26:38 00:00:02 2021-07-28T01:26:40 00:00:11 2021-07-28T01:26:51 lewis4-z10pg-gpu3-n+ + 23126131 user genera+ CANCELLED+ Gpu normal 1 1 2Gn 2021-07-28T01:27:43 00:00:01 2021-07-28T01:27:44 00:01:03 2021-07-28T01:28:47 lewis4-z10pg-gpu3-n+ ``` Jobs efficiency for running and compeleted jobs: @@ -76,7 +77,7 @@ Jobs efficiency for running and compeleted jobs: ```bash [user@lewis4-r630-login-node675 ~]$ sbox --eff 23227816 ------------------------------------- Job Efficiency ------------------------------------- - PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND + PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 47262 user 20 0 115700 3888 1600 S 0.0 0.0 0:00.03 bash 47346 user 20 0 113292 149298 1256 S 99.0 23.0 0:13.30 python @@ -172,21 +173,21 @@ Partition Gpu has 383 cpus available out of 412 (93%) ### Command line options - `-h, --help`: Show this help message and exit. -- `-A, --account`: Slurm account name or project ID. +- `-a, --account`: Slurm account name or project ID. - `-n, --ntasks`: Number of tasks (cpus). - `-N, --nodes`: Number of nodes. - `-p, --partition`: Partition name. - `-t, --time`: Number of hours based on the partitions timelimit. - `-l, --license`: Add a license to an interactive session. -- `-m, --mem`: Amount of memory per GB. +- `-m, --mem`: Amount of memory (per GB). - `-g, --gpu`: Number of gpus. - `-k, --kernel`: Jupyter kernel for python, r, julia. The default kernel is python. - `-e, --environment`: Virtual environment(s) for a JupyterLab session. -- `-y , --myenv`: Path to a local virtual environment. The local virtual envs should contain JupyterLab. +- `-E, --myenv`: Path to a local virtual environment. The local virtual envs should contain JupyterLab. **Examples** -Use the cluster interactively: +Using the cluster interactively: ```bash [user@lewis4-r630-login-node675 ~]$ interactive @@ -194,7 +195,7 @@ Logging into Interactive partition with 2G memory, 1 cpu for 2 hours ... [user@lewis4-r7425-htc5-node835 ~]$ ``` -Use the cluster interactively with more time and resources: +Using the cluster interactively with more time and resources: ```bash [user@lewis4-r630-login-node675 ~]$ interactive --mem 16 -n 6 -t 4 @@ -202,7 +203,7 @@ Logging into Interactive partition with 16G memory, 6 cpu for 4 hours ... 
[user@lewis4-r7425-htc5-node835 ~]$ ``` -Use the cluster interactively with a license: +Using the cluster interactively with a license: ```bash [user@lewis4-r630-login-node675 ~]$ interactive --mem 16 -n 6 -t 4 -l matlab @@ -210,7 +211,7 @@ Logging into Interactive partition with 16G memory, 6 cpu for 4 hours with a mat [user@lewis4-r7425-htc5-node835 ~]$ ``` -Use a Gpu interactively: +Using a Gpu interactively: ```bash [user@lewis4-r630-login-node675 ~]$ interactive -p Gpu @@ -218,7 +219,7 @@ Logging into Gpu partition with 1 gpu, 2G memory, 1 cpu for 2 hours ... [user@lewis4-r730-gpu3-node431 ~]$ ``` -Use JupyterLab: +Using JupyterLab: ```bash [user@lewis4-r630-login-node675 ~]$ interactive jupyter Logging into Lewis partition with 2G memory, 1 cpu for 2 hours ... @@ -238,38 +239,66 @@ To stop the server run the following on the cluster: scancel 23150533 ``` -Use TensorFlow with JupyterLab: +Using JupyterLab with R kernel: ```bash -[user@lewis4-r630-login-node675 ~]$ interactive jupyter -A general-gpu -p gpu3 --mem 16 -t 8 -e tensorflow -Logging into gpu3 partition with 1 gpu, 16G memory, 1 cpu for 8 hours with account general-gpu ... +[user@lewis4-r630-login-node675 ~]$ interactive jupyter -k r +Logging into Lewis partition with 2G memory, 1 cpu for 2 hours ... Starting Jupyter server (it might take about a couple minutes) ... Starting Jupyter server ... Starting Jupyter server ... ... ``` -Use R with JupyterLab: +Using TensorFlow on JupyterLab by a different account and on a partition with 16 GB memory for 8 hours: ```bash -interactive jupyter -k r -Logging into Lewis partition with 2G memory, 1 cpu for 2 hours ... +[user@lewis4-r630-login-node675 ~]$ interactive jupyter -a general-gpu -p gpu3 --mem 16 -t 8 -e tensorflow +Logging into gpu3 partition with 1 gpu, 16G memory, 1 cpu for 8 hours with account general-gpu ... Starting Jupyter server (it might take about a couple minutes) ... Starting Jupyter server ... Starting Jupyter server ... ... ``` +**Note**: Users can install other packages and mix local packages with the premade environments. For example, for Python: + +```bash +pip install --target +export PYTHONPATH=:$PYTHONPATH +``` + +For R, run the following in R: + +```R +dir.create("") +install.packages("", repos = "http://cran.us.r-project.org", lib = "") +.libPaths("") +``` + +Using a local virtual environment: +```bash +[user@lewis4-r630-login-node675 ~]$ interactive jupyter -E +Logging into Lewis partition with 2G memory, 1 cpu for 2 hours ... +Starting Jupyter server (it might take about a couple minutes) ... +Starting Jupyter server ... +``` +**Note**: The local environments must include `jupyterlab`. For R environments, they must also contain `r-irkernel`. For instance: + +```bash +conda create -p -c conda-forge r-base jupyterlab r-irkernel +``` + ## Quick install - Download and extract the [latest Sbox release](https://github.com/ashki23/sbox/releases/latest). +- To access JupyerLab sessions, install Anaconda and create required virtual environments and modulefiles. Review "Requirements" to learn more. - Update the `config` file based on the cluster information. Review "Configuration" to learn more. -- To access a JupyerLab session, install Anaconda and create the required virtual environments and modulefiles. Review "Requirements" to learn more. -- Place a modulefile for Sbox under `$MODULEPATH/sbox` directory and load the module or add the Sbox bin directory to `$PATH`. A Sbox template modulefile can be found under `./templates/1.2.lua`. 
+- Place a modulefile for Sbox under `$MODULEPATH/sbox` directory and load the module or add the Sbox bin directory to `$PATH`. A Sbox template modulefile can be found in [here](https://github.com/ashki23/sbox/blob/main/templates/1.2.lua). ## Requirements Sbox requires Slurm and Python >= 3.6.8. The `interactive jupyter` command requires Anaconda and an environment module system (e.g. [Lmod](https://lmod.readthedocs.io/en/latest/)) in addition to Slurm and Python. To use R and Julia in JupyterLab sessions, we need R and irkernel as well as Julia to be installed. -Note that Sbox options require some other commands. Review the options requirement under the command line options. +Note that Sbox options require some other commands. Review their requirements under the command line options. The following shows how to install Anaconda and create the required virtual envs and modulefiles. @@ -351,7 +380,7 @@ cd //anaconda/ ./bin/conda create -n r-essentials- -c conda-forge r-essentials r-base r-irkernel jupyterlab ``` -In the above lines, `` and `` should be updated based on the Anaconda path and `` (e.g. `4.0.3`) based on the version of R in the env. +In the above lines, `` and `` should be updated based on the Anaconda path and version, and `` (e.g. `4.0.3`) based on the version of R in the env. The following modulefile should be added to `$MODULEPATH/r-essentials/.lua` to be able to load the R env: @@ -390,7 +419,7 @@ cd //anaconda/ ./bin/conda create -n julia- -c conda-forge julia ``` -In the above lines, `` and `` should be updated based on the Anaconda path and `` (e.g. `1.6.1`) based on the version of Julia in the env. +In the above lines, `` and `` should be updated based on the Anaconda path and version, and `` (e.g. `1.6.1`) based on the version of Julia in the env. The following modulefile should be added to `$MODULEPATH/julia/.lua`: @@ -440,13 +469,13 @@ cd //anaconda/ ./bin/conda install -n pytorch- -c pytorch pytorch gpustat ``` -For instance, we collect set of popular R bio packages in the following env from bioconda channel: +For instance, we can collect popular R bio packages in the following env from bioconda channel: ```bash cd //anaconda/ ./bin/conda create -n r-bioessentials- -c bioconda -c conda-forge bioconductor-edger bioconductor-oligo r-monocle3 r-signac r-seurat scanpy macs2 jupyterlab r-irkernel ``` -In the above lines, `` and `` should be updated based on the Anaconda path and `` (e.g. `2.4.1`) based on the version of TF, PT, and R. +In the above lines, `` and `` should be updated based on the Anaconda path and version, and `` (e.g. `2.4.1`) based on the version of TF, PT, or R. For each env, we need to add a modulefile to `$MODULEPATH//.lua`. For instance `$MODULEPATH/tensorflow/.lua` is: @@ -475,6 +504,8 @@ setenv("ANACONDA_ROOT", this_root) Or adding a tcl modulefile similar to the above tcl template for Anaconda. +**Note**: Users can add other packages and mix a local stack of packages with the premade environments. For Python and R packages users can apply `pip install` and `install.packages` respectively to install packages on their home. In order to install packages in a differnt path than home, we can specify the desired path and add the new path to the library path of the software. See examples under the `interactive` command line options examples. + ## Configuration The `sbox` and `interactive` commands are reading the required information from the below JSON config file. 
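Because both commands read this file, it can help to confirm that the file still parses as valid JSON after editing it. A minimal check, assuming the file keeps the default name `config` in the Sbox directory (adjust the path if yours differs):

```bash
# Validate the Sbox config before running sbox/interactive.
# Assumes the JSON config file is named "config" in the current (Sbox) directory.
python3 -m json.tool config > /dev/null && echo "config is valid JSON" || echo "config has a JSON syntax error"
```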
@@ -512,23 +543,23 @@ For example: "cpu_partition": ["Interactive","Lewis","Serial","Dtn","hpc3","hpc4","hpc4rc","hpc5","hpc6","General","Gpu"], "gpu_partition": ["Gpu","gpu3","gpu4"], "interactive_partition_timelimit": { - "Interactive": 4, - "Dtn": 4, - "Gpu": 2 + "Interactive": 4, + "Dtn": 4, + "Gpu": 2 }, "jupyter_partition_timelimit": { - "Lewis": 8, - "hpc4": 8, - "hpc5": 8, - "hpc6": 8, - "gpu3": 8, - "gpu4": 8, - "Gpu": 2 + "Lewis": 8, + "hpc4": 8, + "hpc5": 8, + "hpc6": 8, + "gpu3": 8, + "gpu4": 8, + "Gpu": 2 }, "partition_qos": { - "Interactive": "interactive", - "Serial": "seriallong", - "Dtn": "dtn" + "Interactive": "interactive", + "Serial": "seriallong", + "Dtn": "dtn" }, "kernel_module": { "python": "anaconda", @@ -536,10 +567,10 @@ For example: "julia": "julia" }, "env_module": { - "tensorflow-v1.9": "tensorflow/1.9.0", - "tensorflow": "tensorflow", - "pytorch": "pytorch", - "r-bio": "r-bioessentials" + "tensorflow-v1.9": "tensorflow/1.9.0", + "tensorflow": "tensorflow", + "pytorch": "pytorch", + "r-bio": "r-bioessentials" } } ``` diff --git a/docs/_templates/info.html b/docs/_templates/info.html index 5f15a54..9b314a0 100644 --- a/docs/_templates/info.html +++ b/docs/_templates/info.html @@ -1,5 +1,5 @@

-Sbox is a small toolbox for Slurm that provides information about users' accounts and jobs as well as information about the cluster resources. +Sbox is a simple toolbox for Slurm that provides information about users' accounts and jobs as well as information about the cluster resources. https://github.com/ashki23/sbox

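To tie the README changes above together: exposing a new premade environment through `interactive jupyter -e` involves three pieces described separately in the Requirements and Configuration sections: the conda environment, its modulefile, and an `env_module` entry in the config. A condensed, hypothetical sketch of that workflow (the paths, the `myenv` name, and the package list are placeholders, not part of Sbox):

```bash
# Hypothetical walkthrough of wiring a premade env named "myenv" into Sbox.
# <anaconda-path>, <anaconda-version> and <version> are placeholders.

# 1. Create the environment from the cluster's Anaconda installation and
#    include jupyterlab (see the Requirements section).
cd <anaconda-path>/anaconda/<anaconda-version>
./bin/conda create -n myenv-<version> -c conda-forge jupyterlab <other-packages>

# 2. Add a modulefile so the module system can load the env, e.g.
#    $MODULEPATH/myenv/<version>.lua, following the Lua template in the README.

# 3. Map the environment name to its module in the config's "env_module" section:
#    "env_module": { "myenv": "myenv" }

# 4. Users can then start a JupyterLab session with it:
#    interactive jupyter -e myenv
```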
diff --git a/docs/index.rst b/docs/index.rst index 9b4b300..20795e7 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -12,12 +12,13 @@ Features * Facilitate request resources interactively. * Easy ability to start a JupyerLab session. * JupyterLab interface with multiple kernels. -* JupyterLab interface with access to virtual environments for Python libraries such as TensorFlow and PyTorch . -* Easy to set up and configure. It cab be installed in the user level or cluster-wide +* JupyterLab interface with access to premade virtual environments such as TensorFlow and PyTorch. +* JupyterLab interface with access to a local virtual environments. +* Easy to set up and configure. It can be installed in the user level or cluster-wide. * Explanatory help options (``--help``) and reference manuals (``man sbox, man interactive``). * Improving ``seff`` command by using ``top`` command for showing the running jobs efficiency. * Managing users ssh-agent to be able to communicate with clients outside (e.g. GitHub) or within the cluster (other nodes) without asking for the passphrase. -* Helping users by showing their fairshares, accounts, quotas, jobs' efficiencies and history, running and pending jobs, as well as the cluster resources. +* Helping users by showing their fairshares, accounts, quotas, jobs' history, running and pending jobs, as well as cluster resources. Install ------- diff --git a/docs/requirements.rst b/docs/requirements.rst index 5f4f44c..6ddb5c8 100644 --- a/docs/requirements.rst +++ b/docs/requirements.rst @@ -1,29 +1,35 @@ Quick install -============= +------------- -- Download and extract the `latest Sbox release `__. +- Download and extract the `latest Sbox + release `__. +- To access JupyerLab sessions, install Anaconda and create required + virtual environments and modulefiles. Review “Requirements” to learn + more. - Update the ``config`` file based on the cluster information. Review - `Configuration `__ to learn more. -- To access a JupyerLab session, install Anaconda and create the required virtual environments and modulefiles. Review - `Requirements `__ to learn more. -- Place a modulefile for Sbox under ``$MODULEPATH/sbox`` and load the module or add the Sbox bin directory to ``$PATH``. A Sbox template modulefile can be found in `here `__. + “Configuration” to learn more. +- Place a modulefile for Sbox under ``$MODULEPATH/sbox`` directory and + load the module or add the Sbox bin directory to ``$PATH``. A Sbox + template modulefile can be found in + `here `__. Requirements -============ +------------ Sbox requires Slurm and Python >= 3.6.8. The ``interactive jupyter`` command requires Anaconda and an environment module system (e.g. `Lmod `__) in addition to -Slurm and Python. To use R and Julia in JupyterLab sessions, we need R and irkernel as well as Julia to be installed. +Slurm and Python. To use R and Julia in JupyterLab sessions, we need R +and irkernel as well as Julia to be installed. -Note that Sbox options require some other commands. Review -the options requirement under the `command line options `__. +Note that Sbox options require some other commands. Review their +requirements under the command line options. The following shows how to install Anaconda and create the required virtual envs and modulefiles. Python kernel (Anaconda) ------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~ The ``interactive jupyter`` command provides a JupyterLab interface for running Python and many scientific packages by using Anaconda. 
To @@ -68,7 +74,7 @@ modeulefile under ``$MODULEPATH/anaconda/.lua``: Or adding the following tcl modulefile under ``$MODULEPATH/anaconda/``: -.. code:: +.. code:: tcl #%Module1.0 ## Metadata ########################################### @@ -100,7 +106,7 @@ Or adding the following tcl modulefile under setenv ${this_module_upper}_ROOT ${this_root} R kernel --------- +~~~~~~~~ Users can run R scripts within a JupterLab notebook by ``interactive jupyter -k r``. To have R, irkernel and many other R @@ -114,8 +120,8 @@ from Anaconda: ./bin/conda create -n r-essentials- -c conda-forge r-essentials r-base r-irkernel jupyterlab In the above lines, ```` and ```` -should be updated based on the Anaconda path and ```` -(e.g. ``4.0.3``) based on the version of R in the env. +should be updated based on the Anaconda path and version, and +```` (e.g. ``4.0.3``) based on the version of R in the env. The following modulefile should be added to ``$MODULEPATH/r-essentials/.lua`` to be able to load the R @@ -142,20 +148,21 @@ env: prepend_path("C_INCLUDE_PATH", this_root .. "/include", ":") prepend_path("CPLUS_INCLUDE_PATH", this_root .. "/include", ":") prepend_path("PKG_CONFIG_PATH", this_root .. "/lib/pkgconfig", ":") - setenv("RESSENTIALS_ROOT", this_root) + setenv("ANACONDA_ROOT", this_root) -Or adding a tcl modulefile similar to the above tcl template for Anaconda. +Or adding a tcl modulefile similar to the above tcl template for +Anaconda. Julia kernel ------------- +~~~~~~~~~~~~ The ``interactive jupyter -k julia`` command provides Julia from a JupyterLab notebook. Julia can be installed from `Spack `__, `source `__ or `Anaconda `__. The following -shows how to install Julia from Anaconda (Note that if Julia have -been installed on the cluster, you can skip this section and use the +shows how to install Julia from Anaconda (Note that if Julia have been +installed on the cluster, you can skip this section and use the available Julia module instead). .. code:: bash @@ -164,8 +171,8 @@ available Julia module instead). ./bin/conda create -n julia- -c conda-forge julia In the above lines, ```` and ```` -should be updated based on the Anaconda path and ```` -(e.g. ``1.6.1``) based on the version of Julia in the env. +should be updated based on the Anaconda path and version, and +```` (e.g. ``1.6.1``) based on the version of Julia in the env. The following modulefile should be added to ``$MODULEPATH/julia/.lua``: @@ -191,15 +198,17 @@ The following modulefile should be added to prepend_path("C_INCLUDE_PATH", this_root .. "/include", ":") prepend_path("CPLUS_INCLUDE_PATH", this_root .. "/include", ":") prepend_path("PKG_CONFIG_PATH", this_root .. "/lib/pkgconfig", ":") - setenv("JULIA_ROOT", this_root) + setenv("ANACONDA_ROOT", this_root) -Or adding a tcl modulefile similar to the above tcl template for Anaconda. +Or adding a tcl modulefile similar to the above tcl template for +Anaconda. -Note that the first time that users run ``interactive jupyter -k julia``, -Julia Jupyter kernal (IJulia) will be installed under ``~/.julia``. +Note that the first time that users run +``interactive jupyter -k julia``, Julia Jupyter kernal (IJulia) will be +installed under ``~/.julia``. On demand Python and R pakages ------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Popular Python pakages that are not available in Anaconda can be added to ``interactive jupyter -e``. For instance the following shows how to @@ -215,11 +224,12 @@ Similarly, we can create a PyTorch (PT) env with: .. 
code:: bash - cd //anaconda/ + cd //anaconda/ ./bin/conda create -n pytorch- anaconda ./bin/conda install -n pytorch- -c pytorch pytorch gpustat -For instance, we can collect set of popular R bio packages in the following env from bioconda channel: +For instance, we can collect popular R bio packages in the following env +from bioconda channel: .. code:: bash @@ -227,8 +237,8 @@ For instance, we can collect set of popular R bio packages in the following env ./bin/conda create -n r-bioessentials- -c bioconda -c conda-forge bioconductor-edger bioconductor-oligo r-monocle3 r-signac r-seurat scanpy macs2 jupyterlab r-irkernel In the above lines, ```` and ```` -should be updated based on the Anaconda path and ```` -(e.g. ``2.4.1``) based on the version of TF, PT, and R. +should be updated based on the Anaconda path and version, and +```` (e.g. ``2.4.1``) based on the version of TF, PT, or R. For each env, we need to add a modulefile to ``$MODULEPATH//.lua``. For instance @@ -255,12 +265,21 @@ For each env, we need to add a modulefile to prepend_path("C_INCLUDE_PATH", this_root .. "/include", ":") prepend_path("CPLUS_INCLUDE_PATH", this_root .. "/include", ":") prepend_path("PKG_CONFIG_PATH", this_root .. "/lib/pkgconfig", ":") - setenv("TENSORFLOW_ROOT", this_root) + setenv("ANACONDA_ROOT", this_root) + +Or adding a tcl modulefile similar to the above tcl template for +Anaconda. -Or adding a tcl modulefile similar to the above tcl template for Anaconda. +**Note**: Users can add other packages and mix a local stack of packages +with the premade environments. For Python and R packages users can apply +``pip install`` and ``install.packages`` respectively to install +packages on their home. In order to install packages in a differnt path +than home, we can specify the desired path and add the new path to the +library path of the software. See examples under the ``interactive`` +command line options examples. Configuration -============= +------------- The ``sbox`` and ``interactive`` commands are reading the required information from the below JSON config file. @@ -281,21 +300,25 @@ information from the below JSON config file. The config file includes: -- ``disk_quota_paths``: A list of paths to the disks for finding users - quotas. By default the first input is considered as the users’ home path. +- ``disk_quota_paths``: A list of paths to the disk for finding users + quotas. By default the first input is considered as the users’ home + path. - ``cpu_partition``: A list of computational partitions. - ``gpu_partition``: A list of GPU partitions. - ``interactive_partition_timelimit``: A dictionary of interactive partitions (i.e. users should access by ``srun``) and their time - limits (hour). The first input is considered as the default partition. + limits (hour). The first input is considered as the default + partition. - ``jupyter_partition_timelimit``: A dictionary of computational/gpu partitions that users can run Jupter servers interactively and their - time limits (hour). The first input is considered as the default partition. + time limits (hour). The first input is considered as the default + partition. - ``partition_qos``: A dictionary of partitions and the corresponding quality of services. -- ``kernel_module``: A dictionary of kernels and the corresponding modules. - A Python kernel is required (review `here `__). -- ``env_module``: A dictionary of virtual environments and the corresponding modules. +- ``kernel_module``: A dictionary of kernels and the corresponding + modules. 
A Python kernel is required (review the Requirments). +- ``env_module``: A dictionary of virtual environments and the + corresponding modules. For example: @@ -306,23 +329,23 @@ For example: "cpu_partition": ["Interactive","Lewis","Serial","Dtn","hpc3","hpc4","hpc4rc","hpc5","hpc6","General","Gpu"], "gpu_partition": ["Gpu","gpu3","gpu4"], "interactive_partition_timelimit": { - "Interactive": 4, - "Dtn": 4, - "Gpu": 2 + "Interactive": 4, + "Dtn": 4, + "Gpu": 2 }, "jupyter_partition_timelimit": { - "Lewis": 8, - "hpc4": 8, - "hpc5": 8, - "hpc6": 8, - "gpu3": 8, - "gpu4": 8, - "Gpu": 2 + "Lewis": 8, + "hpc4": 8, + "hpc5": 8, + "hpc6": 8, + "gpu3": 8, + "gpu4": 8, + "Gpu": 2 }, "partition_qos": { - "Interactive": "interactive", - "Serial": "seriallong", - "Dtn": "dtn" + "Interactive": "interactive", + "Serial": "seriallong", + "Dtn": "dtn" }, "kernel_module": { "python": "anaconda", @@ -330,9 +353,9 @@ For example: "julia": "julia" }, "env_module": { - "tensorflow-v1.9": "tensorflow/1.9.0", - "tensorflow": "tensorflow", - "pytorch": "pytorch", - "r-bio": "r-bioessentials" + "tensorflow-v1.9": "tensorflow/1.9.0", + "tensorflow": "tensorflow", + "pytorch": "pytorch", + "r-bio": "r-bioessentials" } } diff --git a/docs/sbox.rst b/docs/sbox.rst index 721f0f3..2b580d2 100644 --- a/docs/sbox.rst +++ b/docs/sbox.rst @@ -1,5 +1,5 @@ Sbox -===== +---- ``sbox`` command includes various Slurm commands at one place. Users can use different options to find the information about the cluster and @@ -11,7 +11,7 @@ the cluster via ssh without asking for the passphrase (you need the passphrase to start the ssh-agent). Command line options --------------------- +~~~~~~~~~~~~~~~~~~~~ - ``-h, --help``: Show the help message and exit. - ``-a, --account``: Return user’s Slurm accounts by using Slurm @@ -46,16 +46,16 @@ Command line options command. - ``--running``: Return user’s running jobs by using Slurm ``squeue`` command. -- ``--cancel``: Cancel jobs by a single ID or a comma separated list of - IDs using Slurm ``scancel`` command. +- ``--cancel``: Cancel jobs by a single ID or a comma separated list of + IDs using Slurm ``scancel`` command. - ``--qos``: Show user’s quality of services (QOS) and a list of available QOS in the cluster. It uses Slurm ``sacctmgr show assoc`` - command and returns empty output if the cluster does not use Slurm for - users’ account management. -- ``--quota``: Return user’s disk quotas. It uses ``lfs quota`` - command for LFS systems and Unix ``df`` command for NFS systems. It - returns pooled size of the disk if the cluster does not have - user/group storage accounts. + command and returns empty output if the cluster does not use Slurm + for users’ account management. +- ``--quota``: Return user’s disk quotas. It uses ``lfs quota`` command + for LFS systems and Unix ``df`` command for NFS systems. It returns + pooled size of the disk if the cluster does not have user/group + storage accounts. - ``--ncpu``: Show number of available cpus on the cluster using Slurm ``sinfo`` command. - ``--ncgu``: Show number of available gpus on the cluster using Slurm @@ -67,10 +67,11 @@ Command line options - ``--reserve``: Show Slurm reservations using Slurm ``scontrol`` command. - ``--topusage``: Show top usage users using Slurm ``sreport`` command. -- ``--whodat``: Show users informations by UID. It uses ``ldapsearch`` - command and returns empty output if the cluster does not use LDAP. -- ``--whodat2``: Show users informations by name. 
It uses ``ldapsearch`` - command and returns empty output if the cluster does not use LDAP. +- ``--whodat``: Show users informations by UID. It uses ``ldapsearch`` + command and returns empty output if the cluster does not use LDAP. +- ``--whodat2``: Show users informations by name. It uses + ``ldapsearch``\ command and returns empty output if the cluster does + not use LDAP. - ``--agent``: Start, stop and list user’s ssh-agents on the current host. It requires one of the start/stop/list options as an argument. Use ``ssh -o StrictHostKeyChecking=no`` to disable asking for host @@ -83,16 +84,16 @@ Jobs histoty: .. code:: bash [user@lewis4-r630-login-node675 ~]$ sbox --hist day - -------------------------------------------------------------------------------- Jobs History - Last Day -------------------------------------------------------------------------------- - JobID User Account State Partition QOS NCPU NNod ReqMem Submit Reserved Start Elapsed End NodeList JobName - ---------- ------ ------- ---------- --------- ------- ---- ---- ------ ------------------- ---------- ------------------- ---------- ------------------- -------------------- ---------- - 23126125 user general COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:05 00:00:00 2021-07-28T01:25:05 00:00:03 2021-07-28T01:25:08 lewis4-c8k-hpc2-nod+ bash - 23126126 user general COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:13 00:00:00 2021-07-28T01:25:13 00:00:03 2021-07-28T01:25:16 lewis4-c8k-hpc2-nod+ bash - 23126127 user general COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:20 00:00:00 2021-07-28T01:25:20 00:00:08 2021-07-28T01:25:28 lewis4-c8k-hpc2-nod+ bash - 23126128 user genera+ COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:49 00:00:00 2021-07-28T01:25:49 00:00:03 2021-07-28T01:25:52 lewis4-c8k-hpc2-nod+ bash - 23126129 user genera+ COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:26:05 00:00:00 2021-07-28T01:26:05 00:00:06 2021-07-28T01:26:11 lewis4-c8k-hpc2-nod+ bash - 23126130 user genera+ COMPLETED Gpu normal 1 1 2Gn 2021-07-28T01:26:38 00:00:02 2021-07-28T01:26:40 00:00:11 2021-07-28T01:26:51 lewis4-z10pg-gpu3-n+ bash - 23126131 user genera+ CANCELLED+ Gpu normal 1 1 2Gn 2021-07-28T01:27:43 00:00:01 2021-07-28T01:27:44 00:01:03 2021-07-28T01:28:47 lewis4-z10pg-gpu3-n+ jupyter-py + -------------------------------------------------------------------------------- Jobs History - Last Day --------------------------------------------------------------------- + JobID User Account State Partition QOS NCPU NNod ReqMem Submit Reserved Start Elapsed End NodeList + ---------- ------ ------- ---------- --------- ------- ---- ---- ------ ------------------- ---------- ------------------- ---------- ------------------- -------------------- + 23126125 user general COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:05 00:00:00 2021-07-28T01:25:05 00:00:03 2021-07-28T01:25:08 lewis4-c8k-hpc2-nod+ + 23126126 user general COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:13 00:00:00 2021-07-28T01:25:13 00:00:03 2021-07-28T01:25:16 lewis4-c8k-hpc2-nod+ + 23126127 user general COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:20 00:00:00 2021-07-28T01:25:20 00:00:08 2021-07-28T01:25:28 lewis4-c8k-hpc2-nod+ + 23126128 user genera+ COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:25:49 00:00:00 2021-07-28T01:25:49 00:00:03 2021-07-28T01:25:52 lewis4-c8k-hpc2-nod+ + 23126129 user genera+ COMPLETED Interact+ intera+ 1 1 2Gn 2021-07-28T01:26:05 00:00:00 2021-07-28T01:26:05 00:00:06 2021-07-28T01:26:11 
lewis4-c8k-hpc2-nod+ + 23126130 user genera+ COMPLETED Gpu normal 1 1 2Gn 2021-07-28T01:26:38 00:00:02 2021-07-28T01:26:40 00:00:11 2021-07-28T01:26:51 lewis4-z10pg-gpu3-n+ + 23126131 user genera+ CANCELLED+ Gpu normal 1 1 2Gn 2021-07-28T01:27:43 00:00:01 2021-07-28T01:27:44 00:01:03 2021-07-28T01:28:47 lewis4-z10pg-gpu3-n+ Jobs efficiency for running and compeleted jobs: @@ -100,10 +101,10 @@ Jobs efficiency for running and compeleted jobs: [user@lewis4-r630-login-node675 ~]$ sbox --eff 23227816 ------------------------------------- Job Efficiency ------------------------------------- - PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND - 47262 user 20 0 115700 3888 1600 S 0.0 0.0 0:00.03 bash - 47346 user 20 0 113292 149298 1256 S 99.0 23.0 0:13.30 python - + PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND + 47262 user 20 0 115700 3888 1600 S 0.0 0.0 0:00.03 bash + 47346 user 20 0 113292 149298 1256 S 99.0 23.0 0:13.30 python + RES: shows resident memory which is accurate representation of how much actual physical memory a process is consuming %CPU: shows the percentage of the CPU that is being used by the process @@ -114,7 +115,7 @@ Jobs efficiency for running and compeleted jobs: Job ID: 23126131 Cluster: lewis4 User/Group: user/user - State: CANCELLED (exit code 0) + State: COMPLETED (exit code 0) Cores: 1 CPU Utilized: 00:11:01 CPU Efficiency: 48.59% of 00:21:03 core-walltime @@ -191,43 +192,48 @@ Cluster resources: Partition Gpu has 383 cpus available out of 412 (93%) Interactive -============ +----------- ``interactive`` is an alias for using cluster interactively using Slurm ``srun`` and ``sbatch`` commands. The ``interactive jupyter`` provides a JupyterLab interface for using scientific software including Python, R, -Julia, and their libraries. The command submits a batch file and runs a -Jupyter server on the cluster. Multiple kernels and environments can be -applied to use different software and packages in JupyterLab. +Julia, and their libraries. The command submits a batch file by +``sbatch`` command and runs a Jupyter server on the cluster. Multiple +kernels and environments can be applied to use different software and +packages in JupyterLab. + +.. _command-line-options-1: Command line options --------------------- +~~~~~~~~~~~~~~~~~~~~ - ``-h, --help``: Show this help message and exit. -- ``-A, --account``: Slurm account name or project ID. +- ``-a, --account``: Slurm account name or project ID. - ``-n, --ntasks``: Number of tasks (cpus). - ``-N, --nodes``: Number of nodes. - ``-p, --partition``: Partition name. -- ``-t, --time``: Number of hours based on the partitions time limits. +- ``-t, --time``: Number of hours based on the partitions timelimit. - ``-l, --license``: Add a license to an interactive session. -- ``-m, --mem``: Amount of memory per GB. +- ``-m, --mem``: Amount of memory (per GB). - ``-g, --gpu``: Number of gpus. -- ``-k, --kernel``: Jupyter kernel for python, r, julia. The default kernel is python. -- ``-e, --environment``: Virtual environment(s) for a JupyterLab session. -- ``-y , --myenv``: Path to a local virtual environment. The local virtual envs should contain JupyterLab. +- ``-k, --kernel``: Jupyter kernel for python, r, julia. The default + kernel is python. +- ``-e, --environment``: Virtual environment(s) for a JupyterLab + session. +- ``-E, --myenv``: Path to a local virtual environment. The local + virtual envs should contain JupyterLab. **Examples** -Use the cluster interactively: +Using the cluster interactively: .. 
code:: bash - [user@lewis4-r630-login-node675 bin]$ module load sbox [user@lewis4-r630-login-node675 ~]$ interactive Logging into Interactive partition with 2G memory, 1 cpu for 2 hours ... [user@lewis4-r7425-htc5-node835 ~]$ -Use the cluster interactively with more time and resources: +Using the cluster interactively with more time and resources: .. code:: bash @@ -235,7 +241,7 @@ Use the cluster interactively with more time and resources: Logging into Interactive partition with 16G memory, 6 cpu for 4 hours ... [user@lewis4-r7425-htc5-node835 ~]$ -Use the cluster interactively with a license: +Using the cluster interactively with a license: .. code:: bash @@ -243,7 +249,7 @@ Use the cluster interactively with a license: Logging into Interactive partition with 16G memory, 6 cpu for 4 hours with a matlab license ... [user@lewis4-r7425-htc5-node835 ~]$ -Use a Gpu interactively: +Using a Gpu interactively: .. code:: bash @@ -251,7 +257,7 @@ Use a Gpu interactively: Logging into Gpu partition with 1 gpu, 2G memory, 1 cpu for 2 hours ... [user@lewis4-r730-gpu3-node431 ~]$ -Use JupyterLab: +Using JupyterLab: .. code:: bash @@ -272,25 +278,57 @@ Use JupyterLab: To stop the server run the following on the cluster: scancel 23150533 -Use TensorFlow with JupyterLab: +Using JupyterLab with R kernel: .. code:: bash - [user@lewis4-r630-login-node675 ~]$ interactive jupyter -A general-gpu -p gpu3 --mem 16 -t 8 -e tensorflow - Logging into gpu3 partition with 1 gpu, 16G memory, 1 cpu for 8 hours with account general-gpu ... + [user@lewis4-r630-login-node675 ~]$ interactive jupyter -k r + Logging into Lewis partition with 2G memory, 1 cpu for 2 hours ... Starting Jupyter server (it might take about a couple minutes) ... Starting Jupyter server ... Starting Jupyter server ... ... -Use R with JupyterLab: +Using TensorFlow on JupyterLab by a different account and on a partition +with 16 GB memory for 8 hours: .. code:: bash - interactive jupyter -k r - Logging into Lewis partition with 2G memory, 1 cpu for 2 hours ... + [user@lewis4-r630-login-node675 ~]$ interactive jupyter -a general-gpu -p gpu3 --mem 16 -t 8 -e tensorflow + Logging into gpu3 partition with 1 gpu, 16G memory, 1 cpu for 8 hours with account general-gpu ... Starting Jupyter server (it might take about a couple minutes) ... Starting Jupyter server ... Starting Jupyter server ... ... +**Note**: Users can install other packages and mix local packages with +the premade environments. For example, for Python: + +.. code:: bash + + pip install --target + export PYTHONPATH=:$PYTHONPATH + +For R, run the following in R: + +.. code:: r + + dir.create("") + install.packages("", repos = "http://cran.us.r-project.org", lib = "") + .libPaths("") + +Using a local virtual environment: + +.. code:: bash + + [user@lewis4-r630-login-node675 ~]$ interactive jupyter -E + Logging into Lewis partition with 2G memory, 1 cpu for 2 hours ... + Starting Jupyter server (it might take about a couple minutes) ... + Starting Jupyter server ... + +**Note**: The local environments must include ``jupyterlab``. For R +environments, they must also contain ``r-irkernel``. For instance: + +.. 
code:: bash + + conda create -p -c conda-forge r-base jupyterlab r-irkernel diff --git a/share/man/man1/interactive.1 b/share/man/man1/interactive.1 index 24cda12..688a124 100644 --- a/share/man/man1/interactive.1 +++ b/share/man/man1/interactive.1 @@ -2,17 +2,25 @@ .SH NAME interactive \- an alias for using cluster interactively .SH SYNOPSIS -.B interactive -[\fB-h\fR] [\fB-A\fR] [\fB-n\fR] [\fB-N\fR] [\fB-p\fR] [\fB-t\fR] [\fB-l\fR] [\fB-m\fR] [\fB-g\fR] [{jupyter}] +interactive [-h] [-a] [-n] [-N] [-p] [-t] [-k] [-e] [-E] [-l] [-m] [-g] [{jupyter}] .br .SH DESCRIPTION -Interactive command uses Slurm srun and sbatch commands to request resources interactively including running a Jupyter server on the cluster. Multiple kernels and environments can be applied to use different software and packages in JupyterLab. +.PP +interactive is an alias for using cluster interactively using +Slurm srun and sbatch commands. +The interactive jupyter provides a JupyterLab interface for +using scientific software including Python, R, Julia, and their +libraries. +The command submits a batch file by sbatch command and runs a +Jupyter server on the cluster. +Multiple kernels and environments can be applied to use different +software and packages in JupyterLab. .SH COMMAND LINE OPTIONS .TP -.B -h, --help +.B -h, --help Show this help message and exit. .TP -.B -A, --account +.B -a, --account Slurm account name or project ID. .TP .B -n, --ntasks @@ -25,29 +33,32 @@ Number of nodes. Partition name. .TP .B -t, --time -Number of hours based on the partitions time limits. +Number of hours based on the partitions timelimit. .TP .B -l, --license Add a license to an interactive session. .TP .B -m, --mem -Amount of memory per GB. +Amount of memory (per GB). .TP .B -g, --gpu Number of gpus. .TP .B -k, --kernel -Jupyter kernel for python, r, julia. The default kernel is python. +Jupyter kernel for python, r, julia. +The default kernel is python. .TP .B -e, --environment -Virtual environment(s) for a JupyterLab session. +Virtual environment(s) for a JupyterLab +session. .TP -.B -y, --myenv -Path to a local virtual environment. The local virtual envs should contain JupyterLab. +.B -E, --myenv +Path to a local virtual environment. +The local virtual envs should contain JupyterLab. .PP -\f[B]Examples +Examples .PP -Use the cluster interactively: +Using the cluster interactively: .IP .nf [user\[at]lewis4-r630-login-node675 \[ti]]$ interactive @@ -55,7 +66,7 @@ Logging into Interactive partition with 2G memory, 1 cpu for 2 hours ... [user\[at]lewis4-r7425-htc5-node835 \[ti]]$ .fi .PP -Use the cluster interactively with more time and resources: +Using the cluster interactively with more time and resources: .IP .nf [user\[at]lewis4-r630-login-node675 \[ti]]$ interactive --mem 16 -n 6 -t 4 @@ -63,7 +74,7 @@ Logging into Interactive partition with 16G memory, 6 cpu for 4 hours ... [user\[at]lewis4-r7425-htc5-node835 \[ti]]$ .fi .PP -Use the cluster interactively with a license: +Using the cluster interactively with a license: .IP .nf [user\[at]lewis4-r630-login-node675 \[ti]]$ interactive --mem 16 -n 6 -t 4 -l matlab @@ -71,7 +82,7 @@ Logging into Interactive partition with 16G memory, 6 cpu for 4 hours with a mat [user\[at]lewis4-r7425-htc5-node835 \[ti]]$ .fi .PP -Use a Gpu interactively: +Using a Gpu interactively: .IP .nf [user\[at]lewis4-r630-login-node675 \[ti]]$ interactive -p Gpu @@ -79,7 +90,7 @@ Logging into Gpu partition with 1 gpu, 2G memory, 1 cpu for 2 hours ... 
[user\[at]lewis4-r730-gpu3-node431 \[ti]]$ .fi .PP -Use JupyterLab: +Using JupyterLab: .IP .nf [user\[at]lewis4-r630-login-node675 \[ti]]$ interactive jupyter @@ -87,40 +98,72 @@ Logging into Lewis partition with 2G memory, 1 cpu for 2 hours ... Starting Jupyter server (it might take about a couple minutes) ... Starting Jupyter server ... Starting Jupyter server ... - Jupyter Notebook is running. - Open a new terminal in your local computer and run: ssh -NL 8888:lewis4-r630-hpc4-node303:8888 user\[at]lewis.rnet.missouri.edu - After that open a browser and go: http://127.0.0.1:8888/?token=9e223bd179d228e0e334f8f4a85dfd904eebd0ab9ded7e55 - To stop the server run the following on the cluster: scancel 23150533 .fi .PP -Use TensorFlow with JupyterLab: +Using JupyterLab with R kernel: .IP .nf -[user\[at]lewis4-r630-login-node675 \[ti]]$ interactive jupyter -A general-gpu -p gpu3 --mem 16 -t 8 -e tensorflow -Logging into gpu3 partition with 1 gpu, 16G memory, 1 cpu for 8 hours with account general-gpu ... +[user\[at]lewis4-r630-login-node675 \[ti]]$ interactive jupyter -k r +Logging into Lewis partition with 2G memory, 1 cpu for 2 hours ... Starting Jupyter server (it might take about a couple minutes) ... Starting Jupyter server ... Starting Jupyter server ... \&... .fi .PP -Use R with JupyterLab: +Using TensorFlow on JupyterLab by a different account and on a partition +with 16 GB memory for 8 hours: .IP .nf -interactive jupyter -k r -Logging into Lewis partition with 2G memory, 1 cpu for 2 hours ... +[user\[at]lewis4-r630-login-node675 \[ti]]$ interactive jupyter -a general-gpu -p gpu3 --mem 16 -t 8 -e tensorflow +Logging into gpu3 partition with 1 gpu, 16G memory, 1 cpu for 8 hours with account general-gpu ... Starting Jupyter server (it might take about a couple minutes) ... Starting Jupyter server ... Starting Jupyter server ... \&... .fi +.PP +Note: Users can install other packages and mix local packages +with the premade environments. +For example, for Python: +.IP +.nf +pip install --target +export PYTHONPATH=:$PYTHONPATH +.fi +.PP +For R, run the following in R: +.IP +.nf +dir.create(\[dq]\[dq]) +install.packages(\[dq]\[dq], repos = \[dq]http://cran.us.r-project.org\[dq], lib = \[dq]\[dq]) +\&.libPaths(\[dq]\[dq]) +.fi +.PP +Using a local virtual environment: +.IP +.nf +[user\[at]lewis4-r630-login-node675 \[ti]]$ interactive jupyter -E +Logging into Lewis partition with 2G memory, 1 cpu for 2 hours ... +Starting Jupyter server (it might take about a couple minutes) ... +Starting Jupyter server ... +.fi +.PP +Note: The local environments must include +jupyterlab. +For R environments, they must also contain r-irkernel. +For instance: +.IP +.nf +conda create -p -c conda-forge r-base jupyterlab r-irkernel +.fi .SH AUTHOR Ashkan Mirzaee: https://ashki23.github.io/ .SH INTERNET RESOURCES @@ -130,7 +173,6 @@ Documentation: https://sbox.readthedocs.io/ Downloads: https://github.com/ashki23/sbox/releases/latest .br Module repository: https://github.com/ashki23/sbox - .SH LICENSING Sbox is distributed under an Open Source license. 
See the file "LICENSE" in the source distribution for information on terms & diff --git a/share/man/man1/sbox.1 b/share/man/man1/sbox.1 index dec0fae..0c9f398 100644 --- a/share/man/man1/sbox.1 +++ b/share/man/man1/sbox.1 @@ -2,19 +2,15 @@ .SH NAME sbox \- a simple toolbox for Slurm .SH SYNOPSIS -.B sbox -[\fB-h\fR] [\fB-a\fR] [\fB-f\fR] [\fB-g\fR] [\fB-q\fR] [\fB-j\fR JOBID] [\fB-c\fR] [\fB-p\fR] [\fB-u\fR UID] [\fB-v\fR] -[\fB--eff\fR JOBID] [\fB--history\fR {day,week,month,year}] [\fB--pending\fR] -[\fB--running\fR] [\fB--qos\fR] [\fB--quota\fR] [\fB--ncpu\fR] [\fB--ngpu\fR] [\fB--gpu\fR] -[\fB--license\fR] [\fB--reserve\fR] [\fB--topusage\fR] [\fB--whodat\fR UID] -[\fB--whodat2\fR UNAME] [\fB--agent\fR {start,stop,list}] +sbox [-h] [-a] [-f] [-g] [-q] [-j JOBID] [-c] [-p] [-u UID] [-v] [--eff JOBID] [--history {day,week,month,year}] [--pending] [--running] [--cancel JOBID] [--qos] [--quota] [--ncpu] [--ngpu] [--gpu] [--license] [--reserve] [--topusage] [--whodat UID] [--whodat2 UNAME] [--agent {start,stop,list}] .br .SH DESCRIPTION +.PP sbox command includes various Slurm commands at one place. Users can use different options to find the information about the cluster and their accounts and activities. Beyond the Slurm commands, sbox provides some Unix features -including users groups, disk quotas and starting ssh agents. +including users\[cq] groups, disk quotas and starting ssh agents. The ssh-agent lets users communicate with clients outside the cluster such as GitHub and GitLab or with other nodes within the cluster via ssh without asking for the passphrase (you need the passphrase to start the @@ -25,83 +21,132 @@ ssh-agent). Show the help message and exit. .TP .B -a, --account -Return users Slurm accounts by using Slurm sacctmgr. If the cluster does not use Slurm for users account management, it returns empty output. +Return user\[cq]s Slurm accounts by using Slurm +sacctmgr. +If the cluster does not use Slurm for users\[cq] account management, it +returns empty output. .TP -.B -f, --fairshare -Return users fairshare by using Slurm sshare command. If the cluster does not follow a fairshare model, it returns empty output. +.B -f, --fairshare +Return users\[cq] fairshare by using Slurm +sshare command. +If the cluster does not follow a fairshare model, it returns empty +output. .TP -.B -g, --group -Return users posix groups by using Unix groups command. +.B -g, --group +Return user\[cq]s posix groups by using Unix +groups command. .TP -.B -q, --queue -Return users jobs in the Slurm queue by Slurm using squeue command. +.B -q, --queue +Return user\[cq]s jobs in the Slurm queue by +Slurm using squeue command. .TP -.B -j, --job -Show a running/pending job info by using Slurm scontrol command. It requires a valid job ID as argument. +.B -j, --job +Show a running/pending job info by using Slurm +scontrol command. +It requires a valid job ID as argument. .TP -.B -c, --cpu -Return computational resources including number of cores and amount of memory on each node. It uses Slurm sjstat command. +.B -c, --cpu +Return computational resources including number of +cores and amount of memory on each node. +It uses Slurm sjstat command. .TP -.B -p, --partition -Show cluster partitions by using Slurm sinfo command. +.B -p, --partition +Show cluster partitions by using Slurm +sinfo command. .TP -.B -u, --user -Store a user ID. By default it uses $USER as user ID for any query that needs a user ID. It can be used with other options to find the information for other users. 
+.B -u, --user +Store a user ID. +By default it uses $USER as user ID for any query that needs a +user ID. +It can be used with other options to find the information for other +users. .TP -.B -v, --version -Show programs version number and exit. +.B -v, --version +Show program\[cq]s version number and exit. .TP .B --eff -Show efficiency of a job. It requires a valid job ID as argument. It uses Slurm seff command for completed/finished jobs and Unix top command for a running job. +Show efficiency of a job. +It requires a valid job ID as argument. +It uses Slurm seff command for completed/finished jobs and +Unix top command for a running job. .TP -.B --history -Return jobs history for last day, week, month or year. It requires one of the day/week/month/year options as an argument. It uses Slurm sacct command and return empty output if the cluster does not use Slurm for users account management. +.B --history +Return jobs history for last day, week, month or +year. +It requires one of the day/week/month/year options as an argument. +It uses Slurm sacct command and returns empty output if the +cluster does not use Slurm for users\[cq] account management. .TP -.B --pending -Return users pending jobs by using Slurm squeue command. +.B --pending +Return user\[cq]s pending jobs by using Slurm +squeue command. .TP -.B --running -Return users running jobs by using Slurm squeue command. +.B --running +Return user\[cq]s running jobs by using Slurm +squeue command. .TP .B --cancel -Cancel jobs by a single ID or a comma separated list of IDs using Slurm scancel command. +Cancel jobs by a single ID or a comma separated list +of IDs using Slurm scancel command. .TP -.B --qos -Show users quality of services (QOS) and a list of available QOS in the cluster. It uses Slurm sacctmgr show assoc command and return empty output if the cluster does not use Slurm for users account management. +.B --qos +Show user\[cq]s quality of services (QOS) and a list of +available QOS in the cluster. +It uses Slurm sacctmgr show assoc command and returns empty +output if the cluster does not use Slurm for users\[cq] account +management. .TP -.B --quota -Return users disk quotas. It uses lfs quota command for LFS systems and Unix df command for NFS systems. It returns pooled size of the disk if the cluster does not have user/group storage accounts. +.B --quota +Return user\[cq]s disk quotas. +It uses lfs quota command for LFS systems and Unix +df command for NFS systems. +It returns pooled size of the disk if the cluster does not have +user/group storage accounts. .TP -.B --ncpu -Show number of available cpus on the cluster using Slurm sinfo command. +.B --ncpu +Show number of available cpus on the cluster using +Slurm sinfo command. .TP -.B --ncgu -Show number of available gpus on the cluster using Slurm squeue and sinfo commands. +.B --ncgu +Show number of available gpus on the cluster using +Slurm squeue and sinfo commands. .TP -.B --gpu -Show gpu resources including gpu cards name and numbers using Slurm sinfo command. +.B --gpu +Show gpu resources including gpu cards\[cq] name and +numbers using Slurm sinfo command. .TP -.B --license -Show available license servers using Slurm scontrol command. +.B --license +Show available license servers using Slurm +scontrol command. .TP -.B --reserve -Show Slurm reservations using Slurm scontrol command. +.B --reserve +Show Slurm reservations using Slurm +scontrol command. .TP -.B --topusage -Show top usage users using Slurm sreport command. 
+.B --topusage +Show top usage users using Slurm sreport +command. .TP .B --whodat -Show users informations by UID. It uses ldapsearch command and returns empty output if the cluster does not use LDAP. +Show users informations by UID. +It uses ldapsearch command and returns empty output if the +cluster does not use LDAP. .TP .B --whodat2 -Show users informations by name. It uses ldapsearch command and returns empty output if the cluster does not use LDAP. +Show users informations by name. +It uses ldapsearchcommand and returns empty output if the +cluster does not use LDAP. .TP -.B --agent -Start, stop and list users ssh-agents on the current host. It requires one of the start/stop/list options as an argument. Use ssh -o StrictHostKeyChecking=no to disable asking for host key acceptances. +.B --agent +Start, stop and list user\[cq]s ssh-agents on the +current host. +It requires one of the start/stop/list options as an argument. +Use ssh -o StrictHostKeyChecking=no to disable asking for host +key acceptances. .PP -\f[B]Examples +Examples .PP -Jobs histoty +Jobs histoty: .IP .nf [user\[at]lewis4-r630-login-node675 \[ti]]$ sbox --hist day @@ -117,7 +162,17 @@ Jobs histoty 23126131 user genera+ CANCELLED+ Gpu normal 1 1 2Gn 2021-07-28T01:27:43 00:00:01 2021-07-28T01:27:44 00:01:03 2021-07-28T01:28:47 lewis4-z10pg-gpu3-n+ .fi .PP -Jobs efficiency: +Jobs efficiency for running and compeleted jobs: +.IP +.nf +[user\[at]lewis4-r630-login-node675 \[ti]]$ sbox --eff 23227816 +------------------------------------- Job Efficiency ------------------------------------- + PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND + 47262 user 20 0 115700 3888 1600 S 0.0 0.0 0:00.03 bash + 47346 user 20 0 113292 149298 1256 S 99.0 23.0 0:13.30 python +RES: shows resident memory which is accurate representation of how much actual physical memory a process is consuming +%CPU: shows the percentage of the CPU that is being used by the process +.fi .IP .nf [user\[at]lewis4-r630-login-node675 \[ti]]$ sbox --eff 23126131 @@ -125,12 +180,12 @@ Jobs efficiency: Job ID: 23126131 Cluster: lewis4 User/Group: user/user -State: CANCELLED (exit code 0) +State: COMPLETED (exit code 0) Cores: 1 -CPU Utilized: 00:00:01 -CPU Efficiency: 1.59% of 00:01:03 core-walltime -Memory Utilized: 45.80 MB -Memory Efficiency: 2.24% of 2.00 GB +CPU Utilized: 00:11:01 +CPU Efficiency: 48.59% of 00:21:03 core-walltime +Memory Utilized: 445.80 MB +Memory Efficiency: 24.24% of 2.00 GB .fi .PP Accounts, fairshares, and groups: @@ -139,7 +194,6 @@ Accounts, fairshares, and groups: [user\[at]lewis4-r630-login-node675 \[ti]]$ sbox -afg ---------------------------------------- Accounts ---------------------------------------- rcss-gpu root general-gpu rcss general - --------------------------------------- Fairshare ---------------------------------------- Account User RawShares NormShares RawUsage EffectvUsage FairShare -------------------- ---------- ---------- ----------- ----------- ------------- ---------- @@ -148,7 +202,6 @@ general-gpu user 1 0.000005 3942 0.000016 rcss user 1 0.001391 1327 0.001147 0.564645 general user 1 0.000096 3196356 0.000243 0.174309 rcss-gpu user 1 0.000181 0 0.000000 0.999976 - ----------------------------------------- Groups ----------------------------------------- user : user rcss gaussian biocompute rcsslab-group rcss-maintenance rcss-cie software-cache .fi @@ -175,6 +228,32 @@ Jobs in the queue: JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 23150514 Lewis jupyter- user R 5:29 1 lewis4-r630-hpc4-node537 
.fi +.PP +Cluster resources: +.IP +.nf +[user\[at]lewis4-r630-login-node675 \[ti]]$ sbox --ngpu +------------------------------------- Number of GPUs ------------------------------------- +Partition Gpu has 19 gpus available out of 27 (70%) +Partition gpu3 has 15 gpus available out of 15 (100%) +Partition gpu4 has 4 gpus available out of 12 (33%) +.fi +.IP +.nf +[user\[at]lewis4-r630-login-node675 \[ti]]$ sbox --ncpu +------------------------------------- Number of CPUs ------------------------------------- +Partition Interactive has 158 cpus available out of 160 (99%) +Partition Lewis has 161 cpus available out of 2344 (7%) +Partition Serial has 42 cpus available out of 48 (88%) +Partition Dtn has 35 cpus available out of 36 (97%) +Partition hpc3 has 24 cpus available out of 456 (5%) +Partition hpc4 has 79 cpus available out of 1008 (8%) +Partition hpc4rc has 58 cpus available out of 952 (6%) +Partition hpc5 has 70 cpus available out of 1400 (5%) +Partition hpc6 has 0 cpus available out of 2976 (0%) +Partition General has 1837 cpus available out of 7008 (26%) +Partition Gpu has 383 cpus available out of 412 (93%) +.fi .SH AUTHOR Ashkan Mirzaee: https://ashki23.github.io/ .SH INTERNET RESOURCES @@ -184,7 +263,6 @@ Documentation: https://sbox.readthedocs.io/ Downloads: https://github.com/ashki23/sbox/releases/latest .br Module repository: https://github.com/ashki23/sbox - .SH LICENSING Sbox is distributed under an Open Source license. See the file "LICENSE" in the source distribution for information on terms &