Merge branch 'main' into ehuusko
EetuHuuskoCSC authored Oct 1, 2024
2 parents 11763df + 401f364 commit 4e00c83
Showing 29 changed files with 355 additions and 217 deletions.
11 changes: 5 additions & 6 deletions materials/allas.md
@@ -1,6 +1,6 @@
# Allas – object storage

What it is?
What is it?

* Allas is a **storage service**, technically object storage
* **For CSC project lifetime: 1-5 years**
@@ -13,7 +13,7 @@ What it is?
* LUMI-O is very similar to Allas
* [LUMI Docs: LUMI-O](https://docs.lumi-supercomputer.eu/storage/lumio/)

What it is NOT?
What is it NOT?

- A file system (even though many tools try to fool you into thinking so). It is just a place to store static data objects.
- A data management environment. Tools for e.g. search, metadata, version control and access management are minimal.
@@ -30,7 +30,7 @@ What it is NOT?
- For data organization and access administration
- Data is stored as **objects** within a bucket
- Practically: object = file
- In reality, there is no hierarcical directory structure within a bucket, although it sometimes looks like that.
- In reality, there is no hierarchical directory structure within a bucket, although it sometimes looks like that.
- Object name can be `/data/myfile.zip` and some tools may display it as `data` folder with `myfile.zip` file.
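
To make the pseudo-folder behaviour concrete, here is a small sketch using the generic `s3cmd` client (the bucket name and file are made up for illustration, not taken from this diff):

```bash
# Upload a local file so that its object name contains slashes
s3cmd put myfile.zip s3://my-bucket/data/myfile.zip

# Listing the bucket shows "data/" as if it were a folder,
# even though the bucket only stores flat object names
s3cmd ls s3://my-bucket/
s3cmd ls s3://my-bucket/data/
```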

### Things to consider
@@ -44,9 +44,8 @@ What it is NOT?

- S3 and SWIFT.
- **For new projects S3 is recommended**
- SWIFT might be soon depricated.
- Avoid cross-using SWIFT and S3 based objects!

- SWIFT might soon be deprecated.
- Avoid cross-using SWIFT- and S3-based objects!

## Tools for Allas

112 changes: 82 additions & 30 deletions materials/batch_job.md
@@ -1,9 +1,18 @@
# Batch jobs

On our own computer, we are used to starting a program (job) and the program starts instantly. In a supercomputing environment, the computer is **shared among hundreds of other users**. All heavy computing must be done on compute nodes, see [Usage policy](https://docs.csc.fi/computing/overview/#usage-policy). To use the compute nodes, the user first requests computing resources then waits for access to these resources, and then the job starts.
On our own computer, we are used to a program (job) starting instantly. In a
supercomputing environment, the computer is **shared among
hundreds of users**. All heavy computing must be done on compute nodes
(see [Usage policy](https://docs.csc.fi/computing/overview/#usage-policy)). To
use compute nodes, the user first asks for the computing resources and then
waits for the job to start when the requested resources become available.

## SLURM - job management system
A job management system keeps track of the available and requested computing resources. It aims to share the resources in an efficient and fair way among all users. It optimizes resource usage by filling the compute nodes so that there will be as little idling resources as possible. CSC uses a job management system called SLURM.
A job management system keeps track of the available and requested computing
resources. It aims to share the resources in an efficient and fair way among
all users. It optimizes resource usage by filling the compute nodes so that
there will be as little idling resources as possible. CSC uses a job
management system called SLURM.

```{figure} images/slurm-sketch.svg
:alt: How batch jobs are distributed on compute nodes in terms of number of CPU cores, time and memory
@@ -12,17 +21,20 @@ A job management system keeps track of the available and requested computing res
SLURM job allocations
```


It is important to request only the resources you need and ensure that the resources are used efficiently. Resources allocated to a job are not available for others to use. If a job is _not_ using the cores or memory it reserved, resources are wasted.
It is important to request only the resources you need and ensure that the
resources are used efficiently. Resources allocated to a job are not available
for others to use. If a job is _not_ using the cores or memory it reserved,
resources are wasted.

## Batch job script

A **batch job script** is used to request resources for a job. It consists of two parts:

* The resource request: computing time, number of cores, amount of memory and other resources like GPUs, local disk, etc.
* The resource request: computing time, number of cores, amount of memory and
other resources like GPUs, local disk, etc.
* Instructions for computing: what tool or script to run.

Example minimal batch script:
Minimal example of a batch script:

```bash title="simple.sh"
#!/bin/bash
@@ -38,58 +50,98 @@ srun python myscript.py # The script to run
* Submit the job for computation: `sbatch simple.sh`
* Cancel a job after job submission during queueing or runtime: `scancel jobid`.
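
The full body of `simple.sh` is collapsed in the hunk above; a minimal sketch of what such a batch script typically contains is shown below. The account, partition and module names are placeholders for illustration, not values from this diff:

```bash
#!/bin/bash
#SBATCH --account=project_2001234   # Billing project (placeholder ID)
#SBATCH --partition=small           # Partition (queue) to submit to
#SBATCH --time=00:15:00             # Maximum run time (hh:mm:ss)
#SBATCH --ntasks=1                  # Number of tasks (processes)
#SBATCH --cpus-per-task=1           # CPU cores per task
#SBATCH --mem=2G                    # Memory for the job

module load python-data             # Assumed module name, check `module avail`
srun python myscript.py             # The script to run
```

The lines starting with `#SBATCH` form the resource request; everything after them is the computing step.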

When we submit a batch job script, the job is not started directly, but is sent into a **queue**. Depending on the requested resources and load, the job may need to wait to get started.
When we submit a batch job script, the job is not started directly, but is
sent into a **queue**. Depending on the requested resources and load, the job
may need to wait to get started.
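
To check whether a submitted job is still queuing or already running, the standard Slurm queue listing can be used, for example:

```bash
squeue -u $USER     # List your own pending and running jobs
squeue -j <jobid>   # Show the state of one specific job
```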

:::{admonition} How many resources to request?
:class: seealso

* If you have run the code on some other machine (your laptop?), as a first guess you can reserve the same amount of CPUs and memory as that machine has.
* You can also check more closely what resources are used with `top` on Mac and Linux or `task manager` on Windows when running on the other machine.
* If your program does the same or similar thing more than once, you can estimate the total run time by multiplying the one-time run time by the number of runs.
* The first resource reservation on supercomputer is often a guess, do not worry too much, just adjust it later.
* Before reserving multiple CPUs, check if your code can make use them.
* Before reserving multiple nodes, check if your code can make use them. Most GIS tools can not.
* When you double the number of cores, the job should run at least 1.5x faster.
* Some tools run both on CPU and GPU, if unsure which to use, a good rule of thumb is to compare the billing unit (BU) usage and select the one using less. A GPU uses 60 times more billing units than a single CPU core.
* You should always monitor jobs to find out what were the actual resources you requested.
* If you have run the code on some other machine (your laptop?), as a first
guess, you can reserve the same amount of CPUs and memory as on that
machine.
* You can also monitor resource usage more closely with `top` on Mac and Linux
or `task manager` on Windows when running on the other machine.
* If your program does the same thing (or similar things) more than once, you
can estimate the total run time by multiplying the duration of one run with
the total number of runs.
* An initial resource reservation on a supercomputer is often a guess; do not
worry too much, just adjust it later.
* Before reserving multiple CPUs, check if your code can make use of them.
* Before reserving multiple nodes, check if your code can make use of them.
Most GIS tools can not.
* When you double the number of cores, the job should run at least 1.5x
faster.
* Some tools run on both CPU and GPU. If unsure which to use, a good rule of
thumb is to compare the billing unit (BU) usage and select the one consuming
fewer units. A GPU uses 60 times more billing units than a single CPU core.
* You should always monitor jobs to find out what resources they actually
used (see the monitoring sketch right after this box).

Partly adapted from [Aalto Scientific Computing](https://scicomp.aalto.fi/triton/usage/program-size/)
:::
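
As noted in the list above, jobs should be monitored afterwards. On CSC systems this is typically done with `seff` and the generic Slurm accounting command `sacct`; a sketch with a placeholder job ID:

```bash
seff 12345678                    # Efficiency summary: CPU and memory actually used
sacct -j 12345678 --format=JobID,Elapsed,MaxRSS,State   # Accounting details
```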

## Partitions

A **partition** is a set of compute nodes, grouped logically. Resource limitations for a job are defined by the partition (or queue) the job is submitted to. The limitations affect the **maximum run time, available memory and the number of CPU/GPU cores**. Jobs should be submitted to the smallest partition that matches the required resources.
A **partition** is a logically grouped set of compute nodes. Resource
limitations for a job are defined by the partition (or queue) the job is
submitted to. The limitations affect the **maximum run time, available memory
and the number of CPU/GPU cores**. Jobs should be submitted to the smallest
partition that matches the required resources.

- [CSC Docs: Available batch job partitions](https://docs.csc.fi/computing/running/batch-job-partitions/)
- [LUMI Docs: Slurm partitions](https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/partitions/)
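
On the system itself, the partitions and their limits can also be inspected with standard Slurm commands, for instance:

```bash
sinfo -s                        # One-line summary of each partition
scontrol show partition small   # Full limits of one partition (name assumed)
```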


## Job types

* **Interactive jobs** for working with some tool interactively, for example graphical tools, writing code, testing. For interactive jobs allocate the resource via the [interactive partition](https://docs.csc.fi/computing/running/interactive-usage/). This way your work is performed in a compute node, not on the login node. Interactive partition is often used for applications in the web interface. The resources are limited in interactive partition, but it should have no or very short queue.
* **Serial jobs** work on only one task at a time following a sequence of instructions, while only using one core.
* **Parallel jobs** distribute the work over several cores or nodes in order to achieve a shorter wall time (and/or a larger allocatable memory).
* **GPU jobs** for tools that can benefit from running on GPUs. In spatial analysis context, GPUs are most often used for deep learning.
* **Interactive jobs** are used for e.g. working interactively with
tools that have a graphical UI, writing code (using graphical development
environments) and testing whether a program runs as intended. For
interactive jobs, allocate the resources from the
[interactive partition](https://docs.csc.fi/computing/running/interactive-usage/).
This way your work is performed on a compute node, not on the login node.
The interactive partition is often used for applications in the web interface.
The resources are limited on this partition, but it should have very
short queuing times. A command-line sketch of requesting such a session
follows after this list.
* **Serial jobs** work on only one task at a time following a sequence of
instructions and only using one core.
* **Parallel jobs** distribute the work over several cores or nodes in order
to achieve a shorter wall time (and/or more allocatable memory).
* **GPU jobs** for tools that can benefit from running on GPUs. In a spatial
analysis context, GPUs are most often used for deep learning.
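
As referenced in the interactive-jobs item above, requesting an interactive session on Puhti could look like the sketch below. The project ID is a placeholder and the option names follow CSC's `sinteractive` wrapper; check `sinteractive --help` on the system:

```bash
sinteractive --account project_2001234 --time 02:00:00 --mem 8000 --cores 2
```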

:::{admonition} Which partition to choose?
:class: tip

Check [CSC Docs: Available batch job partitions](https://docs.csc.fi/computing/running/batch-job-partitions/) and find suitable partitions for these tasks:

1. Through trial and error Anna has determined that her image processing process takes about 60 min and 16 GB of memory.
2. Laura has profiled her code, and determined that it can run efficiently on 20 cores with 12 GB of memory each. The complete process should be done within 4 days.
1. Through trial and error, Anna has determined that her image processing task
takes about 60 min and 16 GB of memory.
2. Laura has profiled her code, and determined that it can run efficiently on
20 cores with 12 GB of memory each. The complete process should be done
within 4 days.
3. Ben wants to visualize a 2 GB file in QGIS.
4. Neha has written and run some Python code on her own machine. She now wants to move to Puhti and, before running her full pipeline, test that her code executes correctly with a minimal dataset.
5. Josh wants to run 4 memory heavy tasks (100GB) in parallel. Each job takes about 30 minutes to execute.
4. Neha has written and run some Python code on her own machine. She now wants
to move to Puhti and, before running her full pipeline, test that her code
executes correctly with a minimal dataset.
5. Josh wants to run 4 memory-heavy tasks (100 GB each) in parallel. Each job
takes about 30 minutes to execute.

:::{admonition} Solution
:class: dropdown

1. She does not need interactive access to her process, so `small` suits best.
2. She needs to choose `longrun` or adapt her code to get under 3 days runtime (which she might want to do in order to avoid exessively long queueing times).
3. For the webinterface,`interactive` suits best and should be the first choice.
4. This is a very good idea and should always be done first. Neha can get the best and fast experience using `test` partition. This means to keep the runtime under 15 min and the memory needs below 190 GiB at a maximum of 80 tasks.
5. 400GB memory in total is more than most partitions can take. If this is the least memory possible for the jobs, it has to be run on `hugemem`.
2. She needs to choose `longrun` or adapt her code to get under 3 days runtime
(which she might want to do in order to avoid excessively long queueing
times).
3. For the web interface, `interactive` suits best and should be the first
choice.
4. This is a very good idea and should always be done first. Neha can get the
testing done quickly (= with limited queuing overhead) using the `test`
partition. This means keeping the runtime under 15 min and the memory needs
below 190 GiB, with at most 80 tasks.
5. 400 GB of memory in total is more than most partitions can provide. If this
is the minimum memory the jobs need, they have to be run on `hugemem`.
:::
:::

4 changes: 3 additions & 1 deletion materials/cheatsheet.md
@@ -1,7 +1,9 @@
# CSC and Unix cheatsheet
Adapted from [CSC Quick Reference](https://docs.csc.fi/img/csc-quick-reference/csc-quick-reference.pdf)

Note that this is simplified for beginners usage, once you get more experienced, you'll notice that there is more (and better) options for everything, and that not everything written here is "the whole truth".
Note that this is simplified for beginners' usage. Once you get more
experienced, you'll notice that there are more (and better) options for
everything, and that not everything written here is "the whole truth".

## Service names

13 changes: 10 additions & 3 deletions materials/connecting.md
@@ -26,7 +26,7 @@

## Connecting to the supercomputer via SSH

During the course we will access the supercomputer via the web interface in order to not overwhelm you with setups before the course. However, this way may not always be the most convenient. You can also connect to the supercomputer via SSH.
During the course we will access the supercomputer via the web interface in
order to not overwhelm you with setups before the course. However, this way
may not always be the most convenient. You can also connect to the
supercomputer via SSH.
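
A minimal connection sketch (replace the username with your own CSC username; `puhti.csc.fi` is the Puhti login address documented by CSC):

```bash
ssh cscusername@puhti.csc.fi
```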

:::{admonition} Connecting with SSH clients
:class: seealso, dropdown
@@ -46,7 +49,11 @@ During the course we will access the supercomputer via the web interface in orde

## Developing scripts remotely

Instead of developing code on your local machine and moving it as files to the supercomputer for testing, you can also consider to use a local editor and push edited files directly into the supercomputer.
This works for example with **Visual Studio Code** or **Notepad++**. Note that [Visual Studio Code](https://docs.csc.fi/computing/webinterface/vscode/) is also available through the Puhti web interface.
Instead of developing code on your local machine and moving it as files to the
supercomputer for testing, you can also consider using a local editor and
pushing edited files directly to the supercomputer. This works for example
with **Visual Studio Code** or **Notepad++**. Note that [Visual Studio
Code](https://docs.csc.fi/computing/webinterface/vscode/) is also available
through the Puhti web interface.
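
If you keep editing locally, the edited files can also be pushed to the supercomputer from the command line; a sketch with a placeholder username and paths:

```bash
# Copy a single edited script to your home directory on Puhti
scp myscript.py cscusername@puhti.csc.fi:~/

# Or keep a whole project directory in sync (only changed files are transferred)
rsync -av my_project/ cscusername@puhti.csc.fi:/scratch/project_2001234/my_project/
```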

- [CSC Docs: Developing scripts remotely](https://docs.csc.fi/support/tutorials/remote-dev/)