This example project shows how to conduct experiments on Snellius of the following type: we have different (problem) instances on which we want to try different methods. In particular, for each instance we want to run each method and store the results. All experiment runs are independent of each other, so we can run them in parallel.
The idea is that this code can be used as a simple template for conducting experiments on Snellius. For reproducibility, it stores the used `.sh` script and the experiment settings in the results folder.
The project structure is as follows:

- `src\`: Folder with the project source code to be run on Snellius.
- `experiments_settings.json`: Contains the settings of the experiments.
- `job_script_Snellius.sh`: Contains the job script to run the experiments on Snellius.
- `run_experiment.py`: Contains the code to run each experiment in parallel.
- `README.md`: Contains the documentation of the project.
- `Snellius setup\`: Folder with optional scripts to construct `job_script_Snellius.sh` and `experiments_settings.json` and place them in the root.
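As a concrete illustration, a minimal `experiments_settings.json` could look as follows. The instance and method names below are placeholders, not part of the project; the heredoc is just one convenient way to create the file:

```shell
# Sketch: create a minimal experiments_settings.json in the project root.
# The instance and method names below are illustrative placeholders.
cat > experiments_settings.json << 'EOF'
{
    "instances": ["instance 1", "instance 2"],
    "methods": ["amazing method 1", "amazing method 2"]
}
EOF
```

The job script later reads the `instances` and `methods` arrays from this file with `jq`.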
The steps to run the experiments are as follows:

1. Create an `experiments_settings.json` file with the experiment settings in the project's root folder. To that end, one can use `Snellius setup\experiments_settings_constructor.py`, which will create `experiments_settings.json` in the project's root folder.
2. Create a `job_script_Snellius.sh` job script file in the project's root folder. To that end, one can use `Snellius setup\job_script_constructor.py`, which requires that an `experiments_settings.json` file exists in the root (so perform step 1 first).
3. Place your code in the `src` folder and ensure `run_experiment.py` has correct access to it.
4. It is a good idea to test `run_experiment.py` locally (just give it some dummy arguments). For example, you could run `run_experiment.py "instance 1" "amazing method 2" results\` to see whether it correctly creates a `.json` file at `results\instance 1\amazing method 2\results.json`.
5. Copy the project to Snellius and navigate to the project's root folder. Ensure that Poetry is installed on Snellius and run `poetry install` in case the project does not have its own environment yet.
6. Run the job script `job_script_Snellius.sh` on Snellius using the command `sbatch job_script_Snellius.sh` (ensure that you are in the root folder of the project).
7. The results will be stored in a folder in the root (by default named `results` followed by a timestamp).
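For reference, the local test in step 4 is expected to produce a layout like the one sketched below. The names and the JSON content are illustrative; `run_experiment.py` is assumed to write one `results.json` per instance/method pair:

```shell
# Sketch: the per-experiment results layout that run_experiment.py is
# assumed to produce. Names and JSON content are illustrative.
results_folder="results"
instance="instance 1"
method="amazing method 2"
mkdir -p "$results_folder/$instance/$method"
echo '{"status": "done"}' > "$results_folder/$instance/$method/results.json"
```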
Below is the full shell script, followed by a detailed line-by-line explanation:
```bash
#!/bin/bash

# Set job requirements
#SBATCH --job-name=Snellius_example_project
#SBATCH --partition=rome
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --time=00:10:40
#SBATCH --mail-type=BEGIN,END
#SBATCH --mail-user=joost.berkhout@vu.nl
#SBATCH --output="slurm-%j.out"

# Create some variables
base_dir="$HOME/Snellius example project"
results_folder="$base_dir/$(date +"results %d-%m-%Y %H-%M-%S")"
experiments_settings="$base_dir/experiments_settings.json"

# Move to working directory and create results folder
cd "$base_dir"
mkdir -p "$results_folder"

instances=$(jq -r '.instances[]' "$experiments_settings")
methods=$(jq -r '."methods"[]' "$experiments_settings")

while read -r instance; do
    while read -r method; do
        srun --ntasks=1 --nodes=1 --cpus-per-task=1 poetry run python "$base_dir/run_experiment.py" "$instance" "$method" "$results_folder" &
    done <<< "$methods"
done <<< "$instances"
wait
```
```bash
#!/bin/bash
```
- This line specifies that the script should be run using the Bash shell.
```bash
# Set job requirements
#SBATCH --job-name=Snellius_example_project
```
- Specifies the name of the job as `Snellius_example_project`.
```bash
#SBATCH --partition=rome
```
- Specifies the partition (queue) to submit the job to, in this case `rome`.
```bash
#SBATCH --nodes=1
```
- Requests 1 compute node for the job.
```bash
#SBATCH --ntasks=128
```
- Requests 128 tasks for the job, usually corresponding to the number of CPU cores.
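One point to check when adapting this header: the number of instance/method combinations should fit within `--ntasks`, since each combination is launched as one task. A minimal sketch of that sanity check, with placeholder counts (in practice they could be obtained with e.g. `jq '.instances | length' experiments_settings.json`):

```shell
# Sketch: verify the number of experiment combinations fits in --ntasks.
# The counts are illustrative placeholders.
n_instances=4
n_methods=8
ntasks=128
total=$((n_instances * n_methods))
if [ "$total" -le "$ntasks" ]; then
    echo "OK: $total tasks fit in $ntasks"
else
    echo "Warning: $total tasks exceed $ntasks; some srun steps will have to wait"
fi
```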
```bash
#SBATCH --time=00:10:40
```
- Sets a time limit of 10 minutes and 40 seconds for the job.
```bash
#SBATCH --mail-type=BEGIN,END
```
- Requests email notifications when the job begins and ends.
```bash
#SBATCH --mail-user=joost.berkhout@vu.nl
```
- Specifies the email address to send notifications to.
```bash
#SBATCH --output="slurm-%j.out"
```
- Sets the name of the output file for the job's standard output, where `%j` is replaced by the job ID.
```bash
# Create some variables
base_dir="$HOME/Snellius example project"
results_folder="$base_dir/$(date +"results %d-%m-%Y %H-%M-%S")"
experiments_settings="$base_dir/experiments_settings.json"
```
- Defines variables: `base_dir` for the project's base directory, `results_folder` for the timestamped results directory, and `experiments_settings` for the path to the JSON settings file.
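To see what the timestamped folder name looks like, the `date` call can be tried on its own (the exact output depends, of course, on when it is run):

```shell
# Sketch: the date format used for the results folder name, producing
# something like "results 07-05-2024 14-03-59" (day-month-year hour-minute-second).
results_name=$(date +"results %d-%m-%Y %H-%M-%S")
echo "$results_name"
```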
```bash
# Move to working directory and create results folder
cd "$base_dir"
mkdir -p "$results_folder"
```
- Changes the current directory to `base_dir` and creates the `results_folder` if it doesn't exist.
```bash
instances=$(jq -r '.instances[]' "$experiments_settings")
methods=$(jq -r '."methods"[]' "$experiments_settings")
```
- Uses `jq` to parse `experiments_settings.json` and extract the lists of instances and methods, one entry per line.
```bash
while read -r instance; do
    while read -r method; do
        srun --ntasks=1 --nodes=1 --cpus-per-task=1 poetry run python "$base_dir/run_experiment.py" "$instance" "$method" "$results_folder" &
    done <<< "$methods"
done <<< "$instances"
wait
```
- Iterates over each instance and method, running `run_experiment.py` via `srun` in the background for each combination. The `wait` command ensures the script waits for all background tasks to complete before finishing.
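The nested `while read` pattern can be tried outside Slurm by replacing the `srun` call with `echo` (a sketch; the instance and method names are made up):

```shell
# Sketch of the nested while-read loop from the job script, with the
# srun call replaced by echo so it runs anywhere. Names are illustrative.
instances=$'instance 1\ninstance 2'
methods=$'method A\nmethod B'

while read -r instance; do
    while read -r method; do
        echo "would run: $instance / $method"
    done <<< "$methods"
done <<< "$instances"
```

This prints one line per instance/method combination (four in total here). Note that `<<<` (a herestring) is a Bash feature, which is why the script starts with `#!/bin/bash`.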
In summary:

- The script sets up the job requirements and defines the necessary variables.
- It uses `jq` to extract the `instances` and `methods` arrays from the JSON file.
- The outer `while` loop iterates over each `instance`.
- For each `instance`, the inner `while` loop iterates over each `method`.
- For each pair of `instance` and `method`, it runs the experiment as a background task.
- The nested loops ensure that every combination of `instance` and `method` is processed.
- The final `wait` makes the job finish only after all experiments have completed.