Request account
- You can request access to the H-BRS cluster by sending an email to the responsible persons from the MAS group.
Environment setup
SSH to the cluster (replace username2s with your FB02 UID)
ssh username2s@wr0.wr.inf.h-brs.de
Create conda environment
- Activate your base conda installation
source ~/anaconda3/bin/activate
- Create your environment
Check environment.yml and adjust the dependencies if necessary (a minimal sketch is given after this list), then create the environment:
conda env create -f environment.yml
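What goes into environment.yml depends on your project. As a minimal sketch, assuming a TensorFlow-based setup that matches the env-pointcloud name used in the job script below, it might look like this (the listed packages and versions are placeholders, replace them with your actual dependencies):
name: env-pointcloud
channels:
  - defaults
dependencies:
  - python=3.8
  - pip
  - pip:
      - tensorflow
      - tensorboard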
Create a job script for the batch system
#!/bin/bash
#SBATCH --job-name=my_batch_job
#SBATCH --partition=gpu
#SBATCH --nodes=1                      # number of nodes
#SBATCH --mem=64GB                     # memory per node (other units can be given as a suffix)
#SBATCH --ntasks-per-node=16           # number of cores
#SBATCH --time=24:00:00                # HH:MM:SS
#SBATCH --output log/job_tf.%N.%j.out  # filename for STDOUT (%N: nodename, %j: job id)
#SBATCH --error log/job_tf.%N.%j.err   # filename for STDERR

# load cuda
module load cuda

# activate environment
source ~/anaconda3/bin/activate ~/anaconda3/envs/env-pointcloud

# change to the pointcloud_classification root directory
cd ~/pointcloud_classification

# run the training script
python trainer.py --model DGCNNC --train
More information about the hardware and the current system status can be found in the cluster documentation.
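For a quick overview from the command line, the standard SLURM command sinfo lists the partitions and the state of their nodes:
sinfo                    # all partitions and node states
sinfo --partition=gpu    # only the gpu partition used in the job script above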
Submit the job
sbatch job.sh
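If the submission succeeds, sbatch prints the id of the new job, which you will need for squeue and scancel below (the id shown here is just the example used in the next section):
sbatch job.sh
Submitted batch job 207550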
Status
- Check queue (options for filtering the output are sketched after this list)
squeue
- Cancel a job with scancel job_id, e.g. scancel 207550 for the job with id 207550
scancel 207550
- More on SLURM and batch systems can be found in the official SLURM documentation
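A plain squeue call lists all jobs on the cluster. To narrow it down you can filter by user or by job id with the standard SLURM options below (username2s and 207550 are the examples used above):
squeue -u username2s   # only your own jobs
squeue -j 207550       # a specific job by id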
Running TensorBoard on the cluster
You can run TensorBoard on the cluster by submitting another job to the same node and then opening an SSH tunnel to that node, so that you can view TensorBoard locally on your PC:
- Edit your job file (replace the training script with the following; a complete job-file sketch is given after these steps):
tensorboard --port=9000 --logdir=~/path/to/your/log
- SSH tunneling to the cluster
ssh -N -f -L 9000:your_node:9000 username2s@wr0.wr.inf.h-brs.de
Replace your_node with the node your job is scheduled on, e.g. wr15, and username2s with your FB02 UID.
- In your browser, open localhost:9000
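Putting these pieces together, the TensorBoard job file could look like the sketch below. It reuses the SBATCH header of the training job above; the resource values, the nodelist pin, and the log directory are assumptions you need to adapt to your own run:
#!/bin/bash
#SBATCH --job-name=tensorboard
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --nodelist=wr15                # pin to the node your training job runs on (wr15 is only the example node from above)
#SBATCH --mem=8GB                      # assumed; serving logs needs far less memory than training
#SBATCH --ntasks-per-node=2            # assumed
#SBATCH --time=24:00:00                # HH:MM:SS
#SBATCH --output log/job_tb.%N.%j.out  # filename for STDOUT
#SBATCH --error log/job_tb.%N.%j.err   # filename for STDERR

# activate the environment that has tensorboard installed
source ~/anaconda3/bin/activate ~/anaconda3/envs/env-pointcloud

# serve the training logs on port 9000 (the same port used by the SSH tunnel)
tensorboard --port=9000 --logdir=~/path/to/your/log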