This repository contains the material for the Track 3 Hands-on Sessions.
It consists mainly of Jupyter notebooks, plus a few support scripts. The notebooks should be followed in this order:
Hands-On #1: hands_on_1.ipynb
Hands-On #2: hands_on_2.ipynb
Hands-On #3: hands_on_3.ipynb
Hands-On #4: hands_on_4.ipynb
Hands-On #5: hands_on_5.ipynb
Although all packages are already installed for your accounts, it is advisable to set pip's cache location to the local scratch disk to avoid using up your quota:
mkdir -p /scratch/$USER/pip_cache
Then temporarily set the environment variable that tells pip to use this cache location:
export PIP_CACHE_DIR=/scratch/$USER/pip_cache/
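You can optionally confirm that pip picked up the new location (the pip cache dir subcommand is available in pip 20.1 and later):
pip cache dir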
Unless instructed otherwise, you will work from a remote working directory:
cd /usr/itetnas04/data-scratch-01/$USER/data/
Clone the repository into the working directory.
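For example (the URL below is a placeholder; use the actual repository URL provided by the instructors):
git clone <repository-url>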
The dataset is already downloaded under: /usr/itetnas04/data-scratch-01/efcl_007fs24/data/dataset
To activate the pre-installed conda environment, run:
source /usr/itetnas04/data-scratch-01/$USER/data/conda/bin/activate
If no conda environment is available, you can create your own:
./install_conda.sh
# log out and log in again
source /usr/itetnas04/data-scratch-01/$USER/data/conda/bin/activate
pip install --index-url https://download.pytorch.org/whl/cu118 -r requirements_torch.txt
pip install -r requirements.txt
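As an optional sanity check, you can verify that PyTorch was installed and imports correctly:
python3 -c "import torch; print(torch.__version__)"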
For these sessions, we suggest using Jupyter Lab (*). Start it with one of the following commands (from the cloned folder):
python3 -m jupyterlab --no-browser --port 5901 --ip $(hostname -f)
or
jupyter lab --no-browser --port 5901 --ip $(hostname -f)
The port should be in the range 5900-5999.
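If port 5901 happens to be taken already, any other free port in that range works, for example:
jupyter lab --no-browser --port 5902 --ip $(hostname -f)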
(*) Note: you can also open the notebooks with the classic jupyter notebook command if you want, but jupyter lab makes it easier to navigate the sections of each notebook.
You will run Hands-on #1 on a GPU cluster, rather than on your local machine, to accelerate training. An interactive 300-minute session on a GPU node (NAS can be demanding) can be started with:
export SLURM_CONF=/home/sladmsnow/slurm/slurm.conf
srun --time 300 --gres=gpu:1 --pty bash -i
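Once the interactive session starts, you can optionally check that a GPU was actually allocated; nvidia-smi should list one GPU:
nvidia-smi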
You can now reactivate your conda environment and start your jupyter lab session on the GPU from the cloned folder.
IMPORTANT: for the sake of time, each group will generate a single Optimized DNN (rather than a full Pareto front) during the hands-on session. The instructors will tell you which value to try. Once done, you will have to upload the optimized and fine-tuned model to this link, specifying your group name (pick any name you want, just no vulgarity please 😃).
We will then pick one lucky winner DNN, which all groups will use for the personalized fine-tuning in Hands-on #3. Once selected, we will put the winner in this folder for you to download.
You will run Hands-on #2 locally, using the BioGAP for data acquisition and a terminal to run the associated scripts. The resources required for this session are already available in /scratch/$USER on your tardis machines.
cd /scratch/$USER/
Make sure to activate the conda environment in the terminal, as described above. The remaining instructions are available in the associated notebook from the cloned folder.
Connect to the GPU as you did in Hands-on #1, activate the conda environment, and start your jupyter lab session.
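As a quick recap, the commands are the same as in Hands-on #1 (assuming the repository was cloned into efcl-school-t3 as above):
export SLURM_CONF=/home/sladmsnow/slurm/slurm.conf
srun --time 300 --gres=gpu:1 --pty bash -i
source /usr/itetnas04/data-scratch-01/$USER/data/conda/bin/activate
cd /usr/itetnas04/data-scratch-01/$USER/data/efcl-school-t3
jupyter lab --no-browser --port 5901 --ip $(hostname -f)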
The requirements for Hands-on #4 are more complex, as you are expected to cross-compile an application for a RISC-V architecture. The SDK you will use has been developed for Ubuntu 22.04, but you are currently using a Debian 10 machine. To make this possible, we will use an Apptainer container. The Apptainer is already located in /scratch/$USER. Let's first copy the current repo to the same location:
cp -r /usr/itetnas04/data-scratch-01/$USER/data/efcl-school-t3 /scratch/$USER/
To activate the apptainer, you can run:
cd /scratch/$USER/
apptainer shell ubuntu
source /home/efcl_venv/bin/activate
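To check that the virtual environment is active, which python3 should now point inside /home/efcl_venv/bin:
which python3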
You can now use the Apptainer as you would use your host. For Hands-on #4, there are several steps to take to prepare the deployment. First, copy all the directories concerning the deployment to /scratch:
cp -r /home/gap_sdk_private/ /scratch/$USER/
cp -r /home/match/ /scratch/$USER/
cp -r /home/match_gap9/ /scratch/$USER/
cd /scratch/$USER/efcl-school-t3
You can now start the jupyter lab session in the Apptainer. Make sure you are using the correct kernel: register it by running
python3 -m ipykernel install --user --name=efcl_venv
and select efcl_venv when starting your notebook. To do so, open the "Kernel" drop-down menu at the top right of the window and, under "Change kernel", select the previously created efcl_venv kernel.
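To double-check that the kernel was registered, jupyter kernelspec list (a standard Jupyter command) prints the installed kernels; efcl_venv should appear in the list:
jupyter kernelspec list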
You can also connect the GAP9 evaluation kit, containing the GAPmod, to your computer: use the switch to turn it on and make sure the LEDs turn on.
You will run Hands-on #5 locally, using the BioGAP for data acquisition, a terminal to run the acquisition script, and another terminal to compute and display the predicted gestures. The resources required for this session are already available in /scratch/$USER on your tardis machines, where you copied them in Hands-on #4.
cd /scratch/$USER/efcl-school-t3
Make sure to activate the conda environment in the terminals, as described above. The remaining instructions are available in the associated notebook from the cloned folder.