LUMI-AI-example

This project is still a work in progress and changes constantly. For well-tested examples, have a look at the LUMI AI workshop material: https://github.com/Lumi-supercomputer/Getting_Started_with_AI_workshop

A vision transformer model in PyTorch that serves as an example of how to run AI applications on LUMI.

We use the torchvision vit_b_16 model and train it on the tiny-imagenet dataset. This project is meant to provide a sandbox for testing and benchmarking AI applications on LUMI and should eventually serve as an A-Z example in the LUMI AI documentation. Use bigger models and larger datasets if required.

HDF5 support

The imagenet dataset consists of hundreds of thousands of individual JPEG files. To avoid the "many small files" problem, the dataset can be converted into a single HDF5 file with the script turn_into_hdf5.py. Note that this increases the size of the data by about one order of magnitude, because the script stores the data without any compression.
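The size increase can be estimated with simple arithmetic. The sketch below is not taken from turn_into_hdf5.py. It assumes 8-bit RGB images stored as raw pixel arrays, and the image count, the 64x64 resolution, and the average JPEG size are illustrative assumptions rather than measured values. The function names are hypothetical as well.

```python
def raw_image_bytes(height, width, channels=3, bytes_per_value=1):
    """Size of one uncompressed image stored as a plain pixel array."""
    return height * width * channels * bytes_per_value


def estimate_growth(num_images, height, width, avg_jpeg_bytes):
    """Compare total raw pixel size with total JPEG size.

    Returns (raw_total_bytes, jpeg_total_bytes, growth_factor).
    HDF5 metadata overhead is ignored in this estimate.
    """
    raw_total = num_images * raw_image_bytes(height, width)
    jpeg_total = num_images * avg_jpeg_bytes
    return raw_total, jpeg_total, raw_total / jpeg_total


# Assumed values for illustration only (not measured on the real dataset).
NUM_IMAGES = 100_000       # assumed number of training images
HEIGHT, WIDTH = 64, 64     # assumed image resolution
AVG_JPEG_BYTES = 1_500     # assumed average size of one JPEG file

raw_total, jpeg_total, factor = estimate_growth(
    NUM_IMAGES, HEIGHT, WIDTH, AVG_JPEG_BYTES
)
print(f"raw pixels: {raw_total / 1e9:.2f} GB")
print(f"JPEG files: {jpeg_total / 1e9:.2f} GB")
print(f"growth factor: {factor:.1f}x")
```

Under these assumptions the uncompressed data is several times larger than the JPEG files. The actual factor depends on how well the original images compress.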

Running script on LUMI

This GitHub repository is cloned to /project/project_462000002/LUMI-AI-example. Training data, validation data, and the model parameters are in the same directory. Since h5py is not included in the container, the container is extended via a virtual environment, as described here. The training and validation datasets are also uploaded to the lumi-o:imagenet/ bucket. Anyone is welcome to work in that directory in order to minimize data storage, but please create a new branch.

Building website

Install the required dependencies:

pip install -r requirements.txt

Edit with live preview

Run:

mkdocs serve

This command starts a live-reloading local web server that can be accessed in a web browser at http://127.0.0.1:8000. The local web server automatically re-renders and reloads the site when you edit the documentation.

Generate the static site

To build a self-contained directory containing the full website run:

mkdocs build

The generated files will be located in the site/ directory.
