Running

There are three ways to run this pipeline, depending on how many jobs you want to run: a single run, Snakemake locally, or Snakemake on Slurm.

Single run

Use a single interactive run to test one configuration. All configuration lives in config.py. To run, choose a task from the Task enum in config.py, for instance NUCLR (at the time of writing, NUCLR is the only option available). Then start a run with the default parameters using python scripts/train.py.

The configurable parameters are defined in the tasks dict (also in config.py). Each entry in the dict has a key, which is the name of the configurable (for instance LR), and a list of values. The first value is the default; the remaining values are not used for a single run (see the SnakeMake local section). You can change any default by passing --key=new_value to your run command, e.g. python scripts/train.py --LR=1e-3. By default, model checkpoints are saved in ./results/<a_long_name_with_the_config>/model...pt. You can change the output directory by passing --ROOT path or by setting an environment variable (see Changing default directories).
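The default-plus-override behaviour described above can be sketched as follows. This is only an illustration, not the code in config.py: the `TASK_CONFIG` dict, its values, and the `parse_overrides` helper are hypothetical, and the real argument parsing may differ.

```python
import argparse

# Hypothetical task config: the first entry of each list is the default.
TASK_CONFIG = {
    "LR": [1e-3, 1e-4],
    "BATCH_SIZE": [32, 64],
    "EPOCHS": [10, 20],
}


def parse_overrides(argv, task_config=TASK_CONFIG):
    """Build one --KEY option per configurable and parse argv into a dict."""
    parser = argparse.ArgumentParser()
    for key, values in task_config.items():
        default = values[0]
        # Reuse the type of the default so "--LR=1e-2" is parsed as a float.
        parser.add_argument(f"--{key}", type=type(default), default=default)
    return vars(parser.parse_args(argv))


# Equivalent to passing --LR=1e-2 on the command line.
config = parse_overrides(["--LR=1e-2"])
print(config)
```

Only LR is overridden here; BATCH_SIZE and EPOCHS fall back to the first value in their lists.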

You can activate WANDB with --WANDB.

SnakeMake local

First, install snakemake (pip install snakemake). Snakemake is a job scheduler; it is set up here to run every combination of the values in the configuration parameter lists. So if your NUCLR config looked like

    NUCLR = {
        "LR": [1e-3, 1e-4],
        "BATCH_SIZE": [32, 64],
        "EPOCHS": [10, 20],
    }

then snakemake will run $2^3=8$ jobs, each with a different combination of the parameters. lib/run_config.py has a couple of configurable options for snakemake, like SM_GPU and SM_ROOT. Change those as you wish. Then you can run snakemake -c4 to run 4 jobs in parallel and get through all the tasks. The results will be saved in SM_ROOT/<a_long_name_with_the_config>/model...pt.
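To see where the 8 jobs come from, the combinations can be enumerated with a Cartesian product over the value lists. This is a sketch of the idea only; the `all_combinations` helper is hypothetical, and the Snakefile may build its job list differently.

```python
from itertools import product

# Example config from the text above.
NUCLR = {
    "LR": [1e-3, 1e-4],
    "BATCH_SIZE": [32, 64],
    "EPOCHS": [10, 20],
}


def all_combinations(task_config):
    """Return one dict per combination of the listed parameter values."""
    keys = list(task_config)
    return [dict(zip(keys, combo)) for combo in product(*task_config.values())]


jobs = all_combinations(NUCLR)
print(len(jobs))  # 2 * 2 * 2 = 8
for job in jobs:
    print(job)
```

Adding a third value to any one list would raise the count to 12, so the number of jobs grows quickly with the size of the lists.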

Note: You may need to set the environment variable MKL_SERVICE_FORCE_INTEL to 1 to avoid crashes. We're working on fixing this bug. For now, you can run your snakemake commands with the variable set as follows: MKL_SERVICE_FORCE_INTEL=1 snakemake -c4

Slurm

Snakemake can also submit jobs through Slurm. You only need to set a few things. If you want to run on GPUs, set SM_GPU in config.py. Adjust the slurm extra parameters in the Snakefile to match your cluster, mainly the partition. Finally, create a Slurm profile for snakemake, for example ~/.config/snakemake/slurm_gpu/config.yaml:

    cluster: slurm
    jobs: 128
    retries: 3
    default-resources:
      - slurm_account=nnolte
      - cluster_jobname="%r_%w_%T"
      - mem_mb=None
      - mem_mib=None
      - disk_mb=None
      - disk_mib=None

Then run with MKL_SERVICE_FORCE_INTEL=1 snakemake --slurm --profile slurm_gpu. Setting MKL_SERVICE_FORCE_INTEL is needed on some clusters; the cause is not yet understood.

Changing default directories

Default directories can be changed with environment variables. For instance, to change the default data directory, set NUCLR_DATA_DIR by adding export NUCLR_DATA_DIR=/path/to/data to your shell rc file. Checkpoints and metadata are saved to NUCLR_ROOT_DIR, which defaults to ./results.
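The environment-variable fallback for the output directory can be sketched like this. The `resolve_root_dir` helper is hypothetical and only mirrors the behaviour described above (NUCLR_ROOT_DIR if set, otherwise ./results); the project's actual lookup code may differ.

```python
import os
from pathlib import Path


def resolve_root_dir(environ=os.environ):
    """Return NUCLR_ROOT_DIR if it is set, otherwise the ./results default."""
    return Path(environ.get("NUCLR_ROOT_DIR", "./results"))


# With the variable unset, this prints the default results directory.
print(resolve_root_dir())
```

Passing the environment in as an argument keeps the helper easy to test without modifying the real process environment.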

TODO

Use wandb artifacts to load models and data from a particular run.
