Adding a config manager for running multiple jobs #30

Merged
merged 8 commits into from
Jan 10, 2024
24 changes: 18 additions & 6 deletions README.md
@@ -39,12 +39,22 @@ For each experiment, the exact config can also be found under `configs/` where t
 
 ## How to use
 
-Any experiment needs a config file, see e.g. `configs/test.json`.
+The main runner functions are `run.py` (or `run.ipynb` if you prefer notebooks). Any experiment needs a config file, see e.g. `configs/test.json`.
+In general, the name of the config file serves as the experiment ID; it is used later for storing the output, plotting, etc.
 
-* In the config you can specify at each key a list or a single entry. For every list entry, a cartesian product will be run.
-* The same is true for the hypeprparameters of each entry in the `opt` key of the config file.
-* Multiple runs can be done using the key `n_runs`. In each run the seed for shuffling the `DataLoader` changes.
-* The name of the config file serves as experiment ID, used later for running and storing the output.
+There are two ways of specifying a config for `run.py`.
+
+1) *dict-type* configs
+
+* Here, the config JSON is a dictionary where you can specify at each key a list or a single entry.
+* The same is true for the hyperparameters of each entry in the `opt` key of the config file.
+* The Cartesian product of all list entries will be run (i.e. potentially many single training runs in sequence).
+* Multiple repetitions can be done using the key `n_runs`. This will use different seeds for shuffling the `DataLoader`.
+
+2) *list-type* configs
+
+* The config JSON is a list, where each entry is a config for a **single training run**.
+* This format is intended only for launching multiple runs in parallel. You should first create a dict-type config, and then use the utilities for creating temporary list-type configs (see an example [here](configs/README.md#example-for-splitting-up-a-config)).
 
 You can run an experiment with `run.py` or with `run.ipynb`. A minimal example is:
 
@@ -73,4 +83,6 @@ For the entries in `history`, the following keys are important:
 * `train_loss`: loss function value over training set
 * `val_loss`: loss function value over validation set
 * `train_score`: score function (eg accuracy) over training set
-* `val_score`: score function (eg accuracy) over validation set
+* `val_score`: score function (eg accuracy) over validation set
+
+In [`stepback/utils.py`](stepback/utils.py) you can find several helper functions for merging or filtering output files.
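The expansion rules for dict-type configs described in the README can be illustrated with a small self-contained sketch. It re-implements the documented behaviour (and the logic of `prepare_config` plus `create_exp_list` in `stepback/config.py` from this PR) rather than importing `stepback`; the function name `expand_dict_config` and the config keys `dataset` and `max_epoch` are illustrative assumptions, not part of the repository.

```python
import copy
import itertools


def expand_dict_config(exp_config: dict) -> list:
    """Expand a dict-type config into a list of single-run configs (sketch)."""
    # Work on a copy so the caller's config is left untouched.
    c = copy.deepcopy(exp_config)
    # n_runs becomes a list of run ids, e.g. 2 -> [0, 1].
    c["run_id"] = list(range(c.pop("n_runs")))

    # Expand every optimizer entry over its own list-valued hyperparameters.
    all_opt = []
    for opt in c["opt"]:
        opt = {k: v if isinstance(v, list) else [v] for k, v in opt.items()}
        all_opt += [dict(zip(opt, vals)) for vals in itertools.product(*opt.values())]
    c["opt"] = all_opt

    # Wrap scalar values in a list, then take the Cartesian product over all keys.
    c = {k: v if isinstance(v, list) else [v] for k, v in c.items()}
    return [dict(zip(c, vals)) for vals in itertools.product(*c.values())]


# Illustrative config; keys other than `opt` and `n_runs` are made up.
config = {
    "dataset": "mnist",
    "max_epoch": [5, 10],
    "opt": [{"name": "sgd", "lr": [0.1, 1.0]}, {"name": "adam", "lr": 1e-3}],
    "n_runs": 2,
}
runs = expand_dict_config(config)
print(len(runs))  # 2 epochs x 3 optimizer settings x 2 run ids = 12
```

Note that the per-`opt` expansion happens before the outer product, so a list inside one optimizer entry only multiplies the settings of that optimizer, not of the others.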
31 changes: 31 additions & 0 deletions configs/README.md
@@ -0,0 +1,31 @@
## Remarks on config management

1) The simple option: create a dict-type config (e.g. like [test.json](test.json)). The file name (in this example we use ``my_exp.json``) will serve as an identifier ``exp_id`` in the next steps. You can then run all entries of the config with one job.

2) The more complicated, but versatile option (e.g. when one single run is expensive): you can split a dict-type config into subsets (which are then stored as temporary list-type configs). This allows you to create configs only for settings that have not been run before.

*Case a)* Assume we want to rerun everything. Choose a `job_name`, which will serve as the folder name for the temporary config files. Set `splits` to the desired number of splits (if not specified, the config is split into lists of length one).

```python
from stepback.utils import split_config

job_name = 'my_job'  # example folder name for the temporary configs
split_config(exp_id='my_exp', job_name=job_name, config_dir='configs/', splits=None, only_new=False)
```


*Case b)* Assume you have already run some settings and only want to run new settings. The function determines whether a specific setting has been run by looking at the output files in ``output_dir`` that belong to ``exp_id``. **This is an experimental feature and should be handled with caution!** You can run

```python
from stepback.utils import split_config

job_name = 'my_job'  # example folder name for the temporary configs
split_config(exp_id='my_exp', job_name=job_name, config_dir='configs/', splits=None, only_new=True, output_dir='output/')
```


In both cases, this will create temporary list-type config files, stored in `configs/job_name/`, which can then be launched separately.
The temporary config files follow the name pattern

```
my_exp-001.json
my_exp-002.json
my_exp-003.json
...
```
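The splitting step can be sketched as follows. This is **not** the `split_config` function from `stepback.utils`, whose implementation is not part of this diff; `write_split_configs` is a hypothetical helper that only reproduces the documented behaviour: `splits=None` gives one run per file, otherwise the runs are divided into `splits` parts, and files are named `my_exp-001.json`, `my_exp-002.json`, ... inside a `job_name` folder.

```python
import json
import os
import tempfile


def write_split_configs(exp_list, exp_id, job_dir, splits=None):
    """Write single-run configs into list-type JSON files (hypothetical helper).

    splits: None (one run per file) or a positive int. The runs are divided
    into min(splits, len(exp_list)) contiguous chunks of near-equal size.
    Returns the paths of the written files.
    """
    if not exp_list:
        return []
    n_files = len(exp_list) if splits is None else min(splits, len(exp_list))
    base, extra = divmod(len(exp_list), n_files)
    os.makedirs(job_dir, exist_ok=True)

    paths, start = [], 0
    for i in range(n_files):
        # The first `extra` chunks receive one additional run.
        stop = start + base + (1 if i < extra else 0)
        path = os.path.join(job_dir, f"{exp_id}-{i + 1:03d}.json")
        with open(path, "w") as f:
            json.dump(exp_list[start:stop], f, indent=2)
        paths.append(path)
        start = stop
    return paths


# Four single-run configs; each carries a run_id, as list-type configs require.
runs = [{"run_id": r, "opt": {"name": "sgd", "lr": lr}} for lr in (0.1, 1.0) for r in range(2)]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_split_configs(runs, "my_exp", os.path.join(tmp, "my_job"), splits=2)
    print([os.path.basename(p) for p in paths])  # ['my_exp-001.json', 'my_exp-002.json']
```

Contiguous chunks keep runs that differ only in `run_id` close together; a round-robin split would balance expensive settings across jobs instead, which may be preferable if run times vary a lot.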
10 changes: 3 additions & 7 deletions output/README.md
@@ -2,16 +2,12 @@
 
 We store the results of all experiments here.
 
-In the [plotting script](../show.py), for a given experiment ID `EXP_ID`, all output files in this folder are collected if their name is either
+**Important:** The [``Record``](../stepback/record.py) object, which is used for plotting, analyzing results, etc., collects output from multiple files for a given experiment ID `EXP_ID`. Specifically, it loads the output from all files in this folder whose name has one of the forms
 
 ```
 <EXP_ID>.json
 <EXP_ID>-1.json, <EXP_ID>-2.json, ...
 ```
 
-This has the following reason: it might be useful to split up config files even though they belong together. If we want to run parts of the same config in parallel, it should be safer to write to different output files. Hence, if desired, you can split your config into the same structure:
-
-```
-<EXP_ID>.json
-<EXP_ID>-1.json, <EXP_ID>-2.json, ...
-```
+We do this because it can be useful to split the output of runs that actually *belong together* into different files.
+However, you can also easily merge multiple output files (or all files in a subdirectory) with the utilities in [`stepback/utils.py`](../stepback/utils.py).
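A minimal sketch of the file-name rule for `output/`, assuming that a matching name is exactly `<EXP_ID>.json` or `<EXP_ID>-<n>.json` with a numeric suffix (zero-padded suffixes such as `my_exp-001.json` included). `collect_output_files` is a hypothetical helper, not the `Record` implementation. Using `re.fullmatch` avoids picking up files like `my_exp2.json` that merely share a prefix.

```python
import os
import re
import tempfile


def collect_output_files(exp_id, output_dir="output/"):
    """Return names of <exp_id>.json and <exp_id>-<n>.json files in output_dir (sketch)."""
    # re.escape guards against regex metacharacters in the experiment ID.
    pattern = re.compile(re.escape(exp_id) + r"(-\d+)?\.json")
    return sorted(name for name in os.listdir(output_dir) if pattern.fullmatch(name))


with tempfile.TemporaryDirectory() as tmp:
    for name in ["my_exp.json", "my_exp-1.json", "my_exp-002.json", "my_exp2.json", "other-1.json"]:
        open(os.path.join(tmp, name), "w").close()
    print(collect_output_files("my_exp", tmp))  # ['my_exp-002.json', 'my_exp-1.json', 'my_exp.json']
```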
10 changes: 3 additions & 7 deletions run.py
@@ -6,9 +6,9 @@
 import argparse
 import torch
 
-from stepback.utils import prepare_config, create_exp_list
 from stepback.base import Base
 from stepback.log import Container
+from stepback.config import ConfigManager
 
 from stepback.defaults import DEFAULTS
 
@@ -64,12 +64,8 @@ def run_one(exp_id: str,
     """
 
     # load config
-    with open(config_dir + f'{exp_id}.json') as f:
-        exp_config = json.load(f)
-
-    # prepare list of configs (cartesian product)
-    exp_config = prepare_config(exp_config)
-    exp_list = create_exp_list(exp_config)
+    Conf = ConfigManager(exp_id=exp_id, config_dir=config_dir)
+    exp_list = Conf.create_config_list()
 
     print(f"Created {len(exp_list)} different configurations.")
 
133 changes: 133 additions & 0 deletions stepback/config.py
@@ -0,0 +1,133 @@
import copy
import json
import os
import itertools

from .defaults import DEFAULTS

class ConfigManager:
    """
    For managing config files.

    We distinguish two types of config files:
    * dict-type, where each value can be a list. This will be converted into a cross-product of single run configs.
      You should always set up this type of config, and then create list-type configs only for temporary use.

    * list-type. This is essentially a subset of the cross-product that comes from a dict-type config. Intended mainly for running many jobs in parallel.

    """
    def __init__(self,
                 exp_id: str,
                 config_dir: str=DEFAULTS.config_dir
                 ):

        self.exp_id = exp_id
        self.config_dir = config_dir


    def create_config_list(self):
        """
        Creates a list of all configs for single runs.

        Operation depends on which type of config the JSON with name ``self.exp_id`` is:

        * If dict-type, then the cross-product is created here and returned.
        * If list-type, then the list is returned.

        """

        with open(os.path.join(self.config_dir, self.exp_id) + '.json') as f:
            exp_config = json.load(f)

        # Check whether it is dict-type or not, and do some sanity checks
        if isinstance(exp_config, dict):
            self.dict_type = True
            assert 'n_runs' in exp_config.keys(), 'Dict-type config must specify the number of runs (e.g. "n_runs": 1).'
        elif isinstance(exp_config, list):
            self.dict_type = False
            for c in exp_config:
                assert 'run_id' in c.keys(), 'List-type config must contain "run_id" for every list element.'
        else:
            raise KeyError("Config has unknown format, must be dict or list.")

        if self.dict_type:
            exp_config = prepare_config(exp_config)
            exp_list = create_exp_list(exp_config)  # cartesian product
        else:
            exp_list = copy.deepcopy(exp_config)

        self.exp_list = exp_list

        return self.exp_list


"""
Utility functions for Experiments.
"""

def prepare_config(exp_config: dict) -> dict:
    """
    Given an experiment config, we do the following preparations:

    * Convert n_runs to a list of run_id (integer values)
    * Convert each element of opt to a list of opt configs.
    """
    c = copy.deepcopy(exp_config)

    c['run_id'] = list(range(c['n_runs']))
    del c['n_runs']


    assert isinstance(c['opt'], list), f"The value of 'opt' needs to be a list, but is given as {c['opt']}."

    all_opt = list()
    for this_opt in c['opt']:

        # make every value a list
        for k in this_opt.keys():
            if not isinstance(this_opt[k], list):
                this_opt[k] = [this_opt[k]]

        # cartesian product
        all_opt += [dict(zip(this_opt.keys(), v)) for v in itertools.product(*this_opt.values())]

    c['opt'] = all_opt

    return c

def create_exp_list(exp_config: dict):
    """
    This function was adapted from: https://github.com/haven-ai/haven-ai/blob/master/haven/haven_utils/exp_utils.py

    Creates a Cartesian product of an experiment config.

    Each value of exp_config should be a single entry or a list.
    For list values, every entry of the list defines a single realization.

    Parameters
    ----------
    exp_config : dict

    Returns
    -------
    exp_list : list
        A list of configs, each defining a single run.
    """
    exp_config_copy = copy.deepcopy(exp_config)

    # Make sure each value is a list
    for k, v in exp_config_copy.items():
        if not isinstance(exp_config_copy[k], list):
            exp_config_copy[k] = [v]

    # Create the cartesian product
    exp_list_raw = (
        dict(zip(exp_config_copy.keys(), v)) for v in itertools.product(*exp_config_copy.values())
    )

    # Convert into a list
    exp_list = []
    for exp_dict in exp_list_raw:
        exp_list += [exp_dict]

    return exp_list
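The dict/list dispatch in `create_config_list` can be illustrated in isolation. The sketch below is hypothetical (`load_config_list` and its `expand` parameter are not in the repository): `expand` stands in for `prepare_config` followed by `create_exp_list`. Unlike the code in this PR, it raises `ValueError`/`TypeError` instead of using `assert` and `KeyError`, since assertions are skipped when Python runs with `-O`.

```python
import json
import os
import tempfile


def load_config_list(config_dir, exp_id, expand):
    """Return single-run configs for <config_dir>/<exp_id>.json (sketch)."""
    with open(os.path.join(config_dir, exp_id + ".json")) as f:
        exp_config = json.load(f)

    if isinstance(exp_config, dict):
        # dict-type: must state n_runs, then gets expanded into single runs.
        if "n_runs" not in exp_config:
            raise ValueError('Dict-type config must specify "n_runs".')
        return expand(exp_config)
    if isinstance(exp_config, list):
        # list-type: already single runs, each must carry its run_id.
        if any("run_id" not in c for c in exp_config):
            raise ValueError('Every list-type entry must contain "run_id".')
        return exp_config
    raise TypeError("Config must be a JSON object (dict-type) or array (list-type).")


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "my_exp-001.json"), "w") as f:
        json.dump([{"run_id": 0, "opt": {"name": "sgd", "lr": 0.1}}], f)
    # expand is never called for a list-type config.
    print(load_config_list(tmp, "my_exp-001", expand=lambda c: []))
```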