Merge pull request #30 from fabian-sp/f-config-manager

Adding a config manager for running multiple jobs
fabian-sp · Jan 10, 2024 · 04e13ac
2 parents 1880588 + c64f8c9
commit 04e13ac

Showing 6 changed files with 261 additions and 80 deletions.
diff --git a/README.md b/README.md
@@ -39,12 +39,22 @@ For each experiment, the exact config can also be found under `configs/` where t
 
 ## How to use
 
-Any experiment needs a config file, see e.g. `configs/test.json`.
+The main runner functions are `run.py` (or `run.ipynb` if you prefer notebooks). Any experiment needs a config file, see e.g. `configs/test.json`.
+In general, the name of the config file serves as experiment ID; it is used later for storing the output, plotting etc. 
 
-* In the config you can specify at each key a list or a single entry. For every list entry, a cartesian product will be run.
-* The same is true for the hypeprparameters of each entry in the `opt` key of the config file.
-* Multiple runs can be done using the key `n_runs`. In each run the seed for shuffling the `DataLoader` changes.
-* The name of the config file serves as experiment ID, used later for running and storing the output. 
+There are two ways of specifying a config for `run.py`. 
+
+1) *dict-type* configs
+
+* Here, the config JSON is a dictionary where you can specify at each key a list or a single entry. 
+* The same is true for the hyperparameters of each entry in the `opt` key of the config file.
+* A cartesian product of all list entries will be run (i.e. potentially many single training runs in sequence).
+* Multiple repetitions can be done using the key `n_runs`. This will use different seeds for shuffling the `DataLoader`.
+
+2) *list-type* configs
+
+* The config JSON is a list, where each entry is a config for a **single training run**. 
+* This format is intended only when you want to launch multiple runs in parallel. You should first create a dict-type config, and then use utilities for creating temporary list-type configs (see an example [here](configs/README.md#example-for-splitting-up-a-config)).
 
 You can run an experiment with `run.py` or with `run.ipynb`. A minimal example is:
 
@@ -73,4 +83,6 @@ For the entries in `history`, the following keys are important:
 * `train_loss`: loss function value over training set
 * `val_loss`: loss function value over validation set
 * `train_score`: score function (eg accuracy) over training set
-* `val_score`: score function (eg accuracy) over validation set
+* `val_score`: score function (eg accuracy) over validation set
+
+In [`stepback/utils.py`](stepback/utils.py) you can find several helper functions for merging or filtering output files.
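
Reviewer note: the two config formats described in the README changes above can be pictured as follows. This is only a minimal sketch. The keys `opt` and `n_runs` come from the README, and the `run_id` requirement comes from the assertion in `stepback/config.py` in this PR; the key `dataset`, the hyperparameter names, and the helper `config_type` are hypothetical and exist only for illustration.

```python
import json

# Hypothetical dict-type config: values may be single entries or lists.
# Only "opt" and "n_runs" are taken from the README; the rest is made up.
dict_config = {
    "dataset": "toy_data",
    "opt": [{"name": "sgd", "lr": [0.1, 0.01]}],
    "n_runs": 2,
}

# Hypothetical list-type config: one entry per single training run,
# each carrying an explicit "run_id".
list_config = [
    {"dataset": "toy_data", "opt": {"name": "sgd", "lr": 0.1}, "run_id": 0},
    {"dataset": "toy_data", "opt": {"name": "sgd", "lr": 0.1}, "run_id": 1},
]


def config_type(raw: str) -> str:
    """Classify a config JSON string by its top-level type (illustrative helper)."""
    parsed = json.loads(raw)
    if isinstance(parsed, dict):
        return "dict-type"
    if isinstance(parsed, list):
        return "list-type"
    raise TypeError("Config must be a JSON object or a JSON array.")


print(config_type(json.dumps(dict_config)))
print(config_type(json.dumps(list_config)))
```

The top-level JSON type alone decides which format a file is, which is why a list-type file needs no extra marker.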
diff --git a/configs/README.md b/configs/README.md
@@ -0,0 +1,31 @@
+## Remarks on config management
+
+1) The simple option: Create a dict-type config (e.g. like [test.json](test.json)). The file name (in this example we use ``my_exp.json``) will serve as an identifier ``exp_id`` in the next steps. You can then run all entries of the config with one job.
+
+2) The more complicated, but versatile option (e.g. when one single run is expensive): You can split a dict-type config into subsets (which are then stored as temporary list-type configs). This allows you to create configs only for settings which have not been run before.
+
+*Case a)* Assume we want to rerun everything. Choose a `job_name` which will serve as folder name for temporary config files. Specify `splits` as the number of splits you wish (if not specified, it splits into lists of length one).
+
+```python
+from stepback.utils import split_config
+split_config(exp_id='my_exp', job_name=job_name, config_dir='configs/', splits=None, only_new=False)
+```
+
+
+*Case b)* Assume you have already run some settings and only want to run new ones. The function determines whether a specific setting has been run by looking at the output in ``output_dir`` that belongs to ``exp_id``. **This is an experimental feature and should be handled with caution!** You can run
+
+```python
+from stepback.utils import split_config
+split_config(exp_id='my_exp', job_name=job_name, config_dir='configs/', splits=None, only_new=True, output_dir='output/')
+```
+
+
+In both cases, this will create temporary list-type config files, stored in `configs/job_name/`, which can then be launched separately.
+The temporary config files will follow the name pattern
+
+```
+my_exp-001.json
+my_exp-002.json
+my_exp-003.json
+...
+```
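
Reviewer note: the splitting and naming scheme described in `configs/README.md` can be sketched as below. This is not the implementation of `split_config` (which is not part of this diff); the helpers `chunk_runs` and `temp_config_names` are hypothetical, and the chunking rule (contiguous, nearly equal parts) is an assumption. Only the `splits=None` behaviour (lists of length one) and the `my_exp-001.json` pattern are taken from the README.

```python
import math


def chunk_runs(run_list, splits=None):
    """Split a list of single-run configs into at most `splits` contiguous parts.

    If `splits` is None, every part holds exactly one run.
    """
    if not run_list:
        return []
    if splits is None:
        splits = len(run_list)
    size = math.ceil(len(run_list) / splits)
    return [run_list[i:i + size] for i in range(0, len(run_list), size)]


def temp_config_names(exp_id, n_parts):
    """File names following the pattern my_exp-001.json, my_exp-002.json, ..."""
    return [f"{exp_id}-{i:03d}.json" for i in range(1, n_parts + 1)]


runs = [{"run_id": i} for i in range(5)]
parts = chunk_runs(runs, splits=2)
for name, part in zip(temp_config_names("my_exp", len(parts)), parts):
    print(name, len(part))
```

Each resulting part would be a list-type config and could be launched as its own job.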
diff --git a/output/README.md b/output/README.md
@@ -2,16 +2,12 @@
 
 We store the results of all experiments here.
 
-In the [plotting script](../show.py), for a given experiment ID `EXP_ID`, all output files in this folder are collected if their name is either
+**Important:** The [``Record``](../stepback/record.py) object (used for plotting, analyzing results, etc.) collects output from multiple files for a given experiment ID `EXP_ID`. Specifically, it loads the output from all files in this folder whose file name has one of the following forms:
 
 ```
 <EXP_ID>.json
 <EXP_ID>-1.json, <EXP_ID>-2.json, ...
 ```
 
-This has the following reason: it might be useful to split up config files even though they belong together. If we want to run parts of the same config in parallel, it should be safer to write to different output files. Hence, if desired, you can split your config into the same structure:
-
-```
-<EXP_ID>.json
-<EXP_ID>-1.json, <EXP_ID>-2.json, ...
-```
+We do this because it can be useful to split the output of different runs that actually *belong together* into separate files.
+However, you can also merge multiple output files (or all files in a subdirectory) with the utilities in [`stepback/utils.py`](../stepback/utils.py).
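
Reviewer note: the file name rule from `output/README.md` could be expressed as a regular expression, as in the sketch below. How `Record` actually matches file names is not shown in this diff, so the helper `belongs_to_experiment` and the exact pattern (an optional `-<digits>` suffix) are assumptions.

```python
import re


def belongs_to_experiment(file_name: str, exp_id: str) -> bool:
    """Return True for '<EXP_ID>.json' and '<EXP_ID>-<digits>.json'."""
    pattern = re.escape(exp_id) + r"(-\d+)?\.json"
    return re.fullmatch(pattern, file_name) is not None


for name in ["my_exp.json", "my_exp-1.json", "my_exp-002.json", "my_exp_v2.json"]:
    print(name, belongs_to_experiment(name, "my_exp"))
```

Escaping `exp_id` keeps characters such as `.` or `+` in an experiment ID from being read as regex syntax.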
diff --git a/run.py b/run.py
@@ -6,9 +6,9 @@
 import argparse
 import torch
 
-from stepback.utils import prepare_config, create_exp_list
 from stepback.base import Base
 from stepback.log import Container
+from stepback.config import ConfigManager
 
 from stepback.defaults import DEFAULTS
 
@@ -64,12 +64,8 @@ def run_one(exp_id: str,
     """
 
     # load config
-    with open(config_dir + f'{exp_id}.json') as f:
-        exp_config = json.load(f)
-
-    # prepare list of configs (cartesian product)
-    exp_config = prepare_config(exp_config)
-    exp_list = create_exp_list(exp_config)
+    Conf = ConfigManager(exp_id=exp_id, config_dir=config_dir)
+    exp_list = Conf.create_config_list()
 
     print(f"Created {len(exp_list)} different configurations.")
 

diff --git a/stepback/config.py b/stepback/config.py
@@ -0,0 +1,133 @@
+import copy
+import json
+import os
+import itertools
+
+from .defaults import DEFAULTS
+
+class ConfigManager:
+    """
+    For managing config files.
+
+    We distinguish two types of config files:
+        * dict-type, where each value can be a list. This will be converted into a cross-product of single run configs.
+            You should always set up this type of config, and then create list-type configs only for temporary use.
+
+        * list-type. This is essentially a subset of the cross-product that comes from a dict-type config. Intended mainly for running many jobs in parallel.
+
+    """
+    def __init__(self, 
+                 exp_id: str, 
+                 config_dir: str=DEFAULTS.config_dir
+                 ):
+
+        self.exp_id = exp_id
+        self.config_dir = config_dir
+
+
+    def create_config_list(self):
+        """
+        Creates a list of all configs for single runs.
+
+        Operation depends on which type of config the JSON with name ``self.exp_id`` is:
+            
+            * If dict-type, then the cross-product is created here and returned.
+            * If list-type, then the list is returned.
+
+        """
+
+        with open(os.path.join(self.config_dir, self.exp_id) + '.json') as f:
+            exp_config = json.load(f)
+
+        # Check whether it is dict-type or not, and do some sanity checks
+        if isinstance(exp_config, dict):
+            self.dict_type = True
+            assert 'n_runs' in exp_config.keys(), 'Dict-type config must specify the number of runs (e.g. "n_runs": 1).'
+        elif isinstance(exp_config, list):
+            self.dict_type = False
+            for c in exp_config:
+                assert 'run_id' in c.keys(), 'List-type config must contain "run_id" for every list element.'
+        else:
+            raise TypeError("Config has unknown format, must be dict or list.")
+
+        if self.dict_type:
+            exp_config = prepare_config(exp_config)
+            exp_list = create_exp_list(exp_config)          # cartesian product
+        else:
+            exp_list = copy.deepcopy(exp_config)
+
+        self.exp_list = exp_list
+
+        return self.exp_list
+
+
+"""
+Utility functions for Experiments.
+"""
+
+def prepare_config(exp_config: dict) -> dict:
+    """
+    Given an experiment config, we do the following preparations:
+        
+        * Convert n_runs to a list of run_id (integer values)
+        * Convert each element of opt to a list of opt configs.
+    """
+    c = copy.deepcopy(exp_config)
+
+    c['run_id'] = list(range(c['n_runs']))
+    del c['n_runs']
+
+
+    assert isinstance(c['opt'], list), f"The value of 'opt' needs to be a list, but is given as {c['opt']}."
+
+    all_opt = list()
+    for this_opt in c['opt']:
+
+        # make every value a list
+        for k in this_opt.keys():
+            if not isinstance(this_opt[k], list):
+                this_opt[k] = [this_opt[k]]
+
+        # cartesian product
+        all_opt += [dict(zip(this_opt.keys(), v)) for v in itertools.product(*this_opt.values())]
+
+    c['opt'] = all_opt
+
+    return c
+
+def create_exp_list(exp_config: dict):
+    """
+    This function was adapted from: https://github.com/haven-ai/haven-ai/blob/master/haven/haven_utils/exp_utils.py
+    
+    Creates a cartesian product of an experiment config.
+    
+    Each value of exp_config should be a single entry or a list.
+    For list values, every entry of the list defines a single realization.
+    
+    Parameters
+    ----------
+    exp_config : dict
+ 
+    Returns
+    -------
+    exp_list: list
+        A list of configs, each defining a single run.
+    """
+    exp_config_copy = copy.deepcopy(exp_config)
+
+    # Make sure each value is a list
+    for k, v in exp_config_copy.items():
+        if not isinstance(exp_config_copy[k], list):
+            exp_config_copy[k] = [v]
+
+    # Create the cartesian product
+    exp_list_raw = (
+        dict(zip(exp_config_copy.keys(), v)) for v in itertools.product(*exp_config_copy.values())
+    )
+
+    # Convert into a list
+    exp_list = []
+    for exp_dict in exp_list_raw:
+        exp_list += [exp_dict]
+
+    return exp_list
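
Reviewer note: to see how many single runs a dict-type config produces, the two steps in `prepare_config` and `create_exp_list` can be traced on a small example. The sketch below is a condensed re-implementation for illustration, not the code from this PR; the helper `expand`, the key `dataset`, and the hyperparameter names are hypothetical.

```python
import itertools


def expand(config: dict) -> list:
    """Cartesian product over all values; non-list values count as one-element lists."""
    as_lists = {k: v if isinstance(v, list) else [v] for k, v in config.items()}
    return [dict(zip(as_lists, combo)) for combo in itertools.product(*as_lists.values())]


config = {
    "dataset": "toy_data",  # hypothetical key
    "opt": [
        {"name": "sgd", "lr": [0.1, 0.01]},  # expands to 2 optimizer configs
        {"name": "adam", "lr": 0.001},       # expands to 1 optimizer config
    ],
    "n_runs": 2,
}

# Step 1 (mirrors prepare_config): n_runs becomes a list of run_id values,
# and every entry of "opt" is expanded into its own cartesian product.
prepared = {k: v for k, v in config.items() if k != "n_runs"}
prepared["run_id"] = list(range(config["n_runs"]))
prepared["opt"] = [o for entry in config["opt"] for o in expand(entry)]

# Step 2 (mirrors create_exp_list): cartesian product over the top-level keys.
runs = expand(prepared)
print(len(runs))  # 1 dataset x 3 opt x 2 run_id = 6
```

Note that any hyperparameter whose value is itself a list is treated as a set of alternatives, so a list-valued hyperparameter meant as a single setting would need to be wrapped in an outer list.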