Merge pull request #80 from rdnfn/dev/general
v0.5.0
rdnfn authored May 26, 2022
2 parents 008a004 + 7af3b97 commit c51e9a6
Showing 49 changed files with 951 additions and 357 deletions.
3 changes: 1 addition & 2 deletions .gitignore
@@ -119,6 +119,5 @@ docs/generated/
#beobench
beobench_results*
notebooks/archive
*beo.yaml
*beo.yml
.beobench.yml
perf_tests*
22 changes: 22 additions & 0 deletions CITATION.cff
@@ -0,0 +1,22 @@
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Beobench: A Toolkit for Unified Access to Building
  Simulations for Reinforcement Learning
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
version: 0.5.0
url: https://github.com/rdnfn/beobench
authors:
  - given-names: Arduin
    family-names: Findeis
  - given-names: Fiodar
    family-names: Kazhamiaka
  - given-names: Scott
    family-names: Jeen
  - given-names: Srinivasan
    family-names: Keshav
7 changes: 7 additions & 0 deletions CONTRIBUTING.rst
@@ -42,6 +42,13 @@ beobench could always use more documentation, whether as part of the
official beobench docs, in docstrings, or even on the web in blog posts,
articles, and such.

To update the API docs, use the following command inside the ``/docs`` directory:

.. code-block::

    sphinx-apidoc -f -o . ..

Submit Feedback
~~~~~~~~~~~~~~~

23 changes: 23 additions & 0 deletions HISTORY.rst
@@ -2,6 +2,29 @@
History
=======

0.5.0 (2022-05-26)
------------------

* Features:

  * Mean and cumulative metrics can now be logged by the WandbLogger wrapper.
  * Support for automatically running multiple samples/trials of the same experiment via the ``num_samples`` config parameter.
  * Configs named ``.beobench.yml`` are automatically parsed when Beobench is run in a directory containing such a config. This allows users to set e.g. wandb API keys without referring to the config in every Beobench command call (see the sketch below this list).
  * Configs from experiments now specify the Beobench version used. When rerunning an experiment, this version is checked and an error is thrown if the installed and requested versions do not match.
  * Add improved high-level API for getting started. This uses the CLI arguments ``--method``, ``--gym`` and ``--env``. Example usage: ``beobench run --method ppo --gym sinergym --env Eplus-5Zone-hot-continuous-v1``.
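
A minimal sketch of how these config features might be used together, shown here in the Python-dictionary form of a Beobench config (which could equally be stored as a ``.beobench.yml`` file in the working directory). The ``wandb_api_key`` field name is an assumption for illustration only; ``wandb_project`` and ``num_samples`` follow the config structure used in the baseline configs added in this commit.

.. code-block:: python

    # Python-dictionary form of a minimal user config; the same keys could live
    # in a .beobench.yml file in the working directory so that Beobench picks
    # them up automatically (per the feature described above).
    user_config = {
        "general": {
            "wandb_project": "my_project",  # illustrative project name
            "wandb_api_key": "<your-key>",  # assumed key name, for illustration
            "num_samples": 3,  # run three samples/trials of the experiment
        },
    }
    # Such a config would typically be combined with env/agent settings,
    # e.g. those passed via the CLI arguments mentioned in the last item above.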

* Improvements

  * Add ``CITATION.cff`` file to make citing the software easier.
  * By default, docker builds of experiment images are now skipped if an image with a tag corresponding to the installed Beobench version already exists.
  * Remove outdated guides from the docs and add a yaml configuration description.
  * Add support for logging multidimensional actions to wandb.
  * Add support for logging summary metrics on every env reset to wandb.

* Fixes

  * Updated BOPTEST integration to work with the current version of Beobench.

0.4.4 (2022-05-09)
------------------

4 changes: 3 additions & 1 deletion PYPI_README.rst
@@ -1,3 +1,5 @@
A toolbox for benchmarking reinforcement learning (RL) algorithms on building energy optimisation (BEO) problems. Beobench tries to make working on RL for BEO easier: it provides simple access to existing libraries defining BEO problems (such as `BOPTEST <https://github.com/ibpsa/project1-boptest>`_) and provides a large set of pre-configured RL algorithms. Beobench is *not* a gym library itself - instead it leverages the brilliant work done by many existing gym-type projects and makes their work more easily accessible.
A toolkit providing easy and unified access to building control environments for reinforcement learning (RL). Compared to other domains, `RL environments for building control <https://github.com/rdnfn/rl-building-control#environments>`_ tend to be more difficult to install and handle. Most environments require the user to either manually install a building simulator (e.g. `EnergyPlus <https://github.com/NREL/EnergyPlus>`_) or to manually manage Docker containers. This can be tedious.

Beobench was created to make building control environments easier to use and experiments more reproducible. Beobench uses Docker to manage all environment dependencies in the background so that the user doesn't have to. A standardised API allows the user to easily configure experiments and evaluate new RL agents on building control environments.

For more information go to the `documentation <https://beobench.readthedocs.io/>`_ and the `GitHub code repository <https://github.com/rdnfn/beobench>`_.
24 changes: 19 additions & 5 deletions README.rst
@@ -24,7 +24,7 @@
    :target: https://opensource.org/licenses/MIT
    :alt: License

A toolkit providing easy and unified access to building control environments for reinforcement learning (RL). Compared to other domains, `RL environments for building control <https://github.com/rdnfn/rl-building-control#environments>`_ tend to be more difficult to install and handle. Most environments require the user to either manually install a building simulator (e.g. `EnergyPlus <https://github.com/NREL/EnergyPlus>`_) or to manually manage Docker containers. This is tedious.
A toolkit providing easy and unified access to building control environments for reinforcement learning (RL). Compared to other domains, `RL environments for building control <https://github.com/rdnfn/rl-building-control#environments>`_ tend to be more difficult to install and handle. Most environments require the user to either manually install a building simulator (e.g. `EnergyPlus <https://github.com/NREL/EnergyPlus>`_) or to manually manage Docker containers. This can be tedious.

Beobench was created to make building control environments easier to use and experiments more reproducible. Beobench uses Docker to manage all environment dependencies in the background so that the user doesn't have to. A standardised API, illustrated in the figure below, allows the user to easily configure experiments and evaluate new RL agents on building control environments.

@@ -66,7 +66,7 @@ Installation
------------

1. `Install docker <https://docs.docker.com/get-docker/>`_ on your machine (if on Linux, check the `additional installation steps <https://beobench.readthedocs.io/en/latest/guides/installation_linux.html>`_)
2. Install *beobench* using:
2. Install Beobench using:

.. code-block:: console
@@ -94,9 +94,23 @@ Experiment configuration

To get started with our first experiment, we set up an *experiment configuration*.
Experiment configurations
can be given as a yaml file or a Python dictionary. Such a configuration
can be given as a yaml file or a Python dictionary. The configuration
fully defines an experiment, configuring everything
from the RL agent to the environment and its wrappers.
from the RL agent to the environment and its wrappers. The figure below illustrates the config structure.

.. raw:: html

    <p align="center">

.. image:: https://github.com/rdnfn/beobench/raw/2cf961a8135b25c9a66e70d67eea9890ce0b878a/docs/_static/beobench_config_v1.png
    :align: center
    :width: 350 px
    :alt: Beobench

.. raw:: html

    </p>


Let's look at a concrete example. Consider this ``config.yaml`` file:

@@ -174,7 +188,7 @@ Execution

.. end-qs-sec3
Given the configuration and agent script above, we can run the experiment using either via the command line:
Given the configuration and agent script above, we can run the experiment either via the command line:

.. code-block:: console
2 changes: 1 addition & 1 deletion beobench/__init__.py
@@ -2,7 +2,7 @@

__author__ = """Beobench authors"""
__email__ = "-"
__version__ = "0.4.4"
__version__ = "0.5.0"

from beobench.utils import restart
from beobench.experiment.scheduler import run
2 changes: 1 addition & 1 deletion beobench/beobench_contrib
14 changes: 14 additions & 0 deletions beobench/cli.py
@@ -25,6 +25,11 @@ def cli():
    default=None,
    help="Name of RL method to use in experiment.",
)
@click.option(
    "--gym",
    default=None,
    help="Name of gym framework to use in experiment.",
)
@click.option(
    "--env",
    default=None,
@@ -82,9 +87,15 @@ def cli():
    default=None,
    help="For developer use only: location of custom beobench package version.",
)
@click.option(
    "--force-build",
    is_flag=True,
    help="whether to force a re-build, even if image already exists.",
)
def run(
    config: str,
    method: str,
    gym: str,
    env: str,
    local_dir: str,
    wandb_project: str,
@@ -96,6 +107,7 @@
    no_additional_container: bool,
    use_no_cache: bool,
    dev_path: str,
    force_build: bool,
) -> None:
    """Run beobench experiment from command line.
@@ -110,6 +122,7 @@
    beobench.experiment.scheduler.run(
        config=list(config),
        method=method,
        gym=gym,
        env=env,
        local_dir=local_dir,
        wandb_project=wandb_project,
@@ -121,6 +134,7 @@
        no_additional_container=no_additional_container,
        use_no_cache=use_no_cache,
        dev_path=dev_path,
        force_build=force_build,
    )


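The new ``--gym`` and ``--force-build`` options above are forwarded to the same-named parameters of ``beobench.experiment.scheduler.run``, which the ``__init__.py`` change in this commit re-exports as ``beobench.run``. A rough Python-side equivalent of the example CLI call from the changelog (the method, gym, and environment names are just those example values) might therefore look like this:

.. code-block:: python

    import beobench

    # Rough Python equivalent of the CLI call
    #   beobench run --method ppo --gym sinergym \
    #       --env Eplus-5Zone-hot-continuous-v1 --force-build
    beobench.run(
        method="ppo",
        gym="sinergym",
        env="Eplus-5Zone-hot-continuous-v1",
        force_build=True,  # rebuild the experiment image even if it already exists
    )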
2 changes: 2 additions & 0 deletions beobench/constants.py
@@ -2,6 +2,8 @@

import pathlib

USER_CONFIG_PATH = pathlib.Path("./.beobench.yml")

# available gym-framework integrations
AVAILABLE_INTEGRATIONS = [
    "boptest",
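The new ``USER_CONFIG_PATH`` constant points at the ``.beobench.yml`` user config that is now parsed automatically (see the changelog above). The helper below is a hypothetical illustration of that idea, not the actual Beobench implementation:

.. code-block:: python

    import pathlib

    import yaml

    USER_CONFIG_PATH = pathlib.Path("./.beobench.yml")

    def load_user_config() -> dict:
        """Return the working-directory user config if present, else an empty dict.

        Illustrative sketch only; the real parsing and merging happens inside Beobench.
        """
        if USER_CONFIG_PATH.is_file():
            with open(USER_CONFIG_PATH, encoding="utf-8") as config_file:
                return yaml.safe_load(config_file) or {}
        return {}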
14 changes: 14 additions & 0 deletions beobench/data/agents/random_action.py
@@ -28,10 +28,24 @@
except KeyError:
    horizon = 1000

try:
    imitate_rllib_env_checks = config["agent"]["config"]["imitate_rllib_env_checks"]
except KeyError:
    imitate_rllib_env_checks = False


print("Random agent: starting test.")

env = create_env()

if imitate_rllib_env_checks:
    # RLlib appears to reset and take single action in env
    # this may be to check compliance of env with space etc.
    env.reset()
    action = env.action_space.sample()
    _, _, _, _ = env.step(action)


observation = env.reset()

num_steps_per_ep = 0
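The new flag is read from ``config["agent"]["config"]``, so a config fragment enabling it might look like the following sketch (it simply mirrors, in Python-dictionary form, the agent section of the random-action baseline yaml further down in this commit):

.. code-block:: python

    # Hypothetical agent-config fragment enabling the RLlib-style env checks
    # for the random-action agent (mirrors the baseline yaml below).
    agent_config = {
        "agent": {
            "origin": "random_action",
            "config": {
                "config": {"horizon": 96},
                "stop": {"timesteps_total": 10000},
                "imitate_rllib_env_checks": True,
            },
        },
    }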
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
72 changes: 72 additions & 0 deletions beobench/data/configs/baselines/boptest_arroyo2022_dqn.yaml
@@ -0,0 +1,72 @@
# A first attempt at reproduction of experiments in the following paper by Arroyo et al.
# https://lirias.kuleuven.be/retrieve/658452
#
# Some of the descriptions of RLlib config values are taken from
# https://docs.ray.io/en/latest/rllib/rllib-training.html
# other from
# https://github.com/ibpsa/project1-boptest-gym/blob/master/boptestGymEnv.py

env:
  gym: boptest
  config:
    name: bestest_hydronic_heat_pump
    # whether to normalise the observations and actions
    normalize: True
    discretize: True
    gym_kwargs:
      actions: ["oveHeaPumY_u"]
      # Dictionary mapping observation keys to a tuple with the lower
      # and upper bound of each observation. Observation keys must
      # belong either to the set of measurements or to the set of
      # forecasting variables of the BOPTEST test case. Contrary to
      # the actions, the expected minimum and maximum values of the
      # measurement and forecasting variables are not provided from
      # the BOPTEST framework, although they are still relevant here
      # e.g. for normalization or discretization. Therefore, these
      # bounds need to be provided by the user.
      # If `time` is included as an observation, the time in seconds
      # will be passed to the agent. This is the remainder time from
      # the beginning of the episode and for periods of the length
      # specified in the upper bound of the time feature.
      observations:
        reaTZon_y: [280.0, 310.0]
      # Set to True if desired to use a random start time for each episode
      random_start_time: True
      # Maximum duration of each episode in seconds
      max_episode_length: 31536000 # one year in seconds
      # Desired simulation period to initialize each episode
      warmup_period: 10
      # Sampling time in seconds
      step_period: 900 # = 15min
agent:
  origin: rllib
  config:
    run_or_experiment: DQN
    config:
      lr: 0.0001
      gamma: 0.99
      # Number of steps after which the episode is forced to terminate. Defaults
      # to `env.spec.max_episode_steps` (if present) for Gym envs.
      horizon: 24 # one week 672 = 96 * 7 # other previous values: 96 # 10000 #
      # Calculate rewards but don't reset the environment when the horizon is
      # hit. This allows value estimation and RNN state to span across logical
      # episodes denoted by horizon. This only has an effect if horizon != inf.
      soft_horizon: True
      num_workers: 1 # this is required, otherwise effectively assuming simulator.
      # Training batch size, if applicable. Should be >= rollout_fragment_length.
      # Samples batches will be concatenated together to a batch of this size,
      # which is then passed to SGD.
      train_batch_size: 24
    stop:
      timesteps_total: 105120 # = 3 years # 35040 # = 365 * 96 (full year)
wrappers:
  - origin: general
    class: WandbLogger
    config:
      log_freq: 1
      summary_metric_keys:
        - env.returns.reward
general:
  wandb_project: boptest_arroyo2022_baseline
  wandb_group: random_action
  num_samples: 1
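Assuming the scheduler accepts a path to a yaml config, as the CLI's ``--config`` option suggests, a baseline config like the one above could be launched roughly as follows (the path simply mirrors this file's location in the repository):

.. code-block:: python

    import beobench

    # Launch the DQN baseline experiment defined in the config above.
    beobench.run(
        config="beobench/data/configs/baselines/boptest_arroyo2022_dqn.yaml",
    )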
@@ -0,0 +1,59 @@
# A first attempt at reproduction of experiments in the following paper by Arroyo et al.
# https://lirias.kuleuven.be/retrieve/658452
#
# Some of the descriptions of RLlib config values are taken from
# https://docs.ray.io/en/latest/rllib/rllib-training.html
# other from
# https://github.com/ibpsa/project1-boptest-gym/blob/master/boptestGymEnv.py

env:
  gym: boptest
  name: bestest_hydronic_heat_pump
  config:
    boptest_testcase: bestest_hydronic_heat_pump
    # whether to normalise the observations and actions
    normalize: True
    gym_kwargs:
      actions: ["oveHeaPumY_u"]
      # Dictionary mapping observation keys to a tuple with the lower
      # and upper bound of each observation. Observation keys must
      # belong either to the set of measurements or to the set of
      # forecasting variables of the BOPTEST test case. Contrary to
      # the actions, the expected minimum and maximum values of the
      # measurement and forecasting variables are not provided from
      # the BOPTEST framework, although they are still relevant here
      # e.g. for normalization or discretization. Therefore, these
      # bounds need to be provided by the user.
      # If `time` is included as an observation, the time in seconds
      # will be passed to the agent. This is the remainder time from
      # the beginning of the episode and for periods of the length
      # specified in the upper bound of the time feature.
      observations:
        reaTZon_y: [280.0, 310.0]
      # Set to True if desired to use a random start time for each episode
      random_start_time: True
      # Maximum duration of each episode in seconds
      max_episode_length: 31536000 # one year in seconds
      # Desired simulation period to initialize each episode
      warmup_period: 10
      # Sampling time in seconds
      step_period: 900 # = 15min
agent:
  origin: random_action
  config:
    config:
      horizon: 96
    stop:
      timesteps_total: 10000
    imitate_rllib_env_checks: True
wrappers:
  - origin: general
    class: WandbLogger
    config:
      log_freq: 1
      summary_metric_keys:
        - env.returns.reward
general:
  wandb_project: boptest_arroyo2022_baseline
  wandb_group: random_action
  num_samples: 1