This repository has been archived by the owner on Oct 7, 2024. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Extract environments to their own package.
This makes the distinction between an _environment_ and an _experiment_ clearer. If users want to import individual environments for their own debugging/development:

    ✗ from bsuite.experiments.catch import catch
    ✓ from bsuite.environments import catch

This change also introduces some more formal typing of bsuite environments:

- Add a base class which includes the bsuite_* attributes/methods.

PiperOrigin-RevId: 307575828
Change-Id: Iba2303d64a397ccef8a3f3f154e414bf343f905b
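The commit message only says that the new base class "includes the bsuite_* attributes/methods". As a rough illustration of that pattern, the sketch below assumes a template-method design: public `reset()`/`step()` wrap abstract `_reset()`/`_step()` hooks (the hooks `SimpleBandit` overrides in the diff), and every subclass must provide `bsuite_num_episodes` and `bsuite_info()`. `EnvironmentSketch`, `TimeStep` and `CountdownEnv` are hypothetical names, the auto-reset after a terminal step is borrowed from the `dm_env` convention, and the code avoids `dm_env` so it runs on the standard library alone. It is not bsuite's actual `base.Environment`.

```python
import abc
from typing import Any, Dict, NamedTuple, Optional


class TimeStep(NamedTuple):
    """Hypothetical stand-in for dm_env.TimeStep, kept stdlib-only."""
    step_type: str  # 'FIRST', 'MID' or 'LAST'
    reward: Optional[float]
    observation: Any

    def last(self) -> bool:
        return self.step_type == 'LAST'


class EnvironmentSketch(abc.ABC):
    """Hypothetical base class carrying the bsuite_* attributes and methods."""

    # Every concrete environment is expected to set this in __init__.
    bsuite_num_episodes: int

    def __init__(self):
        self._reset_next_step = True

    def reset(self) -> TimeStep:
        self._reset_next_step = False
        return self._reset()

    def step(self, action) -> TimeStep:
        # Assumption (dm_env convention): a step after a terminal step
        # ignores the action and starts a new episode instead.
        if self._reset_next_step:
            return self.reset()
        timestep = self._step(action)
        self._reset_next_step = timestep.last()
        return timestep

    @abc.abstractmethod
    def _reset(self) -> TimeStep:
        """Subclass hook: start a new episode."""

    @abc.abstractmethod
    def _step(self, action) -> TimeStep:
        """Subclass hook: advance the current episode by one step."""

    @abc.abstractmethod
    def bsuite_info(self) -> Dict[str, Any]:
        """Subclass hook: extra statistics for bsuite-style logging."""


class CountdownEnv(EnvironmentSketch):
    """Toy subclass: each episode lasts three steps and pays 1.0 at the end."""

    def __init__(self):
        super().__init__()
        self.bsuite_num_episodes = 10
        self._steps_left = 0
        self._episodes_completed = 0

    def _reset(self) -> TimeStep:
        self._steps_left = 3
        return TimeStep('FIRST', None, self._steps_left)

    def _step(self, action) -> TimeStep:
        self._steps_left -= 1
        if self._steps_left == 0:
            self._episodes_completed += 1
            return TimeStep('LAST', 1.0, 0)
        return TimeStep('MID', 0.0, self._steps_left)

    def bsuite_info(self) -> Dict[str, Any]:
        return dict(episodes_completed=self._episodes_completed)
```

With this split, a concrete environment only implements the three hooks plus `bsuite_num_episodes`, and forgetting any abstract hook makes instantiation fail with a `TypeError`.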
1 parent 6c12227, commit f9b74bf

Showing 56 changed files with 1,251 additions and 948 deletions.
@@ -0,0 +1,15 @@

    # Environments

    This folder contains the raw *environments* used in `bsuite` experiments; we
    expose them here for debugging and development purposes.

    Recall that in the context of bsuite, an *experiment* consists of three parts:

    1. Environments: a fixed set of environments determined by some parameters.
    2. Interaction: a fixed regime of agent/environment interaction (e.g. 100 episodes).
    3. Analysis: a fixed procedure that maps agent behaviour to results and plots.

    Note: If you load the environment from this folder, you will miss out on the
    interaction and analysis specified by bsuite. In general, you should use the
    `bsuite_id` to load the environment via `bsuite.load_from_id(bsuite_id)` rather
    than the raw environment.
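The README splits an experiment into environments, interaction and analysis, and recommends `bsuite.load_from_id(bsuite_id)` so that all three come together. The stdlib-only sketch below does not use bsuite; it only illustrates the three-part split with hypothetical functions (`make_environment`, `interact`, `analyse`, `run_experiment`) and a toy 11-armed bandit, assuming one arm pull per episode and total regret as the analysis result.

```python
import random
from typing import Callable, Dict, Iterable, List, Sequence


def make_environment(seed: int) -> List[float]:
    """1. Environment (hypothetical): arm rewards 0.0..1.0 shuffled by `seed`."""
    rewards = [i / 10 for i in range(11)]
    random.Random(seed).shuffle(rewards)
    return rewards


def interact(arms: Sequence[float], policy: Callable[[int], int],
             num_episodes: int) -> List[float]:
    """2. Interaction: a fixed number of one-step episodes, one pull each."""
    return [arms[policy(episode)] for episode in range(num_episodes)]


def analyse(episode_rewards: Sequence[float],
            optimal_return: float = 1.0) -> float:
    """3. Analysis: map behaviour to a single result, here total regret."""
    return sum(optimal_return - r for r in episode_rewards)


def run_experiment(seeds: Iterable[int], policy: Callable[[int], int],
                   num_episodes: int = 100) -> Dict[int, float]:
    """Apply the fixed interaction and analysis to every environment in the sweep."""
    return {seed: analyse(interact(make_environment(seed), policy, num_episodes))
            for seed in seeds}
```

Calling only `make_environment(seed)` corresponds to importing a raw environment: you get the arms, but not the fixed 100-episode regime or the regret analysis that turn it into an experiment.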
@@ -0,0 +1,72 @@

    # python3
    # pylint: disable=g-bad-file-header
    # Copyright 2019 DeepMind Technologies Limited. All Rights Reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    # ============================================================================
    """Simple diagnostic bandit environment.

    Observation is a single pixel of 1 - this is an independent arm bandit problem!
    Rewards are [0, 0.1, ..., 1], assigned randomly to 11 arms, and deterministic.
    """

    from bsuite.environments import base
    from bsuite.experiments.bandit import sweep

    import dm_env
    from dm_env import specs
    import numpy as np


    class SimpleBandit(base.Environment):
      """SimpleBandit environment."""

      def __init__(self, seed=None):
        """Builds a simple bandit environment.

        Args:
          seed: Optional integer. Seed for numpy's random number generator (RNG).
        """
        super(SimpleBandit, self).__init__()
        self._rng = np.random.RandomState(seed)

        self._n_actions = 11
        action_mask = self._rng.choice(
            range(self._n_actions), size=self._n_actions, replace=False)
        self._rewards = np.linspace(0, 1, self._n_actions)[action_mask]

        self._total_regret = 0.
        self._optimal_return = 1.
        self.bsuite_num_episodes = sweep.NUM_EPISODES

      def _get_observation(self):
        return np.ones(shape=(1, 1), dtype=np.float32)

      def _reset(self) -> dm_env.TimeStep:
        observation = self._get_observation()
        return dm_env.restart(observation)

      def _step(self, action: int) -> dm_env.TimeStep:
        reward = self._rewards[action]
        self._total_regret += self._optimal_return - reward
        observation = self._get_observation()
        return dm_env.termination(reward=reward, observation=observation)

      def observation_spec(self):
        return specs.Array(shape=(1, 1), dtype=np.float32)

      def action_spec(self):
        return specs.DiscreteArray(self._n_actions, name='action')

      def bsuite_info(self):
        return dict(total_regret=self._total_regret)
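To make the reward and regret bookkeeping in `SimpleBandit` concrete without installing `dm_env` or NumPy, here is a hypothetical stdlib mirror (`SimpleBanditSketch` is not part of bsuite). It keeps only what the diff shows: 11 evenly spaced rewards from 0 to 1 shuffled across arms by a seeded RNG, an optimal return of 1.0, and regret accumulated as `optimal_return - reward` on every pull. Observations, specs and `TimeStep`s are omitted, and `random.Random.shuffle` stands in for `RandomState.choice(..., replace=False)`, so a given seed will not reproduce bsuite's arm assignment.

```python
import random
from typing import Dict, List, Optional

N_ACTIONS = 11  # same arm count as SimpleBandit


class SimpleBanditSketch:
    """Hypothetical stdlib mirror of SimpleBandit's rewards and regret only."""

    def __init__(self, seed: Optional[int] = None):
        rng = random.Random(seed)
        # Evenly spaced rewards 0.0, 0.1, ..., 1.0, randomly assigned to arms.
        rewards = [i / (N_ACTIONS - 1) for i in range(N_ACTIONS)]
        rng.shuffle(rewards)
        self.rewards: List[float] = rewards
        self.optimal_return = 1.0
        self.total_regret = 0.0

    def step(self, action: int) -> float:
        """Each episode is a single pull: return that arm's fixed reward."""
        reward = self.rewards[action]
        self.total_regret += self.optimal_return - reward
        return reward

    def bsuite_info(self) -> Dict[str, float]:
        return dict(total_regret=self.total_regret)
```

Because every episode is a single pull, an agent that always picks the arm paying 1.0 accumulates zero regret, while each pull of the 0.0 arm adds 1.0 to `total_regret`.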