We require Mambaforge (a faster drop-in replacement for conda) or conda; Mambaforge is recommended. To install, run
make install
If you'd like to install without downloading data, run
make NO_DATA=1 install
You'll then need to install MuJoCo 210 to ~/.mujoco/mujoco210/ and add the following line to your .bashrc:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia:$HOME/.mujoco/mujoco210/bin
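A quick sanity check of the MuJoCo setup can save a failed first run. The sketch below is not part of the repository: the function name `mujoco_setup_problems` is hypothetical, and it assumes the shared libraries live in the `bin` folder inside the `~/.mujoco/mujoco210/` install location named above.

```python
import os
from pathlib import Path
from typing import List


def mujoco_setup_problems(home: Path, ld_library_path: str) -> List[str]:
    """Return human-readable problems with a MuJoCo 210 setup (empty if none)."""
    problems = []
    install_dir = home / ".mujoco" / "mujoco210"
    # Assumption: the libraries sit in the bin/ folder of the install directory.
    bin_dir = install_dir / "bin"
    if not install_dir.is_dir():
        problems.append(f"missing install directory: {install_dir}")
    # LD_LIBRARY_PATH is a colon-separated list; ignore empty entries.
    entries = [entry for entry in ld_library_path.split(":") if entry]
    if str(bin_dir) not in entries:
        problems.append(f"LD_LIBRARY_PATH does not contain {bin_dir}")
    return problems


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for problem in mujoco_setup_problems(Path.home(), current):
        print(problem)
```

Running the script prints nothing when both checks pass. Remember to open a new shell (or source your .bashrc) after editing it, otherwise the variable will not be visible yet.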
# example run
cfpi bc
# help
cfpi --help
Usage: cfpi [OPTIONS] ALGORITHM:{bc|mg|reverse_kl|sarsa_iqn|sg} VARIANT

Arguments:
  * algorithm   ALGORITHM:{bc|mg|reverse_kl|sarsa_iqn|sg}
                Specify algorithm to run. Find all supported algorithms in
                ./cfpi/variants/SUPPORTED_ALGO…
                [default: None] [required]
  * variant     TEXT
                Specify which variant of the algorithm to run. Find all
                supported variant in ./cfpi/variants/<algorithm_nam…
                [default: None] [required]

Options:
  --parallel    [AntMaze|MediumExpert|MediumReplay|Single|Wide]
                Run multiple versions of the algorithm on different
                environments and seeds. [default: None]
  --gridsearch  [CqlDeltas|Deltas|DetDeltas|EasyBcq|EnsembleSize|FineGrainedDeltas|Full|KFold|Mg|Pac|QTrainedEpochs|ReverseKl|Testing|TrqDelta]
                Do a gridsearch. Only supported when parallel is also enabled
                [default: None]
  --dry / --no-dry
                Just print the variant and pipeline. [default: no-dry]
  --install-completion
                Install completion for the current shell.
  --show-completion
                Show completion for the current shell, to copy it or
                customize the installation.
  --help        Show this message and exit.

Example command to run the Behavior Cloning VanillaVariant over the Medium Expert datasets, gridsearching over Delta hyperparameters:
cfpi bc --parallel MediumExpert --gridsearch Deltas VanillaVariant
Only variants that define both a seed and an env_id will run, so Base* variants are typically not runnable. If no variant is specified, the vanilla variant runs by default.
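The runnable rule above can be pictured with a small sketch. Everything here is illustrative rather than taken from the codebase: the class names, the `is_runnable` helper, and the example seed and environment id are assumptions chosen to show why a base configuration is skipped while a concrete variant is not.

```python
class BaseVariant:
    """Hypothetical base configuration: shared defaults, no seed or env_id."""

    algorithm = "bc"


class VanillaVariant(BaseVariant):
    """Hypothetical concrete variant that pins a seed and an environment."""

    seed = 0
    env_id = "hopper-medium-expert-v2"  # illustrative D4RL-style id


def is_runnable(variant_cls) -> bool:
    """Treat a variant as runnable only if it sets both seed and env_id."""
    required = ("seed", "env_id")
    return all(getattr(variant_cls, name, None) is not None for name in required)


if __name__ == "__main__":
    for cls in (BaseVariant, VanillaVariant):
        print(cls.__name__, "runnable:", is_runnable(cls))
```

Under these assumptions a Base* class fails the check simply because it leaves seed and env_id unset, which matches the behavior described above.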
- Install pre-commit hooks:
make pre-commit-install
- Run the codestyle formatters:
make codestyle
If you get the error "Failed to unlock the collection!", try running
export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
and rerun your command.
To enable or disable debug mode, copy cfpi/conf_private.example.py to cfpi/conf_private.py.
You can run either of the following commands to make breakpoint() open a more convenient debugger:
export PYTHONBREAKPOINT='IPython.core.debugger.set_trace'
# OR
export PYTHONBREAKPOINT=ipdb.set_trace
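To see how the variable takes effect, here is a minimal sketch using only the standard library. The built-in `breakpoint()` consults `PYTHONBREAKPOINT` each time it is called, so the same source line opens ipdb, IPython's debugger, or nothing at all depending on the environment. The `safe_divide` function is a made-up example, not code from this project.

```python
import os


def safe_divide(numerator: float, denominator: float) -> float:
    """Divide two numbers, pausing in the configured debugger on a zero denominator."""
    if denominator == 0:
        # breakpoint() reads PYTHONBREAKPOINT on every call:
        # "ipdb.set_trace" opens ipdb, while "0" makes the call a no-op.
        breakpoint()
        return float("nan")
    return numerator / denominator


if __name__ == "__main__":
    # Disable breakpoints so this demo runs unattended.
    os.environ["PYTHONBREAKPOINT"] = "0"
    print(safe_divide(1.0, 0.0))
```

Setting the variable to 0 is also a handy way to silence stray breakpoints during long parallel runs without editing any code.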
Nodemon is another helpful tool. See example usage below:
nodemon -I -x 'cfpi bc --parallel MediumExpert --gridsearch Deltas VanillaBCVariant' -e py
If the code randomly gets stuck, the MuJoCo lockfile may need to be deleted. You can do so by running the delmlock script at ./scripts/delmlock. We recommend adding this script to your PATH.
- Support for Python 3.8.
- Poetry as the dependencies manager. See configuration in pyproject.toml and setup.cfg.
- Automatic codestyle with black, isort and pyupgrade.
- Ready-to-use pre-commit hooks with code-formatting.
- Type checks with mypy; docstring checks with darglint; security checks with safety and bandit.
- Testing with pytest.
- Ready-to-use .editorconfig, .dockerignore, and .gitignore. You don't have to worry about those things.
To add a new algorithm, you need to do three things:
- Create the algorithm and the respective experiment file in ./cfpi/algorithms
- Specify this algorithm in ./variants/SUPPORTED_ALGORITHMS.py
- Create a ./variants/<alg_name>/variant.py file
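The three steps above can be turned into a small checklist script. This is a sketch, not a tool shipped with the repository: `missing_algorithm_pieces` is a hypothetical helper, the paths are taken literally from the list above, and it assumes that the algorithm's files carry its name and that the registry file mentions the name as plain text.

```python
from pathlib import Path
from typing import List


def missing_algorithm_pieces(repo_root: Path, alg_name: str) -> List[str]:
    """List which of the three steps still look unfinished for alg_name."""
    missing = []

    # Step 1: something named after the algorithm under ./cfpi/algorithms
    # (assumption: file or folder names contain the algorithm name).
    algorithms_dir = repo_root / "cfpi" / "algorithms"
    has_algorithm = algorithms_dir.is_dir() and any(
        alg_name in path.name for path in algorithms_dir.rglob("*")
    )
    if not has_algorithm:
        missing.append(f"no files for {alg_name} under {algorithms_dir}")

    # Step 2: the algorithm is mentioned in SUPPORTED_ALGORITHMS.py.
    registry = repo_root / "variants" / "SUPPORTED_ALGORITHMS.py"
    if not registry.is_file() or alg_name not in registry.read_text():
        missing.append(f"{alg_name} not listed in {registry}")

    # Step 3: a variant.py exists for the algorithm.
    variant_file = repo_root / "variants" / alg_name / "variant.py"
    if not variant_file.is_file():
        missing.append(f"missing {variant_file}")

    return missing


if __name__ == "__main__":
    for item in missing_algorithm_pieces(Path("."), "my_algo"):
        print(item)
```

An empty result only means the files exist; whether the new algorithm actually runs still has to be confirmed with the cfpi command.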
The Makefile contains many targets for faster development.
1. Download and remove Poetry
To download and install Poetry run:
make poetry-download
To uninstall it, run:
make poetry-remove
2. Install all dependencies and pre-commit hooks
Install requirements:
make install
Pre-commit hooks can be installed after git init via
make pre-commit-install
3. Codestyle
Automatic formatting uses pyupgrade, isort and black.
make codestyle
# or use synonym
make formatting
Codestyle checks only, without rewriting files:
make check-codestyle
Note: check-codestyle uses the isort, black and darglint libraries.
Update all dev libraries to the latest version using one command:
make update-dev-deps
4. Code security
The following command launches Poetry integrity checks and identifies security issues with Safety and Bandit:
make check-safety
5. Type checks
Run the mypy static type checker:
make mypy
6. Tests with coverage badges
Run pytest
make test
7. All linters
Of course, there is a command to run all linters in one:
make lint
This is the same as:
make test && make check-codestyle && make mypy && make check-safety
8. Docker
make docker-build
which is equivalent to:
make docker-build VERSION=latest
Remove docker image with
make docker-remove
9. Cleanup
Delete pycache files
make pycache-remove
Remove package build
make build-remove
Delete .DS_STORE files
make dsstore-remove
Remove .mypycache
make mypycache-remove
Or, to remove all of the above, run:
make cleanup
Want to know more about Poetry? Check its documentation.
Details about Poetry
Poetry's commands are very intuitive and easy to learn, like:
poetry add numpy@latest
poetry run pytest
poetry publish --build
etc
Building a new version of the application involves the following steps:
- Bump the version of your package with poetry version <version>. You can pass the new version explicitly, or a rule such as major, minor, or patch. For more details, refer to the Semantic Versions standard.
- Make a commit to GitHub.
- Create a GitHub release.
- And... publish 🙂
poetry publish --build
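For readers unfamiliar with bump rules, the sketch below shows what major, minor, and patch do to a plain MAJOR.MINOR.PATCH version under Semantic Versioning. It is an illustration only: `bump_version` is a hypothetical helper, not Poetry's implementation, and it ignores pre-release and build suffixes.

```python
def bump_version(version: str, rule: str) -> str:
    """Apply a major, minor, or patch bump to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if rule == "major":
        # Breaking change: reset minor and patch.
        return f"{major + 1}.0.0"
    if rule == "minor":
        # Backwards-compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if rule == "patch":
        # Backwards-compatible fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump rule: {rule!r}")


if __name__ == "__main__":
    for rule in ("major", "minor", "patch"):
        print(rule, bump_version("1.2.3", rule))
```

In practice you would let poetry version apply the rule and update pyproject.toml for you; the sketch is only meant to make the effect of each rule concrete.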
- Add support for deterministic CFPI
- Add support for VAE-CFPI
This project is licensed under the terms of the MIT license. See LICENSE for more details.
@misc{li2022offline,
  title={Offline Reinforcement Learning with Closed-Form Policy Improvement Operators},
  author={Jiachen Li and Edwin Zhang and Ming Yin and Qinxun Bai and Yu-Xiang Wang and William Yang Wang},
  journal={ICML},
  year={2023},
}
This project would not be possible without the following wonderful prior work.
Optimistic Actor Critic gave inspiration to our method, D4RL provides the dataset and benchmark for evaluating the performance of our agent, and RLkit offered a strong RL framework for building our code from.
Template: python-package-template