cfpi

Offline Reinforcement Learning with Closed-Form Policy Improvement Operators

⚓ Installation

We require Mambaforge (a faster drop-in replacement for conda) or conda. Mambaforge is recommended. To install, simply run

make install

If you'd like to install without downloading data, run

make NO_DATA=1 install

You'll then need to install MuJoCo 210 to ~/.mujoco/mujoco210/ and add the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia:$HOME/.mujoco/mujoco210/bin.
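
A quick way to confirm the MuJoCo setup is to check that the install directory exists and that its bin directory is on LD_LIBRARY_PATH. The sketch below is not part of cfpi; the function name check_mujoco_setup is hypothetical, and it assumes the library path should point at the bin directory of whichever MuJoCo directory you installed.

```python
import os
from pathlib import Path


def check_mujoco_setup(mujoco_dir, ld_library_path):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    mujoco_dir = Path(mujoco_dir).expanduser()
    bin_dir = mujoco_dir / "bin"
    if not mujoco_dir.is_dir():
        problems.append(f"MuJoCo directory not found: {mujoco_dir}")
    # LD_LIBRARY_PATH is a colon-separated list of directories.
    entries = [Path(p).expanduser() for p in ld_library_path.split(":") if p]
    if bin_dir not in entries:
        problems.append(f"{bin_dir} is not on LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    issues = check_mujoco_setup(
        "~/.mujoco/mujoco210", os.environ.get("LD_LIBRARY_PATH", "")
    )
    print("\n".join(issues) or "MuJoCo setup looks fine")
```

Run it in a fresh shell after editing your .bashrc so that the updated environment is picked up.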

Get Started

# example run
cfpi bc

# help
cfpi --help

 Usage: cfpi [OPTIONS] ALGORITHM:{bc|mg|reverse_kl|sarsa_iqn|sg} VARIANT

╭─ Arguments ──────────────────────────────────────────────────────────────────────────╮
│ *    algorithm      ALGORITHM:{bc|mg|reverse_kl|sar  Specify algorithm to run. Find  │
│                     sa_iqn|sg}                       all supported algorithms in     │
│                                                      ./cfpi/variants/SUPPORTED_ALGO… │
│                                                      [default: None]                 │
│                                                      [required]                      │
│ *    variant        TEXT                             Specify which variant of the    │
│                                                      algorithm to run. Find all      │
│                                                      supported variant in            │
│                                                      ./cfpi/variants/<algorithm_nam… │
│                                                      [default: None]                 │
│                                                      [required]                      │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────╮
│ --parallel                          [AntMaze|MediumExpert|  Run multiple versions of │
│                                     MediumReplay|Single|Wi  the algorithm on         │
│                                     de]                     different environments   │
│                                                             and seeds.               │
│                                                             [default: None]          │
│ --gridsearch                        [CqlDeltas|Deltas|DetD  Do a gridsearch. Only    │
│                                     eltas|EasyBcq|Ensemble  supported when parallel  │
│                                     Size|FineGrainedDeltas  is also enabled          │
│                                     |Full|KFold|Mg|Pac|QTr  [default: None]          │
│                                     ainedEpochs|ReverseKl|                           │
│                                     Testing|TrqDelta]                                │
│ --dry                   --no-dry                            Just print the variant   │
│                                                             and pipeline.            │
│                                                             [default: no-dry]        │
│ --install-completion                                        Install completion for   │
│                                                             the current shell.       │
│ --show-completion                                           Show completion for the  │
│                                                             current shell, to copy   │
│                                                             it or customize the      │
│                                                             installation.            │
│ --help                                                      Show this message and    │
│                                                             exit.                    │
╰──────────────────────────────────────────────────────────────────────────────────────╯

Example command to run the Behavior Cloning VanillaVariant over the Medium Expert datasets, gridsearching over the Delta hyperparameters:

cfpi bc --parallel MediumExpert --gridsearch Deltas VanillaVariant
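
If you launch runs from a script, it can help to assemble the command programmatically. The following is a minimal sketch, not part of cfpi: the helper build_cfpi_command is hypothetical and only uses the arguments and options listed in the help output above. It also enforces the documented rule that --gridsearch needs --parallel.

```python
import shlex


def build_cfpi_command(algorithm, variant, parallel=None, gridsearch=None, dry=False):
    """Assemble an argv list following the usage line shown by `cfpi --help`."""
    argv = ["cfpi", algorithm]
    if parallel is not None:
        argv += ["--parallel", parallel]
    if gridsearch is not None:
        if parallel is None:
            # The help text says gridsearch is only supported with --parallel.
            raise ValueError("--gridsearch requires --parallel")
        argv += ["--gridsearch", gridsearch]
    if dry:
        # --dry only prints the variant and pipeline instead of running them.
        argv.append("--dry")
    argv.append(variant)
    return argv


command = build_cfpi_command(
    "bc", "VanillaVariant", parallel="MediumExpert", gridsearch="Deltas", dry=True
)
print(shlex.join(command))
```

Passing the resulting list to subprocess.run(command, check=True) would execute it; with dry=True the run is limited to printing the variant and pipeline.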

Note on variants

Only variants that define both a seed and an env_id will run, so Base* variants are typically not runnable. If no variant is specified, the vanilla variant runs by default.
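
To illustrate the rule, here is a toy sketch. The classes and the is_runnable helper are hypothetical and do not mirror the actual cfpi variant classes or how cfpi performs this check; the env_id value is only an example D4RL dataset name.

```python
class BaseVariant:
    """Hypothetical base variant: shared settings, but no seed or env_id."""

    algorithm = "bc"


class VanillaVariant(BaseVariant):
    """Hypothetical concrete variant that defines both required fields."""

    seed = 0
    env_id = "hopper-medium-expert-v2"  # illustrative value only


def is_runnable(variant_cls):
    """A variant counts as runnable here only if it defines seed and env_id."""
    return all(
        getattr(variant_cls, name, None) is not None for name in ("seed", "env_id")
    )


print(is_runnable(BaseVariant), is_runnable(VanillaVariant))
```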

🏗️ Development

Install pre-commit hooks:

make pre-commit-install

Run the codestyle:

make codestyle

Debugging

If you get the error Failed to unlock the collection!, try running

export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring

and rerun your command.

To enable or disable debug mode, copy cfpi/conf_private.example.py to cfpi/conf_private.py.

You can run the following helpful commands to breakpoint easily:

export PYTHONBREAKPOINT='IPython.core.debugger.set_trace'
# OR
export PYTHONBREAKPOINT=ipdb.set_trace
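
Python's built-in breakpoint() imports whatever callable PYTHONBREAKPOINT names, so the setting only helps if that debugger is installed. The sketch below, with the hypothetical helper hook_is_importable, checks this up front by resolving the dotted name the same way (names without a dot are looked up in builtins).

```python
import importlib


def hook_is_importable(dotted_name):
    """Check that a PYTHONBREAKPOINT value names an importable callable."""
    module_name, _, attr = dotted_name.rpartition(".")
    try:
        # Names without a dot are looked up in builtins, as breakpoint() does.
        module = importlib.import_module(module_name or "builtins")
    except ImportError:
        return False
    return callable(getattr(module, attr, None))


for name in ("IPython.core.debugger.set_trace", "ipdb.set_trace", "pdb.set_trace"):
    print(name, "->", "ok" if hook_is_importable(name) else "not installed")
```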

Nodemon is another helpful tool. See example usage below:

nodemon -I -x 'cfpi bc --parallel MediumExpert --gridsearch Deltas VanillaBCVariant' -e py

If the code randomly gets stuck, the MuJoCo lockfile may need to be deleted. You can do so by running the delmlock script at ./scripts/delmlock. We recommend adding this script to your path.

Development features

Support for Python 3.8.
Poetry as the dependencies manager. See configuration in pyproject.toml and setup.cfg.
Automatic codestyle with black, isort and pyupgrade.
Ready-to-use pre-commit hooks with code-formatting.
Type checks with mypy; docstring checks with darglint; security checks with safety and bandit.
Testing with pytest.
Ready-to-use .editorconfig, .dockerignore, and .gitignore. You don't have to worry about those things.

Adding a new algorithm

To add a new algorithm, you need to do three things:

Create the algorithm and its experiment file in ./cfpi/algorithms
Specify the algorithm in ./cfpi/variants/SUPPORTED_ALGORITHMS.py
Create a ./cfpi/variants/<alg_name>/variant.py file
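
The steps above can be turned into a small checklist script. This is a sketch under assumptions: the helper missing_algorithm_files is hypothetical, and it assumes the variants directory lives under ./cfpi/variants, as the cfpi --help output indicates. It only checks that the expected paths exist, not their contents.

```python
from pathlib import Path


def missing_algorithm_files(repo_root, alg_name):
    """List the expected paths for a new algorithm that do not exist yet."""
    root = Path(repo_root)
    expected = [
        root / "cfpi" / "algorithms",
        root / "cfpi" / "variants" / "SUPPORTED_ALGORITHMS.py",
        root / "cfpi" / "variants" / alg_name / "variant.py",
    ]
    return [str(path.relative_to(root)) for path in expected if not path.exists()]


if __name__ == "__main__":
    # "my_alg" is a placeholder algorithm name.
    for path in missing_algorithm_files(".", "my_alg"):
        print("missing:", path)
```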

Makefile usage

The Makefile contains many targets for faster development.

1. Download and remove Poetry

To download and install Poetry run:

make poetry-download

To uninstall:

make poetry-remove

2. Install all dependencies and pre-commit hooks

Install requirements:

make install

Pre-commit hooks can be installed after git init via

make pre-commit-install

3. Codestyle

Automatic formatting uses pyupgrade, isort and black.

make codestyle

# or use synonym
make formatting

Codestyle checks only, without rewriting files:

make check-codestyle

Note: check-codestyle uses the isort, black and darglint libraries.

Update all dev libraries to the latest version using one command:

make update-dev-deps

4. Code security

make check-safety

This command launches Poetry integrity checks as well as identifies security issues with Safety and Bandit.

5. Type checks

Run mypy static type checker

make mypy

6. Tests with coverage badges

Run pytest

make test

7. All linters

Of course, there is a command to run all linters at once:

make lint

This is the same as:

make test && make check-codestyle && make mypy && make check-safety
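
The && chaining means each target runs only if the previous one succeeded. A minimal Python sketch of the same behavior is shown below; the helper run_targets is hypothetical and simply shells out to make, with the runner passed in so it can be swapped for a stub.

```python
import subprocess

# The four targets chained by the command above, in the same order.
LINT_TARGETS = ["test", "check-codestyle", "mypy", "check-safety"]


def run_targets(targets, run=subprocess.call):
    """Run `make <target>` in order and stop at the first failure, like `&&`."""
    for target in targets:
        exit_code = run(["make", target])
        if exit_code != 0:
            # Report which target failed and with what exit code.
            return target, exit_code
    return None, 0
```

Calling run_targets(LINT_TARGETS) from the repository root returns (None, 0) when every target passes, or the name and exit code of the first failing target.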

8. Docker

make docker-build

which is equivalent to:

make docker-build VERSION=latest

Remove docker image with

make docker-remove

More information about docker.

9. Cleanup

Delete pycache files

make pycache-remove

Remove package build

make build-remove

Delete .DS_Store files

make dsstore-remove

Remove .mypy_cache

make mypycache-remove

Or to remove all above run:

make cleanup
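
For environments without make, a rough Python equivalent might look like the sketch below. It is an assumption about what the cleanup targets remove (the Makefile itself is the authority): __pycache__ and .mypy_cache directories, .DS_Store files, and a top-level build directory. The function name cleanup is hypothetical.

```python
import shutil
from pathlib import Path

# Assumed artifact names; adjust to match what the Makefile actually removes.
ARTIFACT_NAMES = {"__pycache__", ".mypy_cache", ".DS_Store"}


def cleanup(root):
    """Delete common build and cache artifacts under root; return removed paths."""
    root = Path(root)
    # Collect first, then delete, so the tree is not modified while walking it.
    targets = [p for p in root.rglob("*") if p.name in ARTIFACT_NAMES]
    targets.append(root / "build")
    removed = []
    for path in targets:
        if path.is_dir():
            shutil.rmtree(path, ignore_errors=True)
        elif path.is_file():
            path.unlink()
        else:
            # Already gone, e.g. it was inside a directory removed earlier.
            continue
        removed.append(path)
    return removed
```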

Poetry

Want to know more about Poetry? Check its documentation.

Details about Poetry

Poetry's commands are very intuitive and easy to learn, like:

poetry add numpy@latest
poetry run pytest
poetry publish --build

etc

Building and releasing

Building a new version of the application involves the following steps:

Bump the version of your package with poetry version <version>. You can pass the new version explicitly, or a rule such as major, minor, or patch. For more details, refer to the Semantic Versions standard.
Make a commit to GitHub.
Create a GitHub release.
And... publish 🙂 poetry publish --build
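
To make the major, minor, and patch rules concrete, here is a small sketch of how each rule changes a plain MAJOR.MINOR.PATCH version under Semantic Versioning. The function bump_version is hypothetical, is not Poetry's implementation, and ignores pre-release and build suffixes.

```python
def bump_version(version, rule):
    """Apply a major/minor/patch bump to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if rule == "major":
        # A major bump resets minor and patch to zero.
        return f"{major + 1}.0.0"
    if rule == "minor":
        # A minor bump resets patch to zero.
        return f"{major}.{minor + 1}.0"
    if rule == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump rule: {rule}")


for rule in ("patch", "minor", "major"):
    print(rule, "->", bump_version("1.2.3", rule))
```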

🎯 What's next

Add support for deterministic CFPI
Add support for VAE-CFPI

🛡 License

This project is licensed under the terms of the MIT license. See LICENSE for more details.

📃 Citation

@misc{li2022offline,
    title={Offline Reinforcement Learning with Closed-Form Policy Improvement Operators},
    author={Jiachen Li and Edwin Zhang and Ming Yin and Qinxun Bai and Yu-Xiang Wang and William Yang Wang},
    journal={ICML},
    year={2023},
}

👏 Credits

This project would not be possible without the following wonderful prior work.

Optimistic Actor Critic gave inspiration to our method, D4RL provides the dataset and benchmark for evaluating the performance of our agent, and RLkit offered a strong RL framework for building our code from.

Template: python-package-template
