cfpi

Offline Reinforcement Learning with Closed-Form Policy Improvement Operators

⚓ Installation

We require Mambaforge (a faster drop-in replacement for conda) or conda. Mambaforge is recommended. To install, run:

make install

If you'd like to install without downloading data, run

make NO_DATA=1 install

You'll then need to install MuJoCo 210 to ~/.mujoco/mujoco210/ and add the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia:$HOME/.mujoco/mujoco210/bin
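
A quick way to confirm the environment is set up is to check that the required directories appear in LD_LIBRARY_PATH. The following is a minimal sketch, not part of cfpi: the helper name missing_library_paths is hypothetical, and it assumes MuJoCo was unpacked to ~/.mujoco/mujoco210 as described above.

```python
import os
from pathlib import Path


def missing_library_paths(mujoco_dir, ld_library_path):
    """Return the required directories that are absent from ld_library_path.

    Hypothetical helper: it only inspects the string, it does not check
    that the directories exist on disk.
    """
    required = [Path("/usr/lib/nvidia"), Path(mujoco_dir).expanduser() / "bin"]
    present = {Path(entry) for entry in ld_library_path.split(":") if entry}
    return [str(path) for path in required if path not in present]


if __name__ == "__main__":
    missing = missing_library_paths(
        "~/.mujoco/mujoco210", os.environ.get("LD_LIBRARY_PATH", "")
    )
    print("Missing from LD_LIBRARY_PATH:", missing or "nothing")
```

If the script reports missing entries, re-check the export line in your .bashrc and open a new shell.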

Get Started

# example run
cfpi bc

# help
cfpi --help

 Usage: cfpi [OPTIONS] ALGORITHM:{bc|mg|reverse_kl|sarsa_iqn|sg} VARIANT

╭─ Arguments ──────────────────────────────────────────────────────────────────────────╮
│ *    algorithm      ALGORITHM:{bc|mg|reverse_kl|sar  Specify algorithm to run. Find  │
│                     sa_iqn|sg}                       all supported algorithms in     │
│                                                      ./cfpi/variants/SUPPORTED_ALGO… │
│                                                      [default: None]                 │
│                                                      [required]                      │
│ *    variant        TEXT                             Specify which variant of the    │
│                                                      algorithm to run. Find all      │
│                                                      supported variant in            │
│                                                      ./cfpi/variants/<algorithm_nam… │
│                                                      [default: None]                 │
│                                                      [required]                      │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────╮
│ --parallel                          [AntMaze|MediumExpert|  Run multiple versions of │
│                                     MediumReplay|Single|Wi  the algorithm on         │
│                                     de]                     different environments   │
│                                                             and seeds.               │
│                                                             [default: None]          │
│ --gridsearch                        [CqlDeltas|Deltas|DetD  Do a gridsearch. Only    │
│                                     eltas|EasyBcq|Ensemble  supported when parallel  │
│                                     Size|FineGrainedDeltas  is also enabled          │
│                                     |Full|KFold|Mg|Pac|QTr  [default: None]          │
│                                     ainedEpochs|ReverseKl|                           │
│                                     Testing|TrqDelta]                                │
│ --dry                   --no-dry                            Just print the variant   │
│                                                             and pipeline.            │
│                                                             [default: no-dry]        │
│ --install-completion                                        Install completion for   │
│                                                             the current shell.       │
│ --show-completion                                           Show completion for the  │
│                                                             current shell, to copy   │
│                                                             it or customize the      │
│                                                             installation.            │
│ --help                                                      Show this message and    │
│                                                             exit.                    │
╰──────────────────────────────────────────────────────────────────────────────────────╯

Example command to run the Behavior Cloning VanillaVariant over Medium Expert datasets, gridsearching over Delta hyperparameters: cfpi bc --parallel MediumExpert --gridsearch Deltas VanillaVariant

Note on variants

Only variants that define both a seed and an env_id will run, so Base* variants are typically not runnable. If no variant is specified, the vanilla variant runs by default.
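
The rule above can be pictured with a small sketch. This is not the repository's actual code: the class names, the is_runnable helper, and the choice to treat None as "not set" are all assumptions made for illustration, and the env_id value is just an example D4RL dataset name.

```python
class BaseExampleVariant:
    """Hypothetical base variant: leaves seed and env_id unset."""

    seed = None
    env_id = None


class VanillaExampleVariant(BaseExampleVariant):
    """Hypothetical concrete variant: fills in both fields."""

    seed = 0
    env_id = "hopper-medium-expert-v2"  # illustrative value only


def is_runnable(variant) -> bool:
    """Assumed rule: a variant runs only if seed and env_id are both set."""
    return (
        getattr(variant, "seed", None) is not None
        and getattr(variant, "env_id", None) is not None
    )


if __name__ == "__main__":
    for variant in (BaseExampleVariant, VanillaExampleVariant):
        print(variant.__name__, "runnable:", is_runnable(variant))
```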

🏗️ Development

  1. Install pre-commit hooks:
make pre-commit-install
  2. Run the code formatter:
make codestyle

Debugging

If you get the error Failed to unlock the collection!, try running

export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring

and rerun your command.

To enable or disable debug mode, copy cfpi/conf_private.example.py to cfpi/conf_private.py and edit the copy.

To choose which debugger Python's built-in breakpoint() launches, set one of the following:

export PYTHONBREAKPOINT='IPython.core.debugger.set_trace'
# OR
export PYTHONBREAKPOINT=ipdb.set_trace
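
For reference, Python's built-in breakpoint() consults PYTHONBREAKPOINT each time it is called. The sketch below mirrors the documented lookup rules (unset or empty falls back to pdb.set_trace, 0 disables breakpoints, anything else names the callable to import); the function name resolve_breakpoint_hook is a hypothetical helper, not part of cfpi or the standard library.

```python
import os


def resolve_breakpoint_hook(env=None) -> str:
    """Describe which callable breakpoint() would dispatch to.

    Hypothetical helper that only reads the variable; it does not import
    or call the debugger.
    """
    env = os.environ if env is None else env
    value = env.get("PYTHONBREAKPOINT", "")
    if value == "":
        return "pdb.set_trace"  # default when unset or empty
    if value == "0":
        return "disabled"  # breakpoint() becomes a no-op
    return value  # dotted path to the debugger entry point


if __name__ == "__main__":
    print("breakpoint() will use:", resolve_breakpoint_hook())
```

Note that the named debugger (IPython or ipdb) must be installed in the active environment for breakpoint() to import it.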

Nodemon is another helpful tool. See example usage below:

nodemon -I -x 'cfpi bc --parallel MediumExpert  --gridsearch Deltas VanillaBCVariant' -e py

If a run hangs for no apparent reason, the MuJoCo lockfile may need to be deleted. You can do so by running the delmlock script at ./scripts/delmlock. We recommend adding this script to your PATH.

Development features

Adding a new algorithm

To add a new algorithm, you need to do three things:

  1. Create the algorithm and the respective experiment file in ./cfpi/algorithms
  2. Specify this algorithm in ./cfpi/variants/SUPPORTED_ALGORITHMS.py
  3. Create a ./cfpi/variants/<alg_name>/variant.py file
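
The three steps can be turned into a small checklist. The sketch below is an illustration rather than a tool shipped with cfpi: the function name new_algorithm_paths is hypothetical, and the variants directory defaults to cfpi/variants, following the paths shown in the cfpi --help output.

```python
from pathlib import Path, PurePosixPath


def new_algorithm_paths(alg_name, variants_dir="cfpi/variants"):
    """List the locations touched when adding an algorithm (assumed layout)."""
    variants = PurePosixPath(variants_dir)
    return [
        PurePosixPath("cfpi/algorithms"),  # step 1: algorithm and experiment file
        variants / "SUPPORTED_ALGORITHMS.py",  # step 2: register the algorithm
        variants / alg_name / "variant.py",  # step 3: variant definitions
    ]


if __name__ == "__main__":
    # Run from the repository root; "my_alg" is a placeholder name.
    for path in new_algorithm_paths("my_alg"):
        status = "ok" if Path(path).exists() else "missing"
        print(f"{status:8} {path}")
```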

Makefile usage

The Makefile contains many targets for faster development.

1. Download and remove Poetry

To download and install Poetry run:

make poetry-download

To uninstall

make poetry-remove

2. Install all dependencies and pre-commit hooks

Install requirements:

make install

Pre-commit hooks can be installed after git init via

make pre-commit-install

3. Codestyle

Automatic formatting uses pyupgrade, isort and black.

make codestyle

# or use synonym
make formatting

Codestyle checks only, without rewriting files:

make check-codestyle

Note: check-codestyle uses the isort, black and darglint libraries.

Update all dev libraries to the latest version using one command:

make update-dev-deps

4. Code security

make check-safety

This command launches Poetry integrity checks as well as identifies security issues with Safety and Bandit.

5. Type checks

Run mypy static type checker

make mypy

6. Tests with coverage badges

Run pytest

make test

7. All linters

There is also a command to run all linters at once:

make lint

This is the same as:

make test && make check-codestyle && make mypy && make check-safety

8. Docker

make docker-build

which is equivalent to:

make docker-build VERSION=latest

Remove docker image with

make docker-remove

More information about docker.

9. Cleanup

Delete pycache files

make pycache-remove

Remove package build

make build-remove

Delete .DS_Store files

make dsstore-remove

Remove .mypy_cache

make mypycache-remove

Or to remove all above run:

make cleanup

Poetry

Want to know more about Poetry? Check its documentation.

Details about Poetry

Poetry's commands are very intuitive and easy to learn, like:

  • poetry add numpy@latest
  • poetry run pytest
  • poetry publish --build

etc

Building and releasing

Building a new version of the application consists of the following steps:

  • Bump the version of your package with poetry version <version>. You can pass the new version explicitly, or a rule such as major, minor, or patch. For more details, refer to the Semantic Versions standard.
  • Make a commit to GitHub.
  • Create a GitHub release.
  • And... publish 🙂 poetry publish --build
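
To make the major, minor, and patch rules concrete, here is a minimal sketch of the Semantic Versioning arithmetic. It is not Poetry's implementation: the bump function is a hypothetical helper that handles only plain X.Y.Z versions, with no pre-release or build suffixes.

```python
def bump(version: str, rule: str) -> str:
    """Apply a major, minor, or patch bump to a plain X.Y.Z version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if rule == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if rule == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if rule == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix only
    raise ValueError(f"unknown bump rule: {rule!r}")


if __name__ == "__main__":
    for rule in ("patch", "minor", "major"):
        print(rule, "->", bump("0.1.0", rule))
```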

🎯 What's next

  • Add support for deterministic CFPI
  • Add support for VAE-CFPI

🛡 License

This project is licensed under the terms of the MIT license. See LICENSE for more details.

📃 Citation

@misc{li2022offline,
    title={Offline Reinforcement Learning with Closed-Form Policy Improvement Operators},
    author={Jiachen Li and Edwin Zhang and Ming Yin and Qinxun Bai and Yu-Xiang Wang and William Yang Wang},
    journal={ICML},
    year={2023}
}

👏 Credits

This project would not be possible without the following wonderful prior work.

Optimistic Actor Critic gave inspiration to our method, D4RL provides the dataset and benchmark for evaluating the performance of our agent, and RLkit offered a strong RL framework for building our code from.

Template: python-package-template
