Add user guide in documents.
zjowowen committed Sep 27, 2024
1 parent 50a5948 commit 5c381aa
Showing 7 changed files with 412 additions and 1 deletion.
66 changes: 66 additions & 0 deletions docs/source/concepts/index.rst
@@ -0,0 +1,66 @@
Concepts
=========================================================

Frameworks consist of code and APIs designed for data transformation, model training, and deployment.
GenerativeRL is a framework that provides user-friendly APIs for training and deploying generative models and reinforcement learning (RL) agents.
In this section, we will explore the core concepts of GenerativeRL, including generative models, reinforcement learning, and their integration.
We will discuss the key design principles that underpin the GenerativeRL library and how they can be leveraged to address complex problems in the field of reinforcement learning.
Additionally, we will explain why these concepts are important and what makes GenerativeRL unique and adaptable across a wide range of applications.

Concepts Overview
-----------------

Generative Models
~~~~~~~~~~~~~~~~~

Generative models are a class of machine learning models used to generate new data samples from a given distribution, typically learned from a training dataset.
Most generative models are trained using unsupervised learning techniques and can be applied to tasks such as image, video, or audio generation, data augmentation, and interpolation.
GenerativeRL focuses on models that use continuous-time dynamics to model data distributions, such as diffusion models and flow models.
These models have a high capacity to capture complex data distributions and have demonstrated promising results in a variety of applications.
They are typically trained using maximum likelihood estimation or its variants, such as score matching, and can generate high-quality samples by solving an ordinary differential equation (ODE) or a stochastic differential equation (SDE).

.. math::

    dX_t = f(X_t, t) dt + \sigma(X_t, t) dW_t

GenerativeRL provides unified APIs for training and deploying generative models based on continuous-time dynamics.
However, different generative models vary in their definitions of the drift function :math:`f` and the diffusion function :math:`\sigma`.
Some of these can be unified under common APIs, while others may require specific implementations.
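As a concrete illustration of how samples can be drawn from such a process, here is a minimal Euler-Maruyama sketch; the drift and diffusion callables are placeholders for whatever parameterization a particular model uses, not part of the GenerativeRL API.

.. code-block:: python

    import torch

    def euler_maruyama(f, sigma, x0, t0=0.0, t1=1.0, num_steps=1000):
        # Simulate dX_t = f(X_t, t) dt + sigma(X_t, t) dW_t with a fixed step size.
        x, dt = x0, (t1 - t0) / num_steps
        for i in range(num_steps):
            t = t0 + i * dt
            x = x + f(x, t) * dt + sigma(x, t) * (dt ** 0.5) * torch.randn_like(x)
        return x

    # Placeholder drift and diffusion, here a simple Ornstein-Uhlenbeck process.
    samples = euler_maruyama(
        f=lambda x, t: -x,
        sigma=lambda x, t: torch.ones_like(x),
        x0=torch.randn(64, 2),
    )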
There are four key differences between generative models implemented across different open-source libraries:

- **Model Definitions**: The neural network used to parameterize certain parts of the model, such as the drift function, score function, data denoiser, or potential function.
- **Path Definitions**: The definition of the stochastic process path, which determines whether the model is a diffusion model, a flow model, or a specific type of diffusion or flow model.
- **Training Procedure**: The fundamental training objective used to optimize the model parameters to maximize the likelihood of the training data. This can include pretraining methods like score matching, flow matching, or bridge matching, and fine-tuning techniques such as advantage-weighted regression, policy gradients, or adjoint matching.
- **Sampling Procedure**: The method used to generate new data samples from the model, which can involve forward or reverse sampling depending on the path and the numerical method used (e.g., Euler-Maruyama or Runge-Kutta).

GenerativeRL offers maximum flexibility, allowing users to customize and extend generative models to suit their specific needs across these four dimensions.
For instance, users can easily define their own neural network architectures, paths, training procedures, and sampling methods to create new generative models tailored to specific applications and data formats.
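For example, a plain PyTorch module such as the following sketch (an illustration only, not an API provided by GenerativeRL) could serve as the neural network that parameterizes a drift, score, or denoising function:

.. code-block:: python

    import torch
    import torch.nn as nn

    class TimeConditionedMLP(nn.Module):
        # Maps a state x and a time t to a vector with the same shape as x,
        # e.g. a drift estimate, a score estimate, or a denoised sample.
        def __init__(self, dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim + 1, hidden), nn.SiLU(),
                nn.Linear(hidden, hidden), nn.SiLU(),
                nn.Linear(hidden, dim),
            )

        def forward(self, x, t):
            # t may be a scalar or a per-sample tensor; reshape it to one column.
            t = torch.as_tensor(t, dtype=x.dtype, device=x.device)
            t = t.expand(x.shape[0]).reshape(-1, 1)
            return self.net(torch.cat([x, t], dim=-1))

    model = TimeConditionedMLP(dim=2)
    out = model(torch.randn(64, 2), torch.tensor(0.5))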

Reinforcement Learning
~~~~~~~~~~~~~~~~~~~~~~~

Reinforcement learning (RL) is a class of machine learning algorithms that learn to make decisions by interacting with an environment and receiving rewards or penalties based on their actions.
RL agents learn to maximize a cumulative reward signal by exploring the environment, taking actions, and updating their policies or value functions based on the observed rewards.
RL algorithms can be categorized into model-free and model-based methods, depending on whether they learn a model of the environment dynamics or directly optimize a policy or value function.
RL algorithms can also be categorized into online and offline methods, depending on whether they learn from interactions with the environment or from a fixed dataset.
Online RL algorithms can also be classified based on their exploration strategies, such as on-policy or off-policy methods, and their optimization objectives, such as policy gradients, value functions, or actor-critic methods.

Integrating generative models with RL is a promising research direction that leverages generative models to improve sample efficiency, generalization, and exploration in RL.
For example, generative models can be used to learn a model of the environment dynamics, generate synthetic data for offline RL, or parameterize a learned policy or value function.

GenerativeRL provides a decoupled architecture that allows users to easily integrate generative models with RL algorithms.
Different generative models can be trained independently of the RL algorithms through unified APIs, requiring only minor modifications to the configurations.
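In a rough sketch (the commented-out configuration tweak below is a placeholder; the exact fields depend on the chosen pipeline), training and deployment look the same regardless of which generative model backs the policy:

.. code-block:: python

    from grl.algorithms.qgpo import QGPOAlgorithm
    from grl_pipelines.diffusion_model.configurations.d4rl_halfcheetah_qgpo import config

    # Hypothetical tweak: adjust generative-model-related settings in the
    # configuration before training; the exact field names depend on the pipeline.
    # config.train.model.generative_model.type = "..."

    algorithm = QGPOAlgorithm(config)
    algorithm.train()
    agent = algorithm.deploy()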

Design Principles
-----------------

GenerativeRL is designed with the following principles, ranked from most important to least important:

- **Automatic Differentiation**: GenerativeRL leverages automatic differentiation libraries, such as PyTorch, ``torchdiffeq``, and ``torchdyn``, to efficiently and accurately compute gradients (see the sketch after this list).
- **Unification**: GenerativeRL unifies the training and deployment of various generative models and reinforcement learning agents within a single framework.
- **Simplicity**: GenerativeRL provides a simple and intuitive interface for training and deploying generative models and RL agents.
- **Flexibility**: GenerativeRL is designed to be flexible and extensible, enabling users to easily customize and extend the library to suit their specific needs for different applications and data formats, such as tensors or dictionaries.
- **Modularity**: GenerativeRL is built on a modular architecture that allows users to mix and match different components, such as generative models, RL algorithms, and neural network architectures.
- **Reproducibility**: GenerativeRL ensures reproducible training and evaluation procedures through configurations, random seed initialization, logging, and checkpointing, making it possible to reproduce results across different runs and environments.
- **Minimal Dependencies**: GenerativeRL seeks to minimize external dependencies, providing a lightweight library that can be easily installed and used on various platforms and environments.
- **Compatibility with Existing RL Frameworks**: GenerativeRL is designed to work seamlessly with existing RL frameworks like OpenAI Gym and TorchRL, leveraging their functionality and environments.
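As a minimal illustration of the first principle, the sketch below uses ``torchdiffeq`` to differentiate through an ODE solve; the toy network and objective are chosen only for demonstration:

.. code-block:: python

    import torch
    import torch.nn as nn
    from torchdiffeq import odeint

    class ODEFunc(nn.Module):
        # Right-hand side of dx/dt = g(x, t), parameterized by a small network.
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

        def forward(self, t, x):
            return self.net(x)

    func = ODEFunc(dim=2)
    x0 = torch.randn(16, 2)
    t = torch.linspace(0.0, 1.0, 10)
    xt = odeint(func, x0, t)      # trajectory of shape (10, 16, 2)
    loss = xt[-1].pow(2).mean()   # toy objective on the terminal state
    loss.backward()               # gradients flow back through the solver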
9 changes: 8 additions & 1 deletion docs/source/index.rst
@@ -16,8 +16,15 @@ This library aims to provide a framework for combining the power of generative m

.. toctree::
:maxdepth: 2
:caption: Best Practice
:caption: Concepts

concepts/index

.. toctree::
:maxdepth: 2
:caption: User Guide

user_guide/index

.. toctree::
:maxdepth: 2
19 changes: 19 additions & 0 deletions docs/source/user_guide/evaluating_agents.rst
@@ -0,0 +1,19 @@
How to evaluate the performance of RL agents
-------------------------------------------------

In GenerativeRL, the performance of reinforcement learning (RL) agents is evaluated using simulators or environments.

Agents are implemented as classes under the ``grl.agents`` module, each exposing a unified ``act`` method that takes an observation as input and returns an action.

Users can evaluate the performance of an agent by running it in a simulator or environment and collecting the rewards.

.. code-block:: python

    import gym

    # ``algorithm`` and ``config`` are assumed to come from a completed training
    # run, as described in the training guide.
    agent = algorithm.deploy()
    env = gym.make(config.deploy.env.env_id)
    observation = env.reset()
    for _ in range(config.deploy.num_deploy_steps):
        env.render()
        observation, reward, done, _ = env.step(agent.act(observation))
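The loop above discards the reward signal. To actually collect it, a minimal variant (reusing the same ``agent``, ``env``, and ``config`` objects, and assuming the episode terminates within the configured number of steps) could accumulate the per-step rewards into an episode return:

.. code-block:: python

    total_reward = 0.0
    observation = env.reset()
    for _ in range(config.deploy.num_deploy_steps):
        observation, reward, done, _ = env.step(agent.act(observation))
        total_reward += reward
        if done:
            break
    print(f"Episode return: {total_reward}")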
16 changes: 16 additions & 0 deletions docs/source/user_guide/index.rst
@@ -0,0 +1,16 @@
User Guide
================

Here is a list of user guide sections:

.. toctree::
:maxdepth: 2
:caption: User Guide

installation
training_agents
training_generative_models
evaluating_agents

For more detailed information and advanced usage examples, please refer to the API documentation and other sections of the GenerativeRL documentation.

57 changes: 57 additions & 0 deletions docs/source/user_guide/installation.rst
@@ -0,0 +1,57 @@
How to install GenerativeRL and its dependencies
-------------------------------------------------

GenerativeRL is a Python library that requires the following dependencies to be installed:

- Python 3.9 or higher
- PyTorch 2.0.0 or higher

Install GenerativeRL using the following command:

.. code-block:: bash

    git clone https://github.com/opendilab/GenerativeRL.git
    cd GenerativeRL
    pip install -e .

To solve reinforcement learning problems, you will also need to install additional environments and dependencies, such as Gym, PyBullet, MuJoCo, and the DeepMind Control Suite.
You can install these dependencies after installing GenerativeRL, for example:

.. code-block:: bash

    pip install gym
    pip install pybullet
    pip install mujoco-py
    pip install dm_control

Note that some of these dependencies require additional setup or licensing. For example, D4RL requires a specific version of Gym to be installed:

.. code-block:: bash

    pip install 'gym==0.23.1'

Some environments also require extra system-level setup. For example, MuJoCo can be set up with the following steps:

.. code-block:: bash

    sudo apt-get install libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev -y
    sudo apt-get install swig gcc g++ make locales dnsutils cmake -y
    sudo apt-get install build-essential libgl1-mesa-dev libgl1-mesa-glx libglew-dev -y
    sudo apt-get install libosmesa6-dev libglfw3 libglfw3-dev libsdl2-dev libsdl2-image-dev -y
    sudo apt-get install libglm-dev libfreetype6-dev patchelf ffmpeg -y
    mkdir -p /root/.mujoco
    wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz
    tar -xf mujoco.tar.gz -C /root/.mujoco
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/.mujoco/mjpro210/bin:/root/.mujoco/mujoco210/bin
    git clone https://github.com/Farama-Foundation/D4RL.git
    cd D4RL
    pip install -e .
    pip install lockfile
    pip install "Cython<3.0"

Check whether the installation is successful by running the following command:

.. code-block:: bash

    python -c "import grl"
52 changes: 52 additions & 0 deletions docs/source/user_guide/training_agents.rst
@@ -0,0 +1,52 @@
How to train and deploy reinforcement learning agents
------------------------------------------------------

In GenerativeRL, the RL algorithms are implemented as classes under the ``grl.algorithms`` module, while the agents are implemented as classes under the ``grl.agents`` module.

Every algorithm class has a ``train`` method that takes the environment, dataset, and other hyperparameters as input and returns the trained model.
Every algorithm class also has a ``deploy`` method that copies the trained model and returns the corresponding agent.

For training a specific RL algorithm, you need to follow these steps:

1. Create an instance of the RL algorithm class.

   .. code-block:: python

       from grl.algorithms.qgpo import QGPOAlgorithm

2. Define the hyperparameters for the algorithm in a configurations dictionary. You can use the default configurations provided under the ``grl_pipelines`` module.

   .. code-block:: python

       from grl_pipelines.diffusion_model.configurations.d4rl_halfcheetah_qgpo import config

3. Create an instance of the algorithm class with the configurations dictionary.

   .. code-block:: python

       algorithm = QGPOAlgorithm(config)

4. Train the algorithm using the ``train`` method.

   .. code-block:: python

       trained_model = algorithm.train()

5. Deploy the trained model using the ``deploy`` method.

   .. code-block:: python

       agent = algorithm.deploy()

6. Use the trained agent to interact with the environment and evaluate its performance.

   .. code-block:: python

       import gym

       env = gym.make(config.deploy.env.env_id)
       observation = env.reset()
       for _ in range(config.deploy.num_deploy_steps):
           env.render()
           observation, reward, done, _ = env.step(agent.act(observation))

For more information on how to train and deploy reinforcement learning agents, please refer to the API documentation and other sections of the GenerativeRL documentation.