Add user guide in documents.
zjowowen committed Sep 27, 2024
1 parent 50a5948 commit 5c381aa
Showing 7 changed files with 412 additions and 1 deletion.
66 changes: 66 additions & 0 deletions docs/source/concepts/index.rst
@@ -0,0 +1,66 @@
Concepts
=========================================================

Frameworks consist of code and APIs designed for data transformation, model training, and deployment.
GenerativeRL is a framework that provides user-friendly APIs for training and deploying generative models and reinforcement learning (RL) agents.
In this section, we will explore the core concepts of GenerativeRL, including generative models, reinforcement learning, and their integration.
We will discuss the key design principles that underpin the GenerativeRL library and how they can be leveraged to address complex problems in the field of reinforcement learning.
Additionally, we will explain why these concepts are important and what makes GenerativeRL unique and adaptable across a wide range of applications.

Concepts Overview
-----------------

Generative Models
~~~~~~~~~~~~~~~~~

Generative models are a class of machine learning models used to generate new data samples from a given distribution, typically learned from a training dataset.
Most generative models are trained using unsupervised learning techniques and can be applied to tasks such as image, video, or audio generation, data augmentation, and interpolation.
GenerativeRL focuses on models that use continuous-time dynamics to model data distributions, such as diffusion models and flow models.
These models have a high capacity to capture complex data distributions and have demonstrated promising results in a variety of applications.
They are typically trained using maximum likelihood estimation or its variants, such as score matching, and can generate high-quality samples by solving an ordinary differential equation (ODE) or a stochastic differential equation (SDE).

.. math::

    dX_t = f(X_t, t) dt + \sigma(X_t, t) dW_t

GenerativeRL provides unified APIs for training and deploying generative models based on continuous-time dynamics.
However, different generative models vary in their definitions of the drift function :math:`f` and the diffusion function :math:`\sigma`.
Some of these can be unified under common APIs, while others may require specific implementations.
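As a concrete illustration of how samples can be drawn from such a process, here is a minimal Euler-Maruyama sketch; the drift and diffusion callables are placeholders for whatever parameterization a particular model uses, not part of the GenerativeRL API.

.. code-block:: python

    import torch

    def euler_maruyama(f, sigma, x0, t0=0.0, t1=1.0, num_steps=1000):
        # Simulate dX_t = f(X_t, t) dt + sigma(X_t, t) dW_t with a fixed step size.
        x, dt = x0, (t1 - t0) / num_steps
        for i in range(num_steps):
            t = t0 + i * dt
            x = x + f(x, t) * dt + sigma(x, t) * (dt ** 0.5) * torch.randn_like(x)
        return x

    # Placeholder drift and diffusion, here a simple Ornstein-Uhlenbeck process.
    samples = euler_maruyama(
        f=lambda x, t: -x,
        sigma=lambda x, t: torch.ones_like(x),
        x0=torch.randn(64, 2),
    )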
There are four key differences between generative models implemented across different open-source libraries:

- **Model Definitions**: The neural network used to parameterize certain parts of the model, such as the drift function, score function, data denoiser, or potential function.
- **Path Definitions**: The definition of the stochastic process path, which determines whether the model is a diffusion model, a flow model, or a specific type of diffusion or flow model.
- **Training Procedure**: The fundamental training objective used to optimize the model parameters to maximize the likelihood of the training data. This can include pretraining methods like score matching, flow matching, or bridge matching, and fine-tuning techniques such as advantage-weighted regression, policy gradients, or adjoint matching.
- **Sampling Procedure**: The method used to generate new data samples from the model, which can involve forward or reverse sampling depending on the path and the numerical method used (e.g., Euler-Maruyama or Runge-Kutta).

GenerativeRL offers maximum flexibility, allowing users to customize and extend generative models to suit their specific needs across these four dimensions.
For instance, users can easily define their own neural network architectures, paths, training procedures, and sampling methods to create new generative models tailored to specific applications and data formats.
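For example, a plain PyTorch module such as the following sketch (an illustration only, not an API provided by GenerativeRL) could serve as the neural network that parameterizes a drift, score, or denoising function:

.. code-block:: python

    import torch
    import torch.nn as nn

    class TimeConditionedMLP(nn.Module):
        # Maps a state x and a time t to a vector with the same shape as x,
        # e.g. a drift estimate, a score estimate, or a denoised sample.
        def __init__(self, dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim + 1, hidden), nn.SiLU(),
                nn.Linear(hidden, hidden), nn.SiLU(),
                nn.Linear(hidden, dim),
            )

        def forward(self, x, t):
            # t may be a scalar or a per-sample tensor; reshape it to one column.
            t = torch.as_tensor(t, dtype=x.dtype, device=x.device)
            t = t.expand(x.shape[0]).reshape(-1, 1)
            return self.net(torch.cat([x, t], dim=-1))

    model = TimeConditionedMLP(dim=2)
    out = model(torch.randn(64, 2), torch.tensor(0.5))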

Reinforcement Learning
~~~~~~~~~~~~~~~~~~~~~~~

Reinforcement learning (RL) is a class of machine learning algorithms that learn to make decisions by interacting with an environment and receiving rewards or penalties based on their actions.
RL agents learn to maximize a cumulative reward signal by exploring the environment, taking actions, and updating their policies or value functions based on the observed rewards.
RL algorithms can be categorized into model-free and model-based methods, depending on whether they learn a model of the environment dynamics or directly optimize a policy or value function.
RL algorithms can also be categorized into online and offline methods, depending on whether they learn from interactions with the environment or from a fixed dataset.
Online RL algorithms can also be classified based on their exploration strategies, such as on-policy or off-policy methods, and their optimization objectives, such as policy gradients, value functions, or actor-critic methods.

Integrating generative models with RL is a promising research direction that leverages generative models to improve sample efficiency, generalization, and exploration in RL.
For example, generative models can be used to learn a model of the environment dynamics, generate synthetic data for offline RL, or parameterize a learned policy or value function.

GenerativeRL provides a decoupled architecture that allows users to easily integrate generative models with RL algorithms.
Different generative models can be trained independently of the RL algorithms through unified APIs, requiring only minor modifications to the configurations.
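In a rough sketch (the commented-out configuration tweak below is a placeholder; the exact fields depend on the chosen pipeline), training and deployment look the same regardless of which generative model backs the policy:

.. code-block:: python

    from grl.algorithms.qgpo import QGPOAlgorithm
    from grl_pipelines.diffusion_model.configurations.d4rl_halfcheetah_qgpo import config

    # Hypothetical tweak: adjust generative-model-related settings in the
    # configuration before training; the exact field names depend on the pipeline.
    # config.train.model.generative_model.type = "..."

    algorithm = QGPOAlgorithm(config)
    algorithm.train()
    agent = algorithm.deploy()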

Design Principles
-----------------

GenerativeRL is designed with the following principles, ranked from most important to least important:

- **Automatic Differentiation**: GenerativeRL leverages automatic differentiation libraries, such as PyTorch, ``torchdiffeq``, and ``torchdyn``, to efficiently and accurately compute gradients (see the sketch after this list).
- **Unification**: GenerativeRL unifies the training and deployment of various generative models and reinforcement learning agents within a single framework.
- **Simplicity**: GenerativeRL provides a simple and intuitive interface for training and deploying generative models and RL agents.
- **Flexibility**: GenerativeRL is designed to be flexible and extensible, enabling users to easily customize and extend the library to suit their specific needs for different applications and data formats, such as tensors or dictionaries.
- **Modularity**: GenerativeRL is built on a modular architecture that allows users to mix and match different components, such as generative models, RL algorithms, and neural network architectures.
- **Reproducibility**: GenerativeRL ensures reproducible training and evaluation procedures through configurations, random seed initialization, logging, and checkpointing, making it possible to reproduce results across different runs and environments.
- **Minimal Dependencies**: GenerativeRL seeks to minimize external dependencies, providing a lightweight library that can be easily installed and used on various platforms and environments.
- **Compatibility with Existing RL Frameworks**: GenerativeRL is designed to work seamlessly with existing RL frameworks like OpenAI Gym and TorchRL, leveraging their functionality and environments.
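As a minimal illustration of the first principle, the sketch below uses ``torchdiffeq`` to differentiate through an ODE solve; the toy network and objective are chosen only for demonstration:

.. code-block:: python

    import torch
    import torch.nn as nn
    from torchdiffeq import odeint

    class ODEFunc(nn.Module):
        # Right-hand side of dx/dt = g(x, t), parameterized by a small network.
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

        def forward(self, t, x):
            return self.net(x)

    func = ODEFunc(dim=2)
    x0 = torch.randn(16, 2)
    t = torch.linspace(0.0, 1.0, 10)
    xt = odeint(func, x0, t)      # trajectory of shape (10, 16, 2)
    loss = xt[-1].pow(2).mean()   # toy objective on the terminal state
    loss.backward()               # gradients flow back through the solver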
9 changes: 8 additions & 1 deletion docs/source/index.rst
@@ -16,8 +16,15 @@ This library aims to provide a framework for combining the power of generative m

.. toctree::
:maxdepth: 2
:caption: Best Practice
:caption: Concepts

concepts/index

.. toctree::
:maxdepth: 2
:caption: User Guide

user_guide/index

.. toctree::
:maxdepth: 2
19 changes: 19 additions & 0 deletions docs/source/user_guide/evaluating_agents.rst
@@ -0,0 +1,19 @@
How to evaluate the performance of RL agents
-------------------------------------------------

In GenerativeRL, the performance of reinforcement learning (RL) agents is evaluated using simulators or environments.

Agents are implemented as classes under the ``grl.agents`` module, each exposing a unified ``act`` method that takes an observation as input and returns an action.

Users can evaluate the performance of an agent by running it in a simulator or environment and collecting the rewards.

.. code-block:: python

    import gym

    # ``algorithm`` and ``config`` are assumed to come from a completed training
    # run, as described in the training guide.
    agent = algorithm.deploy()
    env = gym.make(config.deploy.env.env_id)
    observation = env.reset()
    for _ in range(config.deploy.num_deploy_steps):
        env.render()
        observation, reward, done, _ = env.step(agent.act(observation))
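The loop above discards the reward signal. To actually collect it, a minimal variant (reusing the same ``agent``, ``env``, and ``config`` objects, and assuming the episode terminates within the configured number of steps) could accumulate the per-step rewards into an episode return:

.. code-block:: python

    total_reward = 0.0
    observation = env.reset()
    for _ in range(config.deploy.num_deploy_steps):
        observation, reward, done, _ = env.step(agent.act(observation))
        total_reward += reward
        if done:
            break
    print(f"Episode return: {total_reward}")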
16 changes: 16 additions & 0 deletions docs/source/user_guide/index.rst
@@ -0,0 +1,16 @@
User Guide
================

Here is a list of user guide sections:

.. toctree::
:maxdepth: 2
:caption: User Guide

installation
training_agents
training_generative_models
evaluating_agents

For more detailed information and advanced usage examples, please refer to the API documentation and other sections of the GenerativeRL documentation.

57 changes: 57 additions & 0 deletions docs/source/user_guide/installation.rst
@@ -0,0 +1,57 @@
How to install GenerativeRL and its dependencies
-------------------------------------------------

GenerativeRL is a Python library that requires the following dependencies to be installed:

- Python 3.9 or higher
- PyTorch 2.0.0 or higher

Install GenerativeRL using the following command:

.. code-block:: bash

    git clone https://github.com/opendilab/GenerativeRL.git
    cd GenerativeRL
    pip install -e .

To solve reinforcement learning problems, you will also need to install additional environments and dependencies, such as Gym, PyBullet, MuJoCo, and the DeepMind Control Suite.
You can install these dependencies after installing GenerativeRL, for example:

.. code-block:: bash

    pip install gym
    pip install pybullet
    pip install mujoco-py
    pip install dm_control

Note that some of these dependencies require additional setup or licensing. For example, D4RL requires a specific version of Gym to be installed:

.. code-block:: bash

    pip install 'gym==0.23.1'

Some environments also require extra system-level setup. For example, MuJoCo can be set up with the following steps:

.. code-block:: bash

    sudo apt-get install libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev -y
    sudo apt-get install swig gcc g++ make locales dnsutils cmake -y
    sudo apt-get install build-essential libgl1-mesa-dev libgl1-mesa-glx libglew-dev -y
    sudo apt-get install libosmesa6-dev libglfw3 libglfw3-dev libsdl2-dev libsdl2-image-dev -y
    sudo apt-get install libglm-dev libfreetype6-dev patchelf ffmpeg -y
    mkdir -p /root/.mujoco
    wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz
    tar -xf mujoco.tar.gz -C /root/.mujoco
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/.mujoco/mjpro210/bin:/root/.mujoco/mujoco210/bin
    git clone https://github.com/Farama-Foundation/D4RL.git
    cd D4RL
    pip install -e .
    pip install lockfile
    pip install "Cython<3.0"

Check whether the installation is successful by running the following command:

.. code-block:: bash

    python -c "import grl"
52 changes: 52 additions & 0 deletions docs/source/user_guide/training_agents.rst
@@ -0,0 +1,52 @@
How to train and deploy reinforcement learning agents
------------------------------------------------------

In GenerativeRL, the RL algorithms are implemented as classes under the ``grl.algorithms`` module, while the agents are implemented as classes under the ``grl.agents`` module.

Every algorithm class has a ``train`` method that takes the environment, dataset, and other hyperparameters as input and returns the trained model.
Every algorithm class also has a ``deploy`` method that copies the trained model and returns the corresponding agent.

For training a specific RL algorithm, you need to follow these steps:

1. Create an instance of the RL algorithm class.

   .. code-block:: python

       from grl.algorithms.qgpo import QGPOAlgorithm

2. Define the hyperparameters for the algorithm in a configurations dictionary. You can use the default configurations provided under the ``grl_pipelines`` module.

   .. code-block:: python

       from grl_pipelines.diffusion_model.configurations.d4rl_halfcheetah_qgpo import config

3. Create an instance of the algorithm class with the configurations dictionary.

   .. code-block:: python

       algorithm = QGPOAlgorithm(config)

4. Train the algorithm using the ``train`` method.

   .. code-block:: python

       trained_model = algorithm.train()

5. Deploy the trained model using the ``deploy`` method.

   .. code-block:: python

       agent = algorithm.deploy()

6. Use the trained agent to interact with the environment and evaluate its performance.

   .. code-block:: python

       import gym

       env = gym.make(config.deploy.env.env_id)
       observation = env.reset()
       for _ in range(config.deploy.num_deploy_steps):
           env.render()
           observation, reward, done, _ = env.step(agent.act(observation))

For more information on how to train and deploy reinforcement learning agents, please refer to the API documentation and other sections of the GenerativeRL documentation.