numpy, matplotlib, gensim, pygsp
- GraphRL/
The code in this folder was built on top of this LSPI Python package.
Least Squares Policy Iteration (LSPI) implementation.
Implements the algorithms described in the paper
The implementation is based on the Matlab implementation provided by the authors.
This file implements the abstract class BasisFunction. A basis function is a function that takes in a state vector and an action index and returns a vector of features.
A few specific types of basis functions are further implemented in this file:
- Fake Basis Function: it simply ignores all inputs and returns a constant basis vector. Can be useful for random sampling.
- One-Dimensional Polynomial Basis Function: simple polynomial features for a state with one dimension.
- Radial Basis Function: the Gaussian multidimensional radial basis function (RBF).
- Proto-Value Basis Function: the PVFs as described in Mahadevan and Maggioni's work.
- Node2Vec: automatically learnt basis function using the node2vec algorithm.
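A minimal sketch of what such an interface might look like, with a Gaussian RBF example (the names below are illustrative, not the package's actual API):

```python
# Illustrative sketch of a basis-function interface (hypothetical names).
import abc
import numpy as np

class BasisFunction(abc.ABC):
    """Maps a (state, action) pair to a feature vector phi(s, a)."""

    @abc.abstractmethod
    def size(self):
        """Length of the feature vector."""

    @abc.abstractmethod
    def evaluate(self, state, action):
        """Return phi(state, action) as a 1-D numpy array."""

class RadialBasisFunction(BasisFunction):
    """Gaussian RBF features, replicated per action (one active block)."""

    def __init__(self, means, gamma, num_actions):
        self.means = np.asarray(means)      # shape (k, state_dim)
        self.gamma = gamma                  # width parameter
        self.num_actions = num_actions

    def size(self):
        return (len(self.means) + 1) * self.num_actions  # +1 bias per action

    def evaluate(self, state, action):
        phi = np.zeros(self.size())
        offset = (len(self.means) + 1) * action
        phi[offset] = 1.0                   # bias term for this action block
        diffs = self.means - np.asarray(state)
        phi[offset + 1:offset + 1 + len(self.means)] = np.exp(
            -self.gamma * np.sum(diffs ** 2, axis=1))
        return phi
```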
Contains example domains that LSPI works on. In particular, it implements the grid maze domain, in which the state space is a set of nodes on an N1 by N2 grid. Most nodes are always accessible (= rooms with transition probability 1), some nodes may be inaccessible (= walls with transition probability 0), and some nodes may be difficult to access (= obstacles with transition probability p, where 0 < p < 1). There is one absorbing goal state that gives a reward of 100; all other states are non-absorbing and give no reward.
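As a rough illustration of this specification (hypothetical names; the domain class in this repository defines its own representation):

```python
# Sketch of a grid maze given as per-node transition probabilities.
import numpy as np

N1, N2 = 5, 5
maze = np.ones((N1, N2))   # 1.0 = room, always accessible
maze[2, 1:4] = 0.0         # 0.0 = wall, never accessible
maze[4, 2] = 0.3           # 0 < p < 1 = obstacle, accessible with probability p
goal = (0, 4)              # absorbing goal state

def reward(state):
    """Reward of 100 on reaching the goal, 0 everywhere else."""
    return 100.0 if tuple(state) == goal else 0.0

def transition_allowed(target):
    """A move into `target` succeeds with that node's transition probability."""
    return np.random.rand() < maze[target]
```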
Contains the main interface to the LSPI algorithm.
Implements the node2vec algorithm to learn node embeddings.
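A rough sketch of the node2vec idea (random walks over the graph fed to a skip-gram model). The snippet assumes networkx for the graph and uses plain uniform walks rather than node2vec's biased walks; names are illustrative only:

```python
# Uniform random walks + skip-gram embeddings as a stand-in for node2vec.
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(graph, walks_per_node=10, walk_length=20):
    walks = []
    for _ in range(walks_per_node):
        for start in graph.nodes():
            walk = [start]
            while len(walk) < walk_length:
                neighbours = list(graph.neighbors(walk[-1]))
                if not neighbours:
                    break
                walk.append(random.choice(neighbours))
            walks.append([str(n) for n in walk])
    return walks

graph = nx.grid_2d_graph(5, 5)      # maze graph without walls
walks = random_walks(graph)
# Note: in gensim < 4 the embedding size argument is called `size`, not `vector_size`.
model = Word2Vec(walks, vector_size=16, window=5, min_count=0, sg=1)
embedding = model.wv[str((0, 0))]   # feature vector for node (0, 0)
```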
Contains the LSPI policy class used for learning, executing and sampling policies.
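The core of such a policy is a Q-function that is linear in the basis features, Q(s, a) = w . phi(s, a), with (epsilon-)greedy action selection. A minimal sketch, with hypothetical helper names:

```python
# Epsilon-greedy action selection over a linear Q-function (illustrative).
import numpy as np

def select_action(weights, basis, state, num_actions, epsilon=0.0):
    if np.random.rand() < epsilon:                      # explore
        return np.random.randint(num_actions)
    q_values = [weights.dot(basis.evaluate(state, a))   # exploit: greedy on Q
                for a in range(num_actions)]
    return int(np.argmax(q_values))
```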
Contains the Sample class that represents an LSPI sample tuple (s, a, r, s', absorb): (state, action, observed reward, future state, absorbing-state flag).
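One simple way such a tuple could be represented (illustrative only, not the package's actual class):

```python
# A sample tuple as an immutable record.
from collections import namedtuple

Sample = namedtuple('Sample', ['state', 'action', 'reward', 'next_state', 'absorb'])

s = Sample(state=(1, 1), action=0, reward=0.0, next_state=(1, 2), absorb=False)
```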
Implementation of the LSTDQ solver using a standard matrix solver (the algorithm from Figure 5 of the LSPI paper).
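A minimal sketch of that computation, assuming the hypothetical Sample and basis-function interfaces sketched above: A = sum phi(s,a) (phi(s,a) - gamma phi(s', pi(s')))^T, b = sum phi(s,a) r, and the new weights solve A w = b.

```python
# LSTDQ with a dense matrix solve (illustrative names, not the package API).
import numpy as np

def lstdq(samples, basis, policy_weights, num_actions, gamma=0.95):
    k = basis.size()
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s in samples:
        phi = basis.evaluate(s.state, s.action)
        if s.absorb:
            phi_next = np.zeros(k)          # absorbing states carry no future value
        else:
            # greedy action of the current policy in the next state
            q = [policy_weights.dot(basis.evaluate(s.next_state, a))
                 for a in range(num_actions)]
            phi_next = basis.evaluate(s.next_state, int(np.argmax(q)))
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * s.reward
    return np.linalg.solve(A, b)            # weights of the improved Q-function
```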
This class implements maze environments such as the one depicted below.
In such an environment, the green states are accessible rooms, the dark purple states are strict walls, and the yellow state is the goal state. An agent can initially be placed in any accessible state and aims to reach the goal state.
The class implements methods for learning the PVF basis functions as well as polynomial and node2vec basis functions.
This code implements functions for approximating the true value function from the basis functions using least-squares optimisation (that is, by minimising the mean squared error). Note that this is only possible in small environments, since the true value function is computed with the value iteration algorithm.
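A sketch of that least-squares projection, assuming the true values come from value iteration and the features are stacked per state (shapes and names are illustrative):

```python
# Project the true value function onto the span of the basis features.
import numpy as np

def fit_value_function(true_values, features):
    """true_values: (n_states,) vector from value iteration.
    features: (n_states, k) matrix whose rows are basis features per state."""
    weights, *_ = np.linalg.lstsq(features, true_values, rcond=None)
    approx = features @ weights
    mse = np.mean((approx - true_values) ** 2)
    return approx, mse
```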
For the environment described in the previous subsection, the goal is to approximate the following value functions as closely as possible.
Code for running simulations. It can be called directly from the terminal as follows
>> python PVF_simulations.py
Running this code will run the LSPI algorithm on simple maze environments (no walls, no obstacles) of different sizes and with different LSPI hyperparameters. All the parameters are defined as constants at the beginning of the file and can be changed accordingly. It will then save plots (averages over 20 runs of the number of steps to reach the goal and of the cumulative reward) in a plots/ folder (make sure such a folder exists in the GraphRL/ folder).