This project deals with the problem of Bayesian multitask inverse reinforcement learning (Bayesian MT-IRL) for Partially Observable Markov Decision Processes (POMDPs): consider an agent whose policy is parametrized by a vector w, whose prior distribution is a mixture of multivariate Gaussians, where the parameters of the Gaussian components
are sampled from a Normal-Inverse-Wishart hyperprior. The goal is to infer the posterior distribution of these parameters
from the trajectories observed in the MDP.
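As a point of reference, here is a minimal, illustrative sketch of how such a hierarchical prior generates the policy parameters w. The hyperparameter values and function names are assumptions made for illustration; this is not the code from the core folder.

    # Illustrative sketch of the generative model described above:
    # a Normal-Inverse-Wishart hyperprior generates the mean/covariance of each
    # Gaussian mixture component, and each task's policy parameter w is then
    # drawn from one of those components.
    import numpy as np
    from scipy.stats import invwishart

    def sample_mixture_components(n_components, dim, rng):
        # NIW hyperparameters (assumed values, for illustration only)
        mu0, kappa0, nu0, Psi0 = np.zeros(dim), 1.0, dim + 2, np.eye(dim)
        components = []
        for _ in range(n_components):
            Sigma = invwishart.rvs(df=nu0, scale=Psi0, random_state=rng)
            mu = rng.multivariate_normal(mu0, Sigma / kappa0)
            components.append((mu, Sigma))
        return components

    def sample_policy_parameter(components, rng):
        # Pick a mixture component (uniform weights here) and draw w from it.
        mu, Sigma = components[rng.integers(len(components))]
        return rng.multivariate_normal(mu, Sigma)

    rng = np.random.default_rng(0)
    components = sample_mixture_components(n_components=3, dim=2, rng=rng)
    w = sample_policy_parameter(components, rng)

Inference then works in the opposite direction: given trajectories generated by policies with unknown w's, recover the posterior over the w's and the mixture parameters.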
Relevant articles:
- Lazaric et al., Bayesian multitask reinforcement learning
- Dimitrakakis et al., Bayesian multitask inverse reinforcement learning
- Choi and Kim, Nonparametric Bayesian inverse reinforcement learning for multiple reward functions
This code requires the sacred library (https://github.com/IDSIA/sacred).
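It can typically be installed from PyPI (assuming the standard package name):

    pip install sacred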
There are 4 main folders: core, experiments, envs and test.
- core: contains the main algorithms used for the inference.
- test: unit tests for the functions in core.
- envs: definitions of the RL environments used for our experiments.
- experiments: contains one .py file per experiment, and a logs folder to store the results.
This project uses the sacred library, which allows the user to tune experiment parameters from the command line.
To run the tests, run the unittest module on the desired test script from the root folder. Example:
python -m unittest test.test_bp
To run an experiment, simply run the desired Python script from the root folder:
python -m experiments.exp_chain
You can tune some hyperparameters, such as the number of MDPs to infer or the length of the trajectories, by setting them in the CLI. For example, for a trajectory length of 10 and 20 MDPs, simply run
python -m experiments.exp_chain with T=10 M=20
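Under the hood, sacred picks up these with key=value overrides through the experiment's config function. Below is a minimal, hypothetical sketch of how such a script could be structured; the names and default values are illustrative, not the actual contents of experiments/exp_chain.py.

    # Hypothetical sketch of a sacred experiment script; the real
    # experiments/exp_chain.py may be organised differently.
    from sacred import Experiment

    ex = Experiment('chain')

    @ex.config
    def config():
        T = 50  # trajectory length, overridable with "with T=10"
        M = 5   # number of MDPs, overridable with "with M=20"

    @ex.automain
    def main(T, M):
        # run the inference with the chosen hyperparameters
        print(f'Running exp_chain with T={T} and M={M}')

Any variable defined in the @ex.config function becomes a parameter that can be overridden from the command line in this way.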