pilco

Learn to balance, baby.

Roadmap

Rename this repo. Candidates:

Talos
PRL
IRL (Inference for RL)

Priorities

Run on more environments: Cartpole, Mountaincar
Document tensor shapes on all methods

Clean up current code

Define documentation layout
Docstrings - Sphinx
Doctest?
Remove all hacky stuff, like hard-coded tensors
Clean up pendulum.py: learning dynamics, objective, optimisation, plotting.
Batching in calls of our agents, policies and costs. Start with policies.
Migrate to GPflow.
Feasible/initialisation space: this needs more specification; included here so we don't forget.
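As a starting point for the docstring, doctest, tensor-shape and batching items above, here is a minimal sketch of what a documented, batched policy call could look like. The function `linear_policy` is hypothetical and not part of this repo; it uses plain Python lists instead of tensors so that it runs with the standard library only. The docstring uses Sphinx field lists, states the shape of every argument, and carries doctest examples.

```python
import doctest


def linear_policy(states, weights, bias=0.0):
    """Evaluate a linear policy on a batch of states.

    Hypothetical example for illustration; not part of this repo's API.

    :param states: batch of states, shape ``(N, D)``, as a list of N lists.
    :param weights: policy weights, shape ``(D,)``.
    :param bias: scalar offset added to every action.
    :returns: actions, shape ``(N,)``.

    >>> linear_policy([[1.0, 2.0], [0.0, -1.0]], [0.5, 0.25])
    [1.0, -0.25]
    >>> linear_policy([], [0.5, 0.25])
    []
    """
    # One dot product per state in the batch.
    return [sum(w * s for w, s in zip(weights, state)) + bias for state in states]


if __name__ == "__main__":
    # Runs every example found in the docstrings of this module.
    doctest.testmod()
```

Running the file directly executes the doctests and prints nothing when they pass; Sphinx can pick up the same docstrings through its autodoc extension.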

Example notebooks

Pendulum notebook

Write derivations, including complexity

Moment matching

Profiling

We don't yet know how to profile this code; research how we should go about it.
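One possible starting point is the standard library's `cProfile` and `pstats` modules. The sketch below is an assumption about how we might wrap a call, not a decision: `slow_rollout` is a hypothetical stand-in for an expensive call such as a rollout, and `profile_call` is a helper name invented here.

```python
import cProfile
import io
import pstats


def slow_rollout(n_steps=200):
    """Hypothetical stand-in for an expensive call, e.g. a policy rollout."""
    total = 0.0
    for _ in range(n_steps):
        total += sum(i * i for i in range(1000))
    return total


def profile_call(func, *args, top=5, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    # Write the report into a string instead of stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the `top` entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


result, report = profile_call(slow_rollout, 50)
print(report)
```

The report lists the functions with the largest cumulative time, which should be enough to see where a training iteration spends its time before reaching for heavier tools.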

Write tests (especially Monte Carlo - maybe one test for all moment matching)

EQAgent
EQCost
Transforms
EQPolicy and TransformedPolicy
Util (Cholesky update)
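A Monte Carlo test for moment matching could follow the pattern sketched below: compute the closed-form moments, estimate the same moments from samples, and compare within a tolerance. The example uses the sine of a Gaussian input, for which the mean and variance are known exactly. All function names are hypothetical, and the sketch uses only the standard library; the real tests would call our own agents, costs and transforms instead.

```python
import math
import random


def sin_moments(mu, var):
    """Exact mean and variance of sin(x) for x ~ N(mu, var)."""
    mean = math.exp(-var / 2.0) * math.sin(mu)
    # E[sin^2 x] = (1 - E[cos 2x]) / 2, with E[cos 2x] = exp(-2 var) cos(2 mu).
    second_moment = 0.5 * (1.0 - math.exp(-2.0 * var) * math.cos(2.0 * mu))
    return mean, second_moment - mean ** 2


def monte_carlo_moments(func, mu, var, n_samples=200_000, seed=0):
    """Sample-based mean and variance of func(x) for x ~ N(mu, var)."""
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    std = math.sqrt(var)
    samples = [func(rng.gauss(mu, std)) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    variance = sum((s - mean) ** 2 for s in samples) / (n_samples - 1)
    return mean, variance


def test_sin_moment_matching(mu=0.3, var=0.5, tol=1e-2):
    """Closed-form moments should agree with the Monte Carlo estimate."""
    exact_mean, exact_var = sin_moments(mu, var)
    mc_mean, mc_var = monte_carlo_moments(math.sin, mu, var)
    assert abs(exact_mean - mc_mean) < tol
    assert abs(exact_var - mc_var) < tol


test_sin_moment_matching()
```

The tolerance is deliberately loose relative to the Monte Carlo standard error at this sample size, so the test should not be flaky. A single parametrised helper of this shape might cover all moment-matching components.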

Run on other environments

DeepMind Control Suite
OpenAI Gym
Cartpole
Mountaincar
Double pendulum
Lunar lander

Control something real

Cartpole swing
LEGO Mindstorms
Ask robotics faculty

Future algorithms

Non-greedy exploration (see the DL-algo)
Efficient learning of dynamics with an information-based criterion (IRL)
Posterior Sampling for RL
Deep PILCO
Embed to control
PlaNet