pilco

Learn to balance, baby.

Balance

Roadmap

Rename this repo. Candidates:

  • Talos
  • PRL
  • IRL (Inference for RL)

Priorities

  • Run on more environments: Cartpole, Mountaincar
  • Tensor shapes on all methods

Clean up current code

  • Define documentation layout
  • Docstrings - Sphinx
  • Doctest?
  • Remove all hacky stuff, like hard coded tensors
  • Clean up pendulum.py: learning-dynamics, objective, optimisation, plotting.
  • Batching in calls of our agents, policies and costs. Start with policies.
  • Migrate to GPflow.
  • Feasible/initialisation space: this needs more specification; listed here so we don't forget.
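
To make the docstring and doctest items above concrete, here is a minimal sketch of a Sphinx-style docstring whose examples double as doctests. The function `squared_distance` is hypothetical and not part of this repository; it only illustrates the layout (field list, shape notes, runnable examples).

```python
import doctest


def squared_distance(x, y):
    """Return the squared Euclidean distance between two points.

    Hypothetical helper, used only to illustrate the docstring layout.

    :param x: first point, a sequence of floats of length D
    :param y: second point, a sequence of floats of length D
    :returns: the squared distance as a float
    :raises ValueError: if the two points differ in length

    >>> squared_distance([0.0, 0.0], [3.0, 4.0])
    25.0
    >>> squared_distance([1.0], [1.0, 2.0])
    Traceback (most recent call last):
        ...
    ValueError: shape mismatch: 1 vs 2
    """
    if len(x) != len(y):
        raise ValueError(f"shape mismatch: {len(x)} vs {len(y)}")
    return sum((a - b) ** 2 for a, b in zip(x, y))


if __name__ == "__main__":
    # Runs every example embedded in the docstrings of this module.
    print(doctest.testmod())
```

Stating the expected shapes in the `:param:` fields also covers part of the "tensor shapes on all methods" priority, and the examples stay honest because they are executed.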

Example notebooks

  • Pendulum notebook

Write derivations, including complexity

  • Moment matching

Profiling

  • I have no clue how profiling works; let's research how we should go about it.
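
One possible starting point is the standard library's `cProfile` and `pstats`. The sketch below wraps a call in a profiler and returns a text report sorted by cumulative time. `slow_rollout` and `profile_call` are hypothetical names used for illustration, not code from this repository.

```python
import cProfile
import io
import pstats


def slow_rollout(steps=200):
    """Hypothetical stand-in for an expensive routine such as a rollout."""
    total = 0.0
    for _ in range(steps):
        total += sum(i * i for i in range(1000)) * 1e-9
    return total


def profile_call(func, *args, top=5, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        # Always stop profiling, even if func raises.
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the `top` entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_rollout)
    print(report)
```

The same information is available without code changes via `python -m cProfile -s cumulative script.py`. The wrapper is mainly useful for profiling a single call inside a notebook or test.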

Write tests (especially Monte Carlo - maybe one test for all moment matching)

  • EQAgent
  • EQCost
  • Transforms
  • EQPolicy and TransformedPolicy
  • Util (Cholesky update)
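
As a rough template for the Monte Carlo tests mentioned above, the sketch below checks a closed-form moment against a sampled estimate. It uses a scalar stand-in rather than the repository's classes: for x ~ N(mu, sigma^2), the exact mean of sin(x) is exp(-sigma^2 / 2) * sin(mu). All function names are hypothetical. A real test would call the moment-matching methods of the components listed above instead.

```python
import math
import random


def sin_mean_exact(mu, sigma):
    """Closed-form E[sin(x)] for x ~ N(mu, sigma^2)."""
    return math.exp(-0.5 * sigma ** 2) * math.sin(mu)


def sin_mean_monte_carlo(mu, sigma, n_samples, seed=0):
    """Sample estimate of E[sin(x)] together with its standard error."""
    rng = random.Random(seed)  # seeded so the test is reproducible
    values = [math.sin(rng.gauss(mu, sigma)) for _ in range(n_samples)]
    mean = sum(values) / n_samples
    var = sum((v - mean) ** 2 for v in values) / (n_samples - 1)
    return mean, math.sqrt(var / n_samples)


def test_sin_mean_matches_monte_carlo():
    mu, sigma = 0.7, 0.5
    estimate, stderr = sin_mean_monte_carlo(mu, sigma, n_samples=50_000)
    # Tolerance of five standard errors: tight enough to catch a wrong
    # formula, loose enough that sampling noise does not fail the test.
    assert abs(estimate - sin_mean_exact(mu, sigma)) < 5 * stderr
```

Scaling the tolerance with the estimated standard error, rather than hard-coding an absolute threshold, is what would let one parametrised test cover all moment-matching components.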

Run on other environments

Control something real

  • Cartpole swing
  • Lego Mindstorms
  • Ask robotics faculty

Future algorithms

  • Non-greedy exploration (see the DL-algo)
  • Efficient Learning of Dynamics with an information based criterion (IRL)
  • Posterior Sampling for RL
  • Deep PILCO
  • Embed to control
  • PlaNet