This repository contains the code for a master's thesis on intrinsically motivated learning in robotics, conducted at the Frankfurt Institute for Advanced Studies under the supervision of Jochen Triesch and Charles Wilmot.
The goal of this thesis was to investigate how intrinsic motivation can benefit the control of highly complex 7-DOF robot arms.
We first conducted a detailed analysis of how reinforcement learning agents (PPO) without any extrinsic rewards discover and manipulate their environment. We found that exploration driven by intrinsic motivation computed from multiple modalities (proprioception and touch) is much more efficient than exploration using either modality in isolation. We therefore advocate factoring in all available sensor streams when modeling human-like exploration schemas.
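A minimal sketch of how such a multimodal intrinsic reward could be computed, assuming it is derived from the prediction error of a learned forward model over concatenated proprioceptive and tactile observations (the names `ForwardModel` and `intrinsic_reward` are illustrative, not the thesis code's API):

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next multimodal observation from the current one and the action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def intrinsic_reward(model, proprio, touch, act, next_proprio, next_touch):
    """Prediction error over the combined modalities serves as the intrinsic reward."""
    obs = torch.cat([proprio, touch], dim=-1)
    next_obs = torch.cat([next_proprio, next_touch], dim=-1)
    with torch.no_grad():
        pred = model(obs, act)
    # Per-sample squared error; a larger error marks a more "surprising" transition.
    return ((pred - next_obs) ** 2).mean(dim=-1)
```

In this formulation, rewarding transitions the forward model fails to predict pushes the agent toward parts of the state space where its multimodal predictions are still poor.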
The second part of the thesis develops a novel reinforcement learning algorithm that uses a learned inverse model of the environment to reach goals in sparse-reward settings. We find that this approach is an order of magnitude more effective than random exploration at reaching goals. Furthermore, our approach is suited to on-policy learning methods and fulfills a role similar to that of hindsight experience replay (HER) in off-policy settings. Our approach uses a mixture policy consisting of a linear interpolation of a standard PPO policy and a deep inverse model conditioned on goals, with a mixing rate controlling the contribution of each component.
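A sketch of what such a mixture policy might look like, assuming the linear interpolation happens in action space; `ppo_actor`, `inverse_model`, and `alpha` are illustrative names, and the actual architecture in the thesis may differ:

```python
import torch
import torch.nn as nn

class MixturePolicy(nn.Module):
    """Blends a PPO actor with a goal-conditioned inverse model via a mixing rate."""
    def __init__(self, ppo_actor: nn.Module, inverse_model: nn.Module, alpha: float = 0.5):
        super().__init__()
        self.ppo_actor = ppo_actor          # maps state -> action
        self.inverse_model = inverse_model  # maps (state, goal) -> action
        self.alpha = alpha                  # mixing rate in [0, 1]

    def forward(self, state, goal):
        a_ppo = self.ppo_actor(state)
        a_inv = self.inverse_model(torch.cat([state, goal], dim=-1))
        # Linear interpolation between the exploratory PPO action and the
        # goal-directed inverse-model action.
        return (1.0 - self.alpha) * a_ppo + self.alpha * a_inv
```

With `alpha = 0` this reduces to plain PPO exploration; with `alpha = 1` actions come purely from the goal-conditioned inverse model.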
We also show that when the inverse model is learned from data generated by intrinsically motivated agents, goals are reached even faster and more efficiently. Notably, intrinsic motivation has the biggest impact on performance in settings where goals are harder to reach (farther from the starting point):
TODO