This code is for the numerical example used in our paper "Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach" found here
The notebook main.ipynb implements the learning algorithm described in the paper above, step by step and in a user-friendly fashion. It concludes with a closed-loop performance comparison between the reinforcement learning controller and a certainty-equivalence LQR controller.
The code can be adapted to other examples by adjusting the dynamic model definition. The reinforcement learning algorithm (DDPG here) can also be changed by replacing it with the intended algorithm.
Example systems can be defined without explicitly stating the Jacobians of their state and output dynamics; the extended Kalman filter code uses automatic differentiation to compute these Jacobians itself.
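As an illustration of this idea, the sketch below defines a small system and lets automatic differentiation supply the Jacobians for one extended Kalman filter step. It is only a minimal sketch and not the code from this repository: it assumes JAX as the automatic differentiation library, and the functions `f`, `h`, and `ekf_step`, along with all numerical values, are hypothetical and chosen for illustration.

```python
import jax
import jax.numpy as jnp

# Hypothetical example system (NOT the one from the paper):
#   x_{k+1} = f(x_k, u_k) + w_k,   y_k = h(x_k) + v_k
def f(x, u):
    return jnp.array([x[0] + 0.1 * x[1],
                      x[1] + 0.1 * (-jnp.sin(x[0]) + u[0])])

def h(x):
    return jnp.array([x[0] ** 2])

# Jacobians come from automatic differentiation; none are written by hand.
F_jac = jax.jacfwd(f, argnums=0)  # df/dx
H_jac = jax.jacfwd(h)             # dh/dx

def ekf_step(x_est, P, u, y, Q, R):
    """One EKF predict/update cycle (illustrative helper, assumed name)."""
    # Predict: propagate the estimate and linearize around it.
    F = F_jac(x_est, u)
    x_pred = f(x_est, u)
    P_pred = F @ P @ F.T + Q
    # Update: linearize the output map around the predicted state.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    # Kalman gain K = P_pred H^T S^{-1}, using the symmetry of S and P_pred.
    K = jnp.linalg.solve(S, H @ P_pred).T
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (jnp.eye(x_est.shape[0]) - K @ H) @ P_pred
    return x_new, P_new

# Usage with made-up numbers.
x0 = jnp.array([0.5, 0.0])
P0 = jnp.eye(2)
u0 = jnp.array([0.0])
y1 = jnp.array([0.3])
Q = 0.01 * jnp.eye(2)
R = 0.1 * jnp.eye(1)
x_new, P_new = ekf_step(x0, P0, u0, y1, Q, R)
```

Under these assumptions, switching to a different example only requires editing `f` and `h`; the Jacobian functions follow from them without further changes.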