Implementation of the vanilla stochastic (categorical) policy gradient algorithm to play CartPole.
Vanilla policy gradient takes longer to train than DQN on CartPole, but its convergence is smoother. Both properties are as expected.
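The core quantities of a categorical policy gradient can be sketched in plain Python. This is only an illustration, not the code in this repository: the helper names (`softmax`, `sample_action`, `discounted_returns`, `surrogate_loss`) and the discount factor `gamma=0.99` are assumptions, and a real implementation would compute the log-probabilities with an autograd framework so the loss can be differentiated.

```python
import math
import random

def softmax(logits):
    """Turn raw action scores into a categorical distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(probs, rng=random):
    """Draw an action index from the categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1} (gamma is an assumed value)."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def surrogate_loss(log_probs, returns):
    """Loss whose gradient is the vanilla policy gradient estimate:
    the negative sum of log pi(a_t | s_t) * G_t over one episode."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

Sampling from the distribution, rather than always taking the most likely action, is what makes the policy stochastic.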
python ./
Trains the model, saves a checkpoint every 1000 episodes, and saves the weights of well-trained models.
- checkpoint file: ./Save/YYYY-MM-DD-hh:mm:ss/vpg_cp_ep[#ep].pth
- weight file: ./Save/YYYY-MM-DD-hh:mm:ss/vpg_weight_ep[#ep].pth
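As a rough sketch of how such paths could be built, the helpers below assemble the run directory and file names with the standard library. The function names are hypothetical, and the timestamp format is inferred from the example directory name Save/2021-03-18-15:42:31 rather than taken from the training script.

```python
from datetime import datetime
from pathlib import Path

def run_dir(root="Save", now=None):
    """Directory for one training run, named after its start time."""
    now = now or datetime.now()
    return Path(root) / now.strftime("%Y-%m-%d-%H:%M:%S")

def checkpoint_path(directory, episode):
    """Path of the checkpoint saved at the given episode."""
    return Path(directory) / f"vpg_cp_ep{episode}.pth"

def weight_path(directory, episode):
    """Path of the weight file saved at the given episode."""
    return Path(directory) / f"vpg_weight_ep{episode}.pth"
```

Note that colons in directory names are not valid on Windows file systems, so this naming scheme assumes a Unix-like system.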
python <directory>
python "Save/2021-03-18-15:42:31"
This tests every saved weight file (vpg_weight_ep*.pth) under the directory for 100 episodes each.
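Collecting the weight files for such a test run might look like the sketch below. `find_weight_files` is a hypothetical helper, not a function from this repository. It matches the vpg_weight_ep*.pth pattern and sorts by episode number, so that for example ep20 comes before ep100.

```python
import re
from pathlib import Path

# Captures the episode number from names such as vpg_weight_ep9970.pth.
_EPISODE = re.compile(r"vpg_weight_ep(\d+)\.pth$")

def find_weight_files(directory):
    """Return (episode, path) pairs for every weight file, sorted by episode."""
    found = []
    for path in Path(directory).glob("vpg_weight_ep*.pth"):
        match = _EPISODE.search(path.name)
        if match:  # skip names whose episode part is not a plain number
            found.append((int(match.group(1)), path))
    return sorted(found, key=lambda item: item[0])
```

Each returned path could then be loaded and evaluated for the desired number of episodes.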
./ <file>
./ "Save/2021-03-18-15:42:31/vpg_weight_ep9970.pth"
This runs the CartPole environment with rendering and shows how the learned model actually behaves.