Baseline implementation of recurrent PPO using truncated BPTT
deep-learning deep-reinforcement-learning pytorch recurrent-neural-networks lstm gru policy-gradient recurrence recurrent pomdp actor-critic truncated proximal-policy-optimization ppo on-policy bptt
Updated Apr 28, 2024 - Jupyter Notebook