Based on RLtools, a CPU-focused deep RL library for continuous control.
Full PPO step (rollout, GAE, actor training, critic training)
Step time: 6150 ms
- Collect: 415 ms
- Evaluate critic (for GAE): 430 ms
- Training: 5120 ms
  - Epoch $\times$ Batch $= 32$
    - Actor forward: 24 ms
    - Actor backward: 57 ms
    - Train critic: 81 ms
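
As a quick consistency check of the timings above, the following Python sketch sums the three phases and scales the inner timings by 32. It assumes that "Epoch $\times$ Batch $= 32$" counts the minibatch updates in one PPO step and that the three inner timings are per update. The variable names are illustrative and not part of RLtools.

```python
# Timings in milliseconds, copied from the breakdown above.
step_ms = 6150
phases_ms = {"collect": 415, "evaluate_critic": 430, "training": 5120}
per_update_ms = {"actor_forward": 24, "actor_backward": 57, "train_critic": 81}

# Assumption: Epoch x Batch = 32 is the number of minibatch updates per step.
updates_per_step = 32

# Time covered by the three listed phases, and what the step total leaves over.
phase_total_ms = sum(phases_ms.values())
unaccounted_ms = step_ms - phase_total_ms

# Training time reconstructed from the per-update timings.
training_estimate_ms = updates_per_step * sum(per_update_ms.values())

# Fraction of the full step spent in training.
training_share = phases_ms["training"] / step_ms

print(f"phases sum to {phase_total_ms} ms, {unaccounted_ms} ms unaccounted")
print(f"training estimate from per-update timings: {training_estimate_ms} ms")
print(f"training share of step: {training_share:.1%}")
```

Under these assumptions the per-update timings give about 5184 ms, close to the reported 5120 ms (the small gap is plausibly rounding of the per-update values), and training accounts for roughly 83% of the step.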
- Epoch