02‐21‐2024 Weekly Tag Up

Joe Miceli edited this page Feb 21, 2024 · 1 revision

Attendees

Chi-Hui
Joe

Updates

The new normalization scheme had little impact on performance
- Online-rollout results for exp 15 look very similar to exp 14 (same convergence)
It almost looks like we are not able to control the mean policy at all

Next Steps

Update the lambda learning scheme to use a gradient-based update (previously discussed)
Run a new experiment with a constraint ratio of 0.75
- Hopefully we will see that the mean policy is below the threshold
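
For reference, a gradient-based lambda update could look roughly like the sketch below. This is only an illustration of a standard projected gradient-ascent step on a Lagrange multiplier, not the scheme agreed on in the earlier discussion. The names `update_lambda`, `constraint_value`, `threshold`, and `lr` are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
def update_lambda(lam, constraint_value, threshold, lr=0.01):
    """One projected gradient-ascent step on a Lagrange multiplier.

    Assumption: the constraint is `constraint_value <= threshold`, so the
    gradient of the Lagrangian with respect to lambda is the violation.
    """
    violation = constraint_value - threshold
    # Grow lambda when the constraint is violated, shrink it otherwise,
    # and clip at zero so the multiplier stays non-negative.
    return max(0.0, lam + lr * violation)


# Illustrative usage with made-up values (not experiment results)
lam = 1.0
lam = update_lambda(lam, constraint_value=0.9, threshold=0.75, lr=0.1)
```

With this kind of update, lambda keeps growing for as long as the estimated constraint value stays above the threshold, which increases the pressure on the policy to satisfy the constraint.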