02‐21‐2024 Weekly Tag Up

Joe Miceli edited this page Feb 21, 2024 · 1 revision

Attendees

  • Chi-Hui
  • Joe

Updates

  • The new normalization scheme had little impact on performance
    • Results of online-rollouts for exp 15 look very similar to exp 14 (same convergence)
  • It almost looks like we're not able to control the mean policy at all

Next Steps

  • Update the lambda learning scheme to use a gradient-based update (previously discussed)
  • Run a new experiment with a constraint ratio of 0.75
    • Hopefully we will see that the mean policy is below the threshold
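
For reference, a gradient-based lambda update could look roughly like the sketch below. It assumes lambda is a Lagrange multiplier on a single constraint and that the update is projected gradient ascent on the constraint violation; the notes do not confirm either. The names `update_lambda`, `constraint_value`, `threshold`, and `lr` are hypothetical and not taken from the project code.

```python
def update_lambda(lam, constraint_value, threshold, lr=0.01):
    """One projected gradient-ascent step on a Lagrange multiplier.

    Assumption: the Lagrangian has the form
        L = objective + lam * (constraint_value - threshold),
    so its gradient with respect to lam is the constraint violation.
    """
    # Positive when the constraint is violated, negative when satisfied.
    grad = constraint_value - threshold
    # Ascend on lam, then project back onto lam >= 0.
    return max(0.0, lam + lr * grad)


# Hypothetical usage: lambda grows while the constraint is violated
# and is clipped at zero once the constraint is comfortably satisfied.
lam = 1.0
lam = update_lambda(lam, constraint_value=2.0, threshold=1.0, lr=0.5)
```

With this form, lambda only increases while the measured constraint value sits above the threshold, so the penalty keeps tightening until the constraint is met.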