
07‐05‐2024 Weekly Tag Up


Attendees

  • Chi Hui
  • Joe

Updates

  • Finished running coordination experiment

    • Policies are mixed in the environment and the center agent is evaluated for its performance
      • Compared to a baseline scenario in which all agents use the same model
    • 15 different scenarios were tested; each policy (queue, asl7, asl10) was tested in 5 different scenarios
    • In some cases, the center agent performed better than its baseline
      • This is surprising and indicates that being surrounded by different kinds of agents may help the center agent perform its job
      • Typically, though, the returns were very similar to the baseline
      • This may be an artifact of the traffic control application: all of the policies are similar to one another in terms of their high-level behavior
    • In all cases, the system-wide return was worse than the baseline
  • These results put us in a weird spot - it's not really clear how zero-shot coordination helps in this application

  • Also raises questions about how the literature calculates returns for heterogeneous systems (see the sketch below)

    • Typically the reward is the same for all heterogeneous agents (that's not the case in our experiments)
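As a concrete point of reference for the returns question above, here is a minimal sketch of how per-agent and system-wide returns could be tallied when agents receive different rewards. The function name, the agent IDs, and the toy reward log are illustrative assumptions, not the actual experiment code.

```python
def episode_returns(reward_log):
    """Per-agent and system-wide returns from a list of {agent_id: reward} dicts.

    In a heterogeneous setup each agent can receive a different reward at each
    step, so returns are tallied per agent rather than assuming a shared reward.
    """
    agents = reward_log[0].keys()
    per_agent = {aid: sum(step[aid] for step in reward_log) for aid in agents}
    system = sum(per_agent.values())  # summing is one common system-wide choice
    return per_agent, system

# Toy example: a center agent surrounded by neighbors running other policies.
log = [
    {"center": 1.0, "n1": 0.5, "n2": 0.2},
    {"center": 0.8, "n1": 0.4, "n2": 0.3},
]
per_agent, system = episode_returns(log)
print(per_agent["center"], system)  # compare against the all-same-policy baseline
```

With a shared reward (the typical setup in the literature), the per-agent and system-wide returns coincide up to a scale factor; once rewards differ per agent, the two can disagree, which matches what the experiment shows (center agent at or above its baseline while the system-wide return falls below it).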

Next Steps

  • May be time to pivot again
    • Ava could use help with the FCP problem on the Overcooked env
      • The self-play agent group has the best performance, but FCP has the best generalizability
      • Questions about when to assign different players during FCP training (a sketch of one possible assignment scheme follows this list)
      • Questions about increasing the performance of FCP
    • Chi Hui to set up a meeting to discuss
    • Joe to review multiHRI repo
  • Still working on addressing bugs/issues in the code
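A minimal sketch of one way partner assignment in FCP could work, assuming a pool of frozen self-play checkpoints saved at different skill levels. `StubPolicy`, `FCPPartnerPool`, and the per-episode uniform sampling are all hypothetical choices; when and how often to resample the partner is exactly one of the open questions above.

```python
import random

class StubPolicy:
    """Stand-in for a frozen self-play checkpoint (hypothetical)."""
    def __init__(self, name):
        self.name = name

    def act(self, obs):
        return random.randrange(6)  # Overcooked has 6 discrete actions

class FCPPartnerPool:
    """Pool of self-play checkpoints saved at different skill levels."""
    def __init__(self, checkpoints):
        self.checkpoints = checkpoints

    def sample(self):
        # Uniform sampling at the start of each episode is the simplest
        # schedule; curriculum or performance-weighted sampling are
        # alternatives worth discussing.
        return random.choice(self.checkpoints)

pool = FCPPartnerPool([StubPolicy("early"), StubPolicy("mid"), StubPolicy("final")])
partner = pool.sample()  # reassign here (per episode) or less often (per epoch)
print(partner.name)
```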