06‐18‐2024 Weekly Tag Up
- Joe
- Chi Hui
- Reviewed FCP zero shot coordination paper: https://arxiv.org/abs/2110.08176
- One FCP agent trained to play with a partner
- Partner is either a human or an agent trained from population play or self play (or behavioral cloning)
- Ava's experiments are with 1 FCP agent and N partners
- FCP algorithm works in 2 stages:
- Stage 1 train a pool of partners (and save them at various points so there is a varying level of skill)
- Stage 2 train FCP agent as the best response to the pool of partners
- Unclear what the details are on "best response"
- Idea is that the FCP agent plays with a random partner for a bit, trying to maximize its own reward, then plays with a new partner, etc. (see the sketch below)
- There should be some learning done before the partner changes
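A minimal sketch of the two-stage FCP procedure as we understood it. `make_policy`, `make_env`, `train_self_play`, `rollout`, and `update` are placeholder names for our own pieces, not the paper's API.

```python
import copy
import random

def train_fcp(make_policy, make_env, pool_size=4, checkpoints_per_partner=3,
              partner_switches=100, episodes_per_partner=10):
    """Two-stage Fictitious Co-Play, as described in the paper review above."""
    env = make_env()

    # Stage 1: train a pool of partners, saving snapshots at several points
    # during training so the pool covers a varying level of skill.
    pool = []
    for _ in range(pool_size):
        partner = make_policy()
        for _ in range(checkpoints_per_partner):
            partner.train_self_play(env)          # assumed method: one chunk of self-play training
            pool.append(copy.deepcopy(partner))   # frozen checkpoint at this skill level

    # Stage 2: train the FCP agent as a response to the frozen pool. It plays
    # with a randomly drawn partner for a few episodes (so some learning happens
    # before the partner changes), maximizing only its own reward, then switches.
    fcp_agent = make_policy()
    for _ in range(partner_switches):
        partner = random.choice(pool)              # partner parameters stay fixed
        for _ in range(episodes_per_partner):
            trajectory = env.rollout(fcp_agent, partner)   # assumed env API
            fcp_agent.update(trajectory)           # e.g., a PPO update on the FCP agent only
    return fcp_agent
```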
- Traffic signal control application
- Population play and self play should be our baselines
- NOTE: Self-play is synonymous with parameter sharing
- Single "agent" is the same thing as a single neural network
- In our case, we can use parameter sharing because the decentralized approach gives us the same results as the centralized one (see the sketch below)
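A rough illustration of what "self-play via parameter sharing" means for us: every intersection acts from its own local observation, but all of them query (and train) the same network. The `SharedPolicyNet` name and sizes are illustrative only.

```python
import torch
import torch.nn as nn

class SharedPolicyNet(nn.Module):
    """One network (a single "agent") shared by all intersections."""
    def __init__(self, obs_dim, num_actions):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, num_actions)
        )

    def forward(self, obs):
        return self.body(obs)

shared_net = SharedPolicyNet(obs_dim=10, num_actions=4)

def select_actions(local_observations):
    # Each intersection uses the same parameters; during training their
    # gradients would also flow into this one shared network.
    return [shared_net(obs).argmax().item() for obs in local_observations]

# Example: three intersections, each with its own local observation.
obs_batch = [torch.randn(10) for _ in range(3)]
print(select_actions(obs_batch))
```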
- First step should be to train 3 agents:
- High-speed agent
- Low-speed agent
- Queue length agent
- Potential 4th: dual objective agent (minimize queue in 1 direction, minimize avg speed in other direction)
- Potential issues with this
- Implementation (unsure if it's possible to differentiate between directions when defining a custom reward function; see the rough sketch below)
- Physical meaning (changing the location of this agent would change the meaning of the road in the city, "moving the red road")
- The idea is: what happens if one or more agents are changed in the city? Can the original agents still achieve their rewards?
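One way the direction-specific reward could look, if the wrapper lets us map lanes to approaches. This is a hypothetical sketch only: the lane IDs and per-lane metrics below are assumptions we still need to verify against what SUMO / our environment actually exposes.

```python
# Hypothetical lane grouping for one intersection (IDs are placeholders).
NS_LANES = ["n_in_0", "s_in_0"]   # north-south approach
EW_LANES = ["e_in_0", "w_in_0"]   # east-west approach

def dual_objective_reward(queue_by_lane, speed_by_lane):
    """Minimize queue length on one approach and average speed on the other,
    assuming per-lane queue lengths and mean speeds are available as dicts."""
    queue_ns = sum(queue_by_lane[lane] for lane in NS_LANES)
    avg_speed_ew = sum(speed_by_lane[lane] for lane in EW_LANES) / len(EW_LANES)
    # Both terms are penalized; the relative weighting is still to be decided.
    return -queue_ns - avg_speed_ew
```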
- Continue code clean up
- Look into dual objective possibility for SUMO
- Think about meaning of this kind of agent on environment
- Train 3 new agents using parameter sharing
- Brainstorm simulation engines for each agent
- Is this possible?
- Could be another method of implementing self-play
- Maybe relevant work: https://worldmodels.github.io/