06‐18‐2024 Weekly Tag Up


Attendees

  • Joe
  • Chi Hui

Updates

  • Reviewed FCP zero shot coordination paper: https://arxiv.org/abs/2110.08176
    • One FCP agent trained to play with a partner
      • Partner is either a human or an agent trained from population play or self play (or behavioral cloning)
      • Ava's experiments are with 1 FCP agent and N partners
    • The FCP algorithm works in 2 stages (a minimal sketch follows this list):
      • Stage 1: train a pool of partners (and save checkpoints at various points during training so the pool spans a range of skill levels)
      • Stage 2: train the FCP agent as the best response to the pool of partners
        • Unclear what the details of "best response" are
        • Idea is that the FCP agent plays with a random partner for a while, trying to maximize its reward, then plays with a new partner, etc.
          • There should be some learning done before the partner changes
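
A minimal sketch of the two-stage loop as we understood it. `our_rl_lib`, `PPOAgent`, `make_env`, `train_one_iteration`, and all the constants are hypothetical stand-ins rather than the paper's actual code, and the stage-2 partner-sampling schedule is our interpretation of "best response":

```python
import random

# Hypothetical stand-ins for our RL tooling; these names are
# illustrative only, not from the FCP paper's implementation.
from our_rl_lib import PPOAgent, make_env, train_one_iteration

NUM_PARTNERS = 8                     # size of the stage-1 population
TOTAL_STEPS = 1000
CHECKPOINT_STEPS = {100, 500, 999}   # early / mid / final skill levels
FCP_ROUNDS = 500
STEPS_PER_PARTNER = 10               # learn for a while before the partner changes

env = make_env()

# Stage 1: train a pool of partners (here via self-play) and checkpoint
# each one at several points so the pool covers varying skill levels.
partner_pool = []
for seed in range(NUM_PARTNERS):
    partner = PPOAgent(env, seed=seed)
    for step in range(TOTAL_STEPS):
        train_one_iteration(partner, partner, env)   # self-play
        if step in CHECKPOINT_STEPS:
            partner_pool.append(partner.frozen_copy())

# Stage 2: train the FCP agent as an (approximate) best response to the
# pool: draw a random frozen partner, train against it for a while to
# maximize the FCP agent's reward, then resample a new partner.
fcp_agent = PPOAgent(env, seed=0)
for _ in range(FCP_ROUNDS):
    partner = random.choice(partner_pool)
    for _ in range(STEPS_PER_PARTNER):
        train_one_iteration(fcp_agent, partner, env)
```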
  • Traffic signal control application
    • Population play and self play should be our baselines
    • NOTE: Self-play is synonymous with parameter sharing
      • Single "agent" is the same thing as a single neural network
      • In our case, we can use parameter sharing because the decentralized approach gives us the same results as the centralized one
    • First step should be to train 3 agents:
      • High-speed agent
      • Low-speed agent
      • Queue length agent
      • Potential 4th: dual-objective agent (minimize queue length in one direction, minimize average speed in the other direction)
        • Potential issues with this
          • Implementation (unsure whether a custom reward function can differentiate between directions; see the sketch after this list)
          • Physical meaning (changing the location of this agent would change the meaning of the road in the city, i.e., "moving the red road")
    • The idea is: what happens if one or more agents in the city are changed? Can the original agents still achieve their rewards?
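
On the implementation question above: TraCI exposes per-lane metrics, so a direction-aware reward looks feasible in principle. A minimal sketch, assuming a SUMO simulation already started via `traci.start(...)`; the lane IDs, the north-south/east-west split, and the 1:1 weighting are all placeholders for our actual network:

```python
import traci  # SUMO's Python API; requires a running simulation

# Hypothetical incoming-lane IDs for one intersection; the north-south /
# east-west split is an assumption about our network, not a SUMO default.
NS_LANES = ["n_in_0", "s_in_0"]
EW_LANES = ["e_in_0", "w_in_0"]

def dual_objective_reward():
    """Direction-aware reward: penalize queue length on the north-south
    approaches and average speed on the east-west approaches."""
    queue_ns = sum(traci.lane.getLastStepHaltingNumber(l) for l in NS_LANES)
    speed_ew = sum(traci.lane.getLastStepMeanSpeed(l) for l in EW_LANES) / len(EW_LANES)
    # Negate both terms so the agent minimizes queue in one direction and
    # average speed in the other; the 1:1 weighting is a placeholder.
    return -queue_ns - speed_ew
```

Since every metric here is queried per lane, the reward can treat approaches differently, which suggests the implementation concern is solvable; the physical-meaning concern still stands.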

Next Steps

  • Continue code clean up
  • Look into the dual-objective possibility for SUMO
    • Think about what this kind of agent means for the environment
  • Train 3 new agents using parameter sharing (minimal sketch below)
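
A minimal sketch of what parameter sharing means in practice, assuming a PyTorch policy; the observation/action dimensions are placeholders for our SUMO setup, and the training loop itself is omitted:

```python
import torch
import torch.nn as nn

# Parameter sharing: all intersection "agents" act through one network,
# so a gradient step for any agent updates the policy for all of them.
OBS_DIM, N_ACTIONS, N_AGENTS = 16, 4, 3

shared_policy = nn.Sequential(
    nn.Linear(OBS_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)

def act(observations: torch.Tensor) -> torch.Tensor:
    """One forward pass per agent through the single shared network."""
    with torch.no_grad():
        logits = shared_policy(observations)  # shape (N_AGENTS, N_ACTIONS)
    return torch.distributions.Categorical(logits=logits).sample()

# Each agent supplies its own observation but uses the same parameters.
actions = act(torch.randn(N_AGENTS, OBS_DIM))
```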

BONUS

  • Brainstorm simulation engines for each agent