06‐18‐2024 Weekly Tag Up
- Joe
- Chi Hui
- Reviewed FCP zero shot coordination paper: https://arxiv.org/abs/2110.08176
- One FCP agent trained to play with a partner
- Partner is either a human or an agent trained from population play or self play (or behavioral cloning)
- Ava's experiments are with 1 FCP agent and N partners
- FCP algorithm works in 2 stages:
- Stage 1 train a pool of partners (and save them at various points so there is a varying level of skill)
- Stage 2 train FCP agent as the best response to the pool of partners
- Unclear what the details are on "best response"
- Idea is that the FCP agent plays with a random partner for a bit, trying to maximize its own reward, then plays with a new partner, etc. (see the sketch below)
- There should be some learning done before the partner changes
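A minimal sketch of the two-stage FCP procedure as we understood it. `make_policy`, `make_env`, `train_self_play`, `rollout`, and `update` are placeholder names for our own pieces, not the paper's API.

```python
import copy
import random

def train_fcp(make_policy, make_env, pool_size=4, checkpoints_per_partner=3,
              partner_switches=100, episodes_per_partner=10):
    """Two-stage Fictitious Co-Play, as described in the paper review above."""
    env = make_env()

    # Stage 1: train a pool of partners, saving snapshots at several points
    # during training so the pool covers a varying level of skill.
    pool = []
    for _ in range(pool_size):
        partner = make_policy()
        for _ in range(checkpoints_per_partner):
            partner.train_self_play(env)          # assumed method: one chunk of self-play training
            pool.append(copy.deepcopy(partner))   # frozen checkpoint at this skill level

    # Stage 2: train the FCP agent as a response to the frozen pool. It plays
    # with a randomly drawn partner for a few episodes (so some learning happens
    # before the partner changes), maximizing only its own reward, then switches.
    fcp_agent = make_policy()
    for _ in range(partner_switches):
        partner = random.choice(pool)              # partner parameters stay fixed
        for _ in range(episodes_per_partner):
            trajectory = env.rollout(fcp_agent, partner)   # assumed env API
            fcp_agent.update(trajectory)           # e.g., a PPO update on the FCP agent only
    return fcp_agent
```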
- Traffic signal control application
- Population play and self play should be our baselines
- NOTE: Self-play is synonymous with parameter sharing
- Single "agent" is the same thing as a single neural network
- In our case, we can use parameter sharing because the decentralized approach gives us the same results as the centralized one (see the sketch below)
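A rough illustration of what "self-play via parameter sharing" means for us: every intersection acts from its own local observation, but all of them query (and train) the same network. The `SharedPolicyNet` name and sizes are illustrative only.

```python
import torch
import torch.nn as nn

class SharedPolicyNet(nn.Module):
    """One network (a single "agent") shared by all intersections."""
    def __init__(self, obs_dim, num_actions):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, num_actions)
        )

    def forward(self, obs):
        return self.body(obs)

shared_net = SharedPolicyNet(obs_dim=10, num_actions=4)

def select_actions(local_observations):
    # Each intersection uses the same parameters; during training their
    # gradients would also flow into this one shared network.
    return [shared_net(obs).argmax().item() for obs in local_observations]

# Example: three intersections, each with its own local observation.
obs_batch = [torch.randn(10) for _ in range(3)]
print(select_actions(obs_batch))
```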
- First step should be to train 3 agents:
- High-speed agent
- Low-speed agent
- Queue length agent
- Potential 4th: dual objective agent (minimize queue in 1 direction, minimize avg speed in other direction)
- Potential issues with this
- Implementation (unsure if it's possible to differentiate between directions when defining a custom reward function; see the rough sketch below)
- Physical meaning (changing the location of this agent would change the meaning of the road in the city, "moving the red road")
- The idea is: what happens if one or more agents are changed in the city? Can the original agents still achieve their rewards?
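One way the direction-specific reward could look, if the wrapper lets us map lanes to approaches. This is a hypothetical sketch only: the lane IDs and per-lane metrics below are assumptions we still need to verify against what SUMO / our environment actually exposes.

```python
# Hypothetical lane grouping for one intersection (IDs are placeholders).
NS_LANES = ["n_in_0", "s_in_0"]   # north-south approach
EW_LANES = ["e_in_0", "w_in_0"]   # east-west approach

def dual_objective_reward(queue_by_lane, speed_by_lane):
    """Minimize queue length on one approach and average speed on the other,
    assuming per-lane queue lengths and mean speeds are available as dicts."""
    queue_ns = sum(queue_by_lane[lane] for lane in NS_LANES)
    avg_speed_ew = sum(speed_by_lane[lane] for lane in EW_LANES) / len(EW_LANES)
    # Both terms are penalized; the relative weighting is still to be decided.
    return -queue_ns - avg_speed_ew
```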
- Continue code clean up
- Look into dual objective possibility for SUMO
- Think about meaning of this kind of agent on environment
- Train 3 new agents using parameter sharing
- Brainstorm simulation engines for each agent
- Is this possible?
- Could be another method of implementing self-play
- Maybe relevant work: https://worldmodels.github.io/