Jannis Weil edited this page Jul 13, 2024 · 2 revisions

Welcome to the graph-marl wiki! This wiki complements the project's README.md file with additional information.

# Update Notes

## Random Action Selection (2f59f8a)

This commit updates the random action selection of the $\epsilon$-greedy policy. In the previous code, all agents that took a random action at a given step (a subset of all agents) took the *same* random action. Agents now take independent random actions. Thanks to Tobias Meuser for spotting and reporting this issue.
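The difference can be sketched as follows. This is not the repository's actual code; it is a minimal NumPy illustration in which `greedy_actions`, `explore`, and the variable names are stand-ins chosen for this example:

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions, eps = 20, 4, 0.1

# Stand-in for the greedy choice argmax_a Q(s, a) of each agent.
greedy_actions = rng.integers(n_actions, size=n_agents)

# Each agent independently decides whether to explore this step.
explore = rng.random(n_agents) < eps

# Previous behavior: ONE shared random action for all exploring agents.
shared = rng.integers(n_actions)
actions_old = np.where(explore, shared, greedy_actions)

# Updated behavior: each exploring agent samples its OWN random action.
independent = rng.integers(n_actions, size=n_agents)
actions_new = np.where(explore, independent, greedy_actions)
```

In both variants the non-exploring agents act greedily; only the correlation between the exploring agents' random actions changes.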

We hypothesized that this could affect the training results for the routing environment with congestion. When all agents on the same node select the same action with probability $\epsilon$ instead of selecting independent random actions, the probability of collisions between agents greatly increases. However, for the small $\epsilon$ used at the end of training and 20 agents/nodes, it is unlikely that two agents are on the same node and both select a random action at the same time. As long as training does not get stuck in a local optimum due to a high number of collisions early on, the final results should be unaffected.
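A back-of-the-envelope estimate supports this. The sketch below assumes agents are placed uniformly and independently across the nodes and uses an illustrative $\epsilon = 0.02$; both are assumptions for this example, not values measured in the paper:

```python
import math

n, eps = 20, 0.02  # 20 agents/nodes, illustrative end-of-training epsilon

# Assumption: uniform independent placement, so P(two specific agents
# share a node) ~= 1/n. Both must also explore, costing eps**2.
p_pairwise = eps**2 / n

# Probability that ANY of the n*(n-1)/2 agent pairs triggers a
# correlated-exploration collision in a single step.
pairs = math.comb(n, 2)
p_any = 1 - (1 - p_pairwise) ** pairs
```

Under these assumptions `p_any` stays well below one percent per step, which is consistent with the update having little effect late in training.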

We repeated the RL baseline experiments and found no significant effect of the update compared to the previous release.

For example, here is the throughput for the three graphs $G_A, G_B$ and $G_C$ as reported in Table 3 of our paper:

| Agent   | Graph $G_A$     | Graph $G_B$     | Graph $G_C$     |
| ------- | --------------- | --------------- | --------------- |
| DQN     | $3.28 \pm 0.01$ | $1.43 \pm 0.02$ | $0.98 \pm 0.01$ |
| DQNR    | $3.28 \pm 0.00$ | $1.47 \pm 0.01$ | $0.99 \pm 0.01$ |
| CommNet | $3.31 \pm 0.00$ | $1.53 \pm 0.00$ | $1.01 \pm 0.01$ |
| DGN     | $3.29 \pm 0.00$ | $1.43 \pm 0.05$ | $1.00 \pm 0.00$ |

And here is the throughput of three new runs with the updated action selection:

| Agent   | Graph $G_A$     | Graph $G_B$     | Graph $G_C$     |
| ------- | --------------- | --------------- | --------------- |
| DQN     | $3.28 \pm 0.00$ | $1.39 \pm 0.02$ | $0.98 \pm 0.00$ |
| DQNR    | $3.28 \pm 0.00$ | $1.46 \pm 0.00$ | $1.00 \pm 0.00$ |
| CommNet | $3.31 \pm 0.00$ | $1.53 \pm 0.01$ | $1.01 \pm 0.00$ |
| DGN     | $3.29 \pm 0.00$ | $1.46 \pm 0.02$ | $1.00 \pm 0.00$ |
