This environment is built on top of the Gymnasium environment API. Examples are shown in the environment creation documentation. The modified environments include:
- `MTSPEnv(num_agents, num_tasks, map_boundary)`: Multiple Traveling Salesman Problem
  - State: a concatenated vector of the agent positions and a binary vector of the remaining tasks
  - Action space: $a_{ij} = i*(\text{task number})+j$ means assigning task $i$ to agent $j$
  - Transition probabilities: this is a deterministic environment, so $P(s'|s,a) \in \{0, 1\}$
- Deep-Q Network: the DQN code is based on examples shown in a YouTube video.
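
To make the `MTSPEnv` state, action, and transition description above more concrete, here is a minimal sketch in plain Python. It is not the environment's actual code: the helper names (`encode_action`, `decode_action`, `build_state`, `step`) are hypothetical, "task number" is assumed to mean the total number of tasks, agent positions are assumed to be 2D, and the transition (the agent moves to the assigned task, which is then marked as done) is an assumed reading of the deterministic dynamics.

```python
def encode_action(task_i, agent_j, num_tasks):
    """Map (task i, agent j) to a_ij = i * num_tasks + j.

    Assumption: "task number" is the total number of tasks, so the
    encoding is only unambiguous when agent_j < num_tasks.
    """
    return task_i * num_tasks + agent_j


def decode_action(action, num_tasks):
    """Invert encode_action and return (task_i, agent_j)."""
    return divmod(action, num_tasks)


def build_state(agent_positions, remaining_tasks):
    """Concatenate flattened agent positions with the binary task vector."""
    flat_positions = [float(c) for pos in agent_positions for c in pos]
    return flat_positions + [float(r) for r in remaining_tasks]


def step(agent_positions, remaining_tasks, task_positions, action):
    """Hypothetical deterministic transition.

    The selected agent moves to the selected task's location and the
    task is marked as completed. The same (state, action) pair always
    yields the same next state, so P(s'|s,a) is either 0 or 1.
    """
    task_i, agent_j = decode_action(action, len(remaining_tasks))
    new_positions = [tuple(p) for p in agent_positions]
    new_remaining = list(remaining_tasks)
    new_positions[agent_j] = tuple(task_positions[task_i])
    new_remaining[task_i] = 0
    return new_positions, new_remaining


# Example: 2 agents, 3 tasks; assign task 2 to agent 1.
task_positions = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
agents = [(0.0, 0.0), (0.0, 0.0)]
remaining = [1, 1, 1]
action = encode_action(task_i=2, agent_j=1, num_tasks=3)
agents, remaining = step(agents, remaining, task_positions, action)
state = build_state(agents, remaining)
```

Under these assumptions the state vector has length `2 * num_agents + num_tasks`, and decoding with `divmod` recovers the task and agent indices from the integer action.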
The following plot visualizes the final solution obtained using
- DQN