Updated report

frangente · Feb 10, 2024 · 8b17520
1 parent 98c74c3
commit 8b17520

Showing 5 changed files with 45 additions and 23 deletions.
diff --git a/report/src/main.tex b/report/src/main.tex
@@ -10,15 +10,11 @@
 \usepackage[backend=biber,style=authoryear]{biblatex}
 \addbibresource{references.bib}
 
-\title{
-\rule{\linewidth}{0.5pt} \\[6pt] 
-\huge Autonomous Software Agents  \\ Project Report \\
-\rule{\linewidth}{2pt}  \\[10pt]
-}
+\title{\textbf{\huge Autonomous Software Agents Project Report }}
 \author{
-\begin{tabular}{c}
-\parbox{7cm}{\centering Corte Pause Manuela - 240183 \\ {\centering manuela.cortepause@studenti.unitn.it} }  \\ \\
-\parbox{7cm}{\centering Gentile Francesco - 240186\\ {\centering francesco.gentile@studenti.unitn.it}} \\
+\begin{tabular}{c c}
+\parbox{7cm}{ \centering Corte Pause Manuela - 240183 \\ \centering manuela.cortepause@studenti.unitn.it } &  
+\parbox{7cm}{ \centering Gentile Francesco - 240186\\  \centering francesco.gentile@studenti.unitn.it} \\
 \end{tabular}
 }
 \date{}
@@ -30,6 +26,7 @@
 \input{sections/background.tex}
 \input{sections/method.tex}
 \input{sections/results.tex}
+\input{sections/conclusions.tex}
 
 \printbibliography
 

diff --git a/report/src/references.bib b/report/src/references.bib
@@ -71,3 +71,14 @@ @article{a*
   keywords = {Automatic control, Automatic programming, Chemical technology, Costs, Functional programming, Gradient methods, Instruction sets, Mathematical programming, Minimax techniques, Minimization methods},
   pages    = {100--107}
 }
+
+@article{asik2023decoupled,
+  title     = {Decoupled Monte Carlo Tree Search for Cooperative Multi-Agent Planning},
+  author    = {Asik, Okan and Aydemir, Fatma Ba{\c{s}}ak and Ak{\i}n, H{\"u}seyin Levent},
+  journal   = {Applied Sciences},
+  volume    = {13},
+  number    = {3},
+  pages     = {1936},
+  year      = {2023},
+  publisher = {MDPI}
+}
diff --git a/report/src/sections/conclusions.tex b/report/src/sections/conclusions.tex
@@ -0,0 +1,12 @@
+\section{Conclusions}
+
+We have presented a multi-agent system that uses Monte Carlo Tree Search to solve the problem of collecting and delivering parcels in a dynamic environment.
+As the results in Section~\ref{sec:results} show, our solution achieves satisfactory results on all the maps we tested it on. Still, some limitations should be addressed in future work.
+
+First, the current MCTS implementation cannot keep up when the environment changes frequently: the tree must be constantly modified and pruned, so too few iterations are performed to come up with a good plan. Future work should therefore focus on improving the efficiency of the MCTS algorithm, for example by parallelizing the search.
+
+Another issue regards the coordination between agents. In the current implementation, each agent builds its search tree independently, so coordination is limited to the exchange of messages and to intention selection. This means that the search does not take into account possible interference between the agents at a depth greater than one (that is, after the immediate next move).
+
+A possible solution to this problem would be to move from a distributed to a centralized approach, in which a leader is responsible for coordinating the agents and building a global search tree. Still, such a solution is not without its drawbacks. Besides being a single point of failure, a global search tree that takes into consideration all possible combinations of actions for all agents would be too large to be practical. While decoupled implementations of cooperative MCTS exist \parencite{asik2023decoupled}, such solutions do not allow the tree to be updated in real time and instead require starting from scratch every time the environment changes.
+
+Therefore, future work should focus on finding a better balance between the two approaches, possibly by keeping a local search tree for each agent with a modified version of the UCT algorithm that takes into account the future actions of the other agents. This would preserve the advantages of a distributed approach while still allowing the agents to coordinate and to account for the interference between them.
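
For context on the UCT algorithm mentioned in the conclusions above: in its textbook form, UCT selects at each node the child $j$ that maximizes the score below. This is the standard formulation, not necessarily the variant implemented in this project, and the symbols $\bar{X}_j$, $n$, $n_j$ and $c$ are notation assumed here rather than taken from the report.

```latex
\begin{equation}
  \mathrm{UCT}(j) = \bar{X}_j + c \sqrt{\frac{\ln n}{n_j}}
\end{equation}
```

Here $\bar{X}_j$ is the average reward observed for child $j$, $n$ is the visit count of the parent node, $n_j$ is the visit count of child $j$, and $c > 0$ is an exploration constant. The commit does not specify how the proposed modification would incorporate the other agents' future actions.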
diff --git a/report/src/sections/introduction.tex b/report/src/sections/introduction.tex
@@ -6,4 +6,4 @@ \section{Introduction}
 
 To make the problem more challenging, the agents have to deal with a number of constraints and limitations. First, while the underlying mechanics of the game are known, the environment is only partially observable to the agents, which can only perceive their surroundings within a certain radius. Second, the agents have to deal with the stochastic and dynamic nature of the game, as the parameters of the game (e.g. the number of parcels, their reward distribution, the movement speed of the agents) can change from one game to another. As an additional challenge, other agents can be present in the environment and compete for the same resources. Such agents can be either adversarial or cooperative, thus requiring the agents to adapt their strategies accordingly.
 
-In the following sections, we describe the desing and implementation of our multi-agent system, as well as the methods and tools used to evaluate its performance. We also discuss the challenges and limitations of our approach, and suggest possible improvements for future work.
+In the following sections, we describe the design and implementation of our agent system. While the original project required both a single-agent and a multi-agent system, here we focus on the latter, as the single-agent system is a special case of it (no communication and no Hungarian matching). We then present the results of the experiments conducted to evaluate the performance of our system. Finally, we discuss the challenges and limitations of our approach and suggest possible improvements for future work.
diff --git a/report/src/sections/results.tex b/report/src/sections/results.tex
@@ -1,4 +1,6 @@
 \section{Results}
+\label{sec:results}
+
 To evaluate the performance of our implementation, six different maps were used, each focusing on a different facet of the problem (e.g. map structure, parcel rewards, etc.). The tests were conducted by running our agent on each map for five minutes and reporting the final score. It is important to note that this score is only an estimate, as both the condition of the map and the behaviour of the agent have a stochastic component and thus are not perfectly reproducible.
 
 \subsection{Single Agent}
@@ -11,13 +13,13 @@ \subsection{Single Agent}
 Sometimes the agent also performs sub-optimal actions, such as ignoring parcels even when they are very close to the path it is already taking. This stems from the fact that the parcels' scores do not decay, so the agent has no incentive to change its previously computed intention to include a new parcel when it can simply collect it later for the same reward.
 \paragraph{Challenge\_22} Challenge characterized by a large number of parcels with a small average reward and a fast moving agent.
 
-This map is quite challenging as the rate at which parcels spawn and then die is very high. Moreover the agent is able to see up to a large distance and move fairly fast. This leads to having a lot of parcels that have to be taken into account at every point of the game and the mcts planner is not always able to keep up with the frequent changes and come up with the best plan that considers all parcels. In order to at least partially mitigated this problem, each time a parcel expires or is picked up by another agent, the mcts tree is pruned and that parcel removed from all paths where it was previously considered to reduce the size of the tree.
+This map is quite challenging, as the rate at which parcels spawn and then expire is very high. Moreover, the agent can see up to a large distance and moves fairly fast. As a result, many parcels have to be taken into account at every point of the game, and MCTS is not always able to keep up with the frequent changes and come up with the best plan that considers all parcels. To at least partially mitigate this problem, each time a parcel expires or is picked up by another agent, the search tree is pruned and that parcel is removed from all paths where it was previously considered, reducing the size of the tree.
 
 % By examining the agent behaviour when it is running it is easy to see that the behaviour of the agent is much more sensible in areas with few parcels than other, more dense, areas
 
-\paragraph{Challenge\_23} The map is characterized by narrow paths with many other agents moving in them, a limited number of available parcels at any time but with high rewards and an high parcel observation distance. 
+\paragraph{Challenge\_23} The map is characterized by narrow paths with many other agents moving in them, a limited number of available parcels at any time but with high rewards, and a high parcel observation distance.
 
-This tests how well an agent is able to navigate its surrounding environment and either modify its path to take into account the obstacles that are other agents or drop an intention all together to pursue a more promising one.  
+This tests how well an agent can navigate its surroundings and either modify its path to account for the obstacles posed by other agents or drop an intention altogether to pursue a more promising one.
 
 
 \paragraph{Challenge\_24} This map differs from the previous challenges because parcels can spawn only on some of the tiles and can be delivered at a single far-away position.
@@ -30,11 +32,11 @@ \subsection{Single Agent}
 \begin{table}
     \centering
     \begin{tabular}{c || c c} \hline
-                    & Benchmark  & Our Solution   \\ \hline
-    Challenge 21    &  90       &  350          \\
-    Challenge 22    &  6        &  613          \\
-    Challenge 23    &  698      &  3219         \\
-    Challenge 24    &  637      &  1470         \\  \hline 
+                     & Benchmark & Our Solution \\ \hline
+        Challenge 21 & 90        & 350          \\
+        Challenge 22 & 6         & 613          \\
+        Challenge 23 & 698       & 3219         \\
+        Challenge 24 & 637       & 1470         \\  \hline
     \end{tabular}
     \caption{Scores for the single agent maps}
     \label{tab:single_agent}
@@ -47,9 +49,9 @@ \subsection{Multi Agent}
 
 \paragraph{Challenge\_31} The map is designed with vertical lines connected by a horizontal corridor. Both delivery and spawning tiles can be found at the ends of the vertical lines.
 
-This challenge is complex due to the high number of parcels seen at any time, similarly to challenge\_22, but with the added difficulty that the parcels are viewed by two different agents and therefore the information about them is shared asynchronously. 
+This challenge is complex due to the high number of parcels seen at any time, similarly to challenge\_22, but with the added difficulty that the parcels are viewed by two different agents and therefore the information about them is shared asynchronously.
 
-Because of the dynamic nature of the environment, the agents often change their intentions and the parcels they are going to pick up. This sometimes leads to agents starting to move towards their intention only to then receive a message from another teammate stating that it has a better score for that same intention forcing the agent to drop its intention. This may result in the agents moving back and forth constantly switching their intention while not being able to make any progress. 
+Because of the dynamic nature of the environment, the agents often change their intentions and the parcels they are going to pick up. Sometimes an agent starts moving towards its intention only to receive a message from a teammate stating that it has a better score for that same intention, which forces the agent to drop it. As a result, the agents may move back and forth, constantly switching intentions without making any progress.
 
 \paragraph{Challenge\_32}  This challenge has a unique configuration with vertical, separate lines with a spawning tile on one end and a delivery tile on the other and it tests the ability of the agents to effectively coordinate and collaborate to exchange parcels.
 
@@ -63,10 +65,10 @@ \subsection{Multi Agent}
 \begin{table}
     \centering
     \begin{tabular}{c || c} \hline
-                    &  Our Solution \\ \hline
-    Challenge 31    &  1352         \\
-    Challenge 32    &  5473         \\  
-    Challenge 33    &  1938         \\  \hline 
+                     & Our Solution \\ \hline
+        Challenge 31 & 1352         \\
+        Challenge 32 & 5473         \\
+        Challenge 33 & 1938         \\  \hline
     \end{tabular}
     \caption{Scores for the multi-agent maps}
     \label{tab:multi_agent}