
Commit

Updated report (belief base section)
frangente committed Feb 7, 2024
1 parent f684e2b commit 795fe93
Showing 2 changed files with 19 additions and 2 deletions.
2 changes: 2 additions & 0 deletions report/src/main.tex
@@ -1,6 +1,8 @@
\documentclass[twocolumn]{article}

\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{algorithm}
\usepackage{algpseudocode}
19 changes: 17 additions & 2 deletions report/src/sections/method.tex
@@ -52,11 +52,26 @@ \subsection{Belief base}
Hereafter, we describe the main assumptions and design choices made to keep track of the changes in the environment and to update the belief base accordingly.

\paragraph*{Parcels}
One of the main components of the belief base is the set of parcels that the agent can pick up (that is, the parcels that have not yet been picked up by any agent). Each parcel is represented as a tuple $(p, l, t, v)$, where $p$ is a unique identifier, $l$ is the location of the parcel, $t$ is the time at which the parcel was first observed, and $v$ is the value of the parcel at time $t$ (knowing when the parcel was first observed, the agent can estimate its current value from the decay rate).
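
For illustration, the parcel entry and the decay-based value estimate could be sketched as follows (a minimal TypeScript sketch; the names \texttt{Parcel} and \texttt{estimateValue} are hypothetical, and a linear decay of one value unit every \texttt{decayInterval} milliseconds is assumed):

\begin{verbatim}
// Hypothetical sketch of the parcel belief entry (p, l, t, v).
interface Parcel {
  id: string;                  // p: unique identifier
  location: [number, number];  // l: tile coordinates
  firstSeen: number;           // t: time of first observation (ms)
  initialValue: number;        // v: value observed at time t
}

// Estimate the current value of a parcel, assuming a linear
// decay of one value unit every `decayInterval` milliseconds.
function estimateValue(
  p: Parcel, now: number, decayInterval: number,
): number {
  const decayed = Math.floor((now - p.firstSeen) / decayInterval);
  return Math.max(0, p.initialValue - decayed);
}
\end{verbatim}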

Each time the agent receives a new observation (from its own sensors or from those of the other team members), it updates the set of parcels in its belief base accordingly. In particular, if the agent observes a free parcel that it was not aware of, it adds the parcel to its belief base. If the agent observes a parcel that it was already aware of but in a different location, or now picked up by another agent, it updates the parcel's location or removes it from the belief base, respectively. Given that the agent can only perceive its surroundings within a certain radius, the state of the parcels outside the team's perception range cannot be determined with certainty. In such cases, we decided to mark as no longer free only those parcels whose last observed location is inside the perception range but that are not currently observed by any agent in the team.
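
The update rule just described could be sketched as follows (again a hypothetical sketch reusing the \texttt{Parcel} entry above; \texttt{freeParcels} is assumed to map parcel identifiers to belief entries, and \texttt{teamCanSee} to test whether a location lies inside the team's combined perception range):

\begin{verbatim}
// Hypothetical sketch of the parcel belief update.
function updateParcelBeliefs(
  freeParcels: Map<string, Parcel>,
  observed: { parcel: Parcel; carriedBy: string | null }[],
  teamCanSee: (loc: [number, number]) => boolean,
) {
  for (const { parcel, carriedBy } of observed) {
    if (carriedBy !== null) {
      freeParcels.delete(parcel.id);      // picked up: not free
    } else if (!freeParcels.has(parcel.id)) {
      freeParcels.set(parcel.id, parcel); // newly discovered
    } else {
      freeParcels.get(parcel.id)!.location = parcel.location;
    }
  }
  // Parcels whose last known location is inside the perception
  // range but that no teammate currently observes are removed.
  const seen = new Set(observed.map((o) => o.parcel.id));
  for (const [id, p] of freeParcels) {
    if (teamCanSee(p.location) && !seen.has(id)) {
      freeParcels.delete(id);
    }
  }
}
\end{verbatim}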

\paragraph*{Agents}
Tracking the state of the other agents is also crucial, as their positions and intentions can greatly affect the agent's own plans. Here a distinction must be made between the agents in the same team and the adversarial agents. Tracking the state of the cooperating agents is straightforward, as their positions and desires are directly communicated by the agents themselves. The only additional information the agent needs to keep track of is the last time at which each teammate sent a message to the team. This is necessary to detect when a teammate has been inactive for too long and to remove it from the team.
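
For instance, the inactivity check could be as simple as the following sketch (hypothetical names; \texttt{maxSilence} is an assumed timeout, not a parameter taken from the actual implementation):

\begin{verbatim}
// Hypothetical sketch of the teammate inactivity check.
function pruneInactiveTeammates(
  lastMessageAt: Map<string, number>, // teammate id -> last message
  now: number,
  maxSilence: number,                 // assumed timeout (ms)
): string[] {
  const removed: string[] = [];
  for (const [id, t] of lastMessageAt) {
    if (now - t > maxSilence) {
      lastMessageAt.delete(id); // teammate silent for too long
      removed.push(id);
    }
  }
  return removed;
}
\end{verbatim}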

On the other hand, keeping the belief base up to date with the state of the adversarial agents is more challenging, as we can only rely on the observations made by the agent itself and its team. Each adversarial agent is represented as a tuple $(a, l, t, r)$, where $a$ is a unique identifier, $l$ is the last observed location of the agent, $t$ is the time at which the agent was last observed, and $r$ is a boolean flag indicating whether the agent is considered a random agent. Note that by random agent we do not necessarily mean an agent that moves randomly (i.e., without any specific goal), but an agent whose behavior does not appear to be rational (i.e., whose actions do not seem to be aimed at maximizing its reward).
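
The corresponding belief entry could look as follows (hypothetical TypeScript; the nullable location anticipates the staleness rule described below):

\begin{verbatim}
// Hypothetical sketch of the adversary belief entry (a, l, t, r).
interface Adversary {
  id: string;                        // a: unique identifier
  location: [number, number] | null; // l: last observed location
  lastSeen: number;                  // t: time of last observation
  random: boolean;                   // r: flagged as random agent
}
\end{verbatim}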

Each time the agent receives a new observation, it updates the set of adversarial agents in its belief base accordingly. In particular, if the agent observes a new agent, it adds it to its belief base; otherwise, it updates the agent's last observed location and last observed time. Given that the agent can only perceive its surroundings within a certain radius, the state of the adversarial agents outside the team's perception range cannot be determined with certainty. In such cases, since agents are much more dynamic than parcels, we decided to mark as undefined the position of all agents that have not been observed for a certain amount of time. This prevents the agent from making decisions based on outdated information.
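
A sketch of this update, reusing the \texttt{Adversary} entry above (hypothetical names; \texttt{maxAge} is an assumed staleness threshold):

\begin{verbatim}
// Hypothetical sketch of the adversary belief update.
function updateAdversaryBeliefs(
  adversaries: Map<string, Adversary>,
  observed: { id: string; location: [number, number] }[],
  now: number,
  maxAge: number, // assumed staleness threshold (ms)
) {
  for (const o of observed) {
    const prev = adversaries.get(o.id);
    adversaries.set(o.id, {
      id: o.id,
      location: o.location,
      lastSeen: now,
      random: prev?.random ?? false, // keep prior classification
    });
  }
  // Positions not confirmed for too long become undefined,
  // since agents move around much faster than parcels do.
  for (const a of adversaries.values()) {
    if (now - a.lastSeen > maxAge) {
      a.location = null;
    }
  }
}
\end{verbatim}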

To assess whether an agent is moving randomly, we defined a simple heuristic: if the score an agent has accumulated between the first and the last time it was observed is below a certain threshold, the agent is considered a random agent. The threshold is defined as the expected reward a greedy agent would have obtained over the same time span, computed as follows:

\begin{equation*}
\mathbb{E}_{\text{greedy}} = \frac{\mathbb{E}_{\texttt{distance}}}{\mu_{\texttt{distance}}} \cdot \frac{\mu_{\texttt{reward}}}{\texttt{numSmartAgents}}
\end{equation*}

where $\mathbb{E}_{\texttt{distance}}$ is the expected distance covered by the agent in the time span, $\mu_{\texttt{distance}}$ is the average distance between parcels in the map, $\mu_{\texttt{reward}}$ is the average reward of the parcels, and \texttt{numSmartAgents} is the number of agents in the environment that are not random agents. Intuitively, the first factor estimates how many parcels a greedy agent could reach in the time span, while the second factor estimates the reward each of those parcels yields once it is shared among the non-random agents.
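
As a purely illustrative example (with made-up numbers): if the agent is expected to cover $60$ tiles in the observed time span, parcels are on average $20$ tiles apart with an average reward of $30$, and there are $2$ non-random agents, then

\begin{equation*}
\mathbb{E}_{\text{greedy}} = \frac{60}{20} \cdot \frac{30}{2} = 3 \cdot 15 = 45,
\end{equation*}

so an agent whose observed score over that span is below $45$ would be flagged as random.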

Finally, note that here we do not make any assumptions about the possible intentions of the other agents. Indeed, modeling the behavior of other agents is a complex and challenging task that may require learning-based approaches. We therefore preferred not to implement even a partial model of the other agents' intentions, given that wrong assumptions about their actions can lead to suboptimal decisions.

\subsection{Search}

