Multi-agent grid-world environment with partial observability and a discrete action space. In this environment, agents (e.g., radar sensors) use their communication network to track a moving target (e.g., an intruder drone). The target moves randomly according to a predefined transition model that takes the agents' actions (i.e., hits) into account.
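The transition model is not spelled out in this README, and the one used in the paper also reacts to the agents' hits. As a rough sketch under that caveat, the following builds a plain random-walk transition matrix for a target on a height x width grid. The function name random_walk_transition and the stay probability p_stay are hypothetical, and the hit-dependent part of the model is deliberately left out.

```python
import numpy as np


def random_walk_transition(height: int, width: int, p_stay: float = 0.2) -> np.ndarray:
    """Row-stochastic matrix T with T[i, j] = P(next cell = j | current cell = i).

    Hypothetical model: the target stays put with probability p_stay and otherwise
    moves uniformly to one of its in-grid 4-neighbors. Cells are indexed row-major.
    Agent actions (hits) are NOT modeled here.
    """
    n = height * width
    T = np.zeros((n, n))
    for r in range(height):
        for c in range(width):
            i = r * width + c  # row-major index of cell (r, c)
            neighbors = [
                (r + dr) * width + (c + dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < height and 0 <= c + dc < width
            ]
            if not neighbors:  # 1x1 grid: the target cannot move
                T[i, i] = 1.0
                continue
            T[i, i] = p_stay
            for j in neighbors:
                T[i, j] += (1.0 - p_stay) / len(neighbors)
    return T


# Usage: sample the target's next cell from its current one.
T = random_walk_transition(4, 5)
rng = np.random.default_rng(0)
current = 7
nxt = int(rng.choice(T.shape[0], p=T[current]))
```

Storing the model as a matrix makes it reusable for both simulating the target and propagating beliefs, since a prior over cells is pushed forward by a single vector-matrix product.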
This environment was used in the following paper:
M. Kayaalp, F. Ghadieh and A. H. Sayed, "Policy Evaluation in Decentralized POMDPs With Belief Sharing," in IEEE Open Journal of Control Systems, vol. 2, pp. 125-145, 2023, doi: 10.1109/OJCSYS.2023.3277760.
- Known dependencies: Python (3.7.4), NumPy (1.14.5), CuPy (10.2), Matplotlib (3.1.1), CUDA Toolkit, NetworkX
: contains code for initializing the grid, assigning agent locations, simulating the environment, the target transition, rendering, and the step function.
: contains the agent class, which handles per-agent functionality such as initialization, making observations, following individual policies, and forming beliefs.
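Forming beliefs from noisy observations is, in general, a Bayesian filtering step: predict with the transition model, then reweight by the observation likelihood. The sketch below shows that generic step for a single agent over grid cells. The binary "detect / no detect" sensor with flip probability eps is a hypothetical observation model, not necessarily the one in the agent class; a higher noisy level would plausibly correspond to a larger eps, but that mapping is an assumption.

```python
import numpy as np


def belief_update(belief: np.ndarray, T: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """One Bayesian filtering step over grid cells.

    belief:     prior belief over cells, shape (n,), sums to 1
    T:          transition matrix, T[i, j] = P(next = j | current = i)
    likelihood: P(observation | target in cell j), shape (n,)
    """
    predicted = belief @ T              # time update (prediction through the dynamics)
    posterior = predicted * likelihood  # measurement update (Bayes rule, unnormalized)
    total = posterior.sum()
    if total == 0:                      # observation impossible under the model
        return predicted
    return posterior / total


def detection_likelihood(n: int, sensor_cell: int, detected: bool, eps: float) -> np.ndarray:
    """Hypothetical binary sensor: reports whether the target is in sensor_cell,
    with the report flipped with probability eps."""
    p_detect = np.full(n, eps)          # false-alarm probability for other cells
    p_detect[sensor_cell] = 1.0 - eps   # true-detection probability for the sensed cell
    return p_detect if detected else 1.0 - p_detect


# Usage: uniform prior and toy uniform dynamics on a 3x3 grid (9 cells).
n = 9
T = np.full((n, n), 1.0 / n)
belief = np.full(n, 1.0 / n)
belief = belief_update(belief, T, detection_likelihood(n, sensor_cell=4, detected=True, eps=0.1))
```

After a positive detection, mass concentrates on the sensed cell (here 0.9/1.7 on cell 4), while the remaining cells keep a small share proportional to the false-alarm probability.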
Use the following constructor to create a new environment:
env = GridWorld(num, height, width, centralized, noisy, rho, phi, sparse, alpha, beliefvectors)
- num: number of agents in the grid
- height: height of the grid (number of cells)
- width: width of the grid (number of cells)
- centralized: type of algorithm performed, as described in the paper
  - centralized = 0: Algorithm 1, Centralized policy evaluation under POMDPs
  - centralized = 1: Algorithm 2, Diffusion policy evaluation under POMDPs
  - centralized = 2: Algorithm 3, Centralized evaluation for decentralized execution
- noisy: noise level of the agents' observations
  - noisy = 0: low noise
  - noisy = 1: medium noise
  - noisy = 2: high noise
- sparse: sparsity of the network (boolean)
- alpha: learning rate
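To catch invalid settings before building an environment, the documented arguments can be collected and checked up front. The following is a hypothetical helper, not part of the repository: GridWorldConfig and ALGORITHMS are illustrative names, and rho, phi and beliefvectors are left out because their meaning is not documented above.

```python
from dataclasses import dataclass

# Mapping of the documented `centralized` values to the paper's algorithms.
ALGORITHMS = {
    0: "Algorithm 1: centralized policy evaluation under POMDPs",
    1: "Algorithm 2: diffusion policy evaluation under POMDPs",
    2: "Algorithm 3: centralized evaluation for decentralized execution",
}


@dataclass(frozen=True)
class GridWorldConfig:
    """Hypothetical container for the documented GridWorld arguments."""
    num: int          # number of agents
    height: int       # grid height (cells)
    width: int        # grid width (cells)
    centralized: int  # 0, 1 or 2 (see ALGORITHMS)
    noisy: int        # 0: low, 1: medium, 2: high observation noise
    sparse: bool      # whether the communication network is sparse
    alpha: float      # learning rate

    def __post_init__(self) -> None:
        # Validate against the ranges listed in the README.
        if self.num < 1 or self.height < 1 or self.width < 1:
            raise ValueError("num, height and width must be positive")
        if self.centralized not in ALGORITHMS:
            raise ValueError("centralized must be 0, 1 or 2")
        if self.noisy not in (0, 1, 2):
            raise ValueError("noisy must be 0, 1 or 2")
        if not isinstance(self.sparse, bool):
            raise TypeError("sparse must be a bool")
        if self.alpha <= 0:
            raise ValueError("alpha must be positive")

    def algorithm(self) -> str:
        """Human-readable name of the selected algorithm."""
        return ALGORITHMS[self.centralized]


cfg = GridWorldConfig(num=4, height=5, width=5, centralized=1, noisy=0, sparse=True, alpha=0.1)
```

The validated fields would then be passed positionally to the constructor in its documented order (num, height, width, centralized, noisy, rho, phi, sparse, alpha, beliefvectors), with rho, phi and beliefvectors supplied separately.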
To reset the environment while keeping the properties above, use the reset(centralized) method of the GridWorld class.
Use the step(centralized) method of the GridWorld class to advance the environment by one time step. The following operations are generally performed during each step (the exact steps depend on the type of algorithm):
- Observe
- Adapt
- Action
- Reward
- Evolve
- TD-Error (Temporal Difference)
- Target transition
- Network Agreement Error
- SBE (Squared Bellman Error)
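Several of the quantities above (TD error, the adapt and combine stages, network agreement error, SBE) can be illustrated with generic diffusion TD(0) under linear value-function approximation. This is a textbook sketch, not the repository's step function. Assumptions: each agent's value is linear in its belief vector (V_k = w_k . mu_k), gamma is a hypothetical discount factor, the network combination matrix uses the Metropolis rule, and the squared TD error stands in as a sample estimate of the SBE; the paper's exact definitions may differ. A sparse network would simply mean fewer ones in adj.

```python
import numpy as np


def metropolis_weights(adj: np.ndarray) -> np.ndarray:
    """Doubly stochastic combination matrix from a symmetric 0/1 adjacency (no self-loops)."""
    K = adj.shape[0]
    deg = adj.sum(axis=1)
    A = np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            if l != k and adj[k, l]:
                A[k, l] = 1.0 / (1.0 + max(deg[k], deg[l]))
        A[k, k] = 1.0 - A[k].sum()  # self-weight keeps each row (and column) summing to 1
    return A


def td_errors(W, beliefs, next_beliefs, rewards, gamma):
    """Per-agent TD errors: delta_k = r_k + gamma * w_k . mu'_k - w_k . mu_k.

    W, beliefs, next_beliefs: shape (K, n), one row per agent; rewards: shape (K,).
    """
    v = np.sum(W * beliefs, axis=1)
    v_next = np.sum(W * next_beliefs, axis=1)
    return rewards + gamma * v_next - v


def diffusion_td_step(W, beliefs, next_beliefs, rewards, A, alpha, gamma):
    """Adapt-then-combine diffusion TD(0); A[l, k] is the weight agent k gives neighbor l."""
    delta = td_errors(W, beliefs, next_beliefs, rewards, gamma)
    psi = W + alpha * delta[:, None] * beliefs  # adapt: local TD(0) update
    return A.T @ psi                            # combine: w_k = sum_l A[l, k] * psi_l


def agreement_error(W) -> float:
    """Mean squared deviation of the agents' weights from the network average."""
    return float(np.mean(np.sum((W - W.mean(axis=0)) ** 2, axis=1)))


def sample_sbe(W, beliefs, next_beliefs, rewards, gamma) -> float:
    """Squared TD error averaged over agents, used here as a sample proxy for the SBE."""
    return float(np.mean(td_errors(W, beliefs, next_beliefs, rewards, gamma) ** 2))
```

With a fully connected 3-agent network, only agent 0 receiving a reward, and zero initial weights, the adapt stage moves only agent 0's weights, and the combine stage spreads that update evenly so the agreement error drops back to zero.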
Use the Render() method of the GridWorld class to visualize the agents' actions and the target's movement through the environment. In the rendering, the agents are drawn as sensors that aim to localize a spy drone as it moves around the grid. The sensors' actions are drawn as noisy signals that aim to disrupt the communication between the intruder drone and its owner.
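For readers who want a similar picture outside the environment, a minimal Matplotlib sketch of such a frame is below. It is not the repository's Render() implementation: render_grid is a hypothetical helper, and positions are assumed to be (row, col) cell coordinates with hits marking cells where sensors emit jamming signals.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt


def render_grid(height, width, sensors, target, hits=()):
    """Draw one frame: grid cells, sensors, the target and (optionally) hit cells.

    sensors: list of (row, col) sensor positions
    target:  (row, col) target position
    hits:    list of (row, col) cells targeted by the sensors' signals (hypothetical)
    """
    fig, ax = plt.subplots(figsize=(width * 0.6, height * 0.6))
    ax.set_xlim(-0.5, width - 0.5)
    ax.set_ylim(height - 0.5, -0.5)  # inverted so row 0 is at the top
    # Minor ticks on cell boundaries give the grid lines.
    ax.set_xticks([x - 0.5 for x in range(width + 1)], minor=True)
    ax.set_yticks([y - 0.5 for y in range(height + 1)], minor=True)
    ax.grid(which="minor", color="lightgray")
    ax.tick_params(which="both", length=0, labelbottom=False, labelleft=False)
    ax.scatter([c for _, c in sensors], [r for r, _ in sensors], marker="^", s=120, label="sensors")
    ax.scatter([target[1]], [target[0]], marker="X", s=160, color="red", label="target")
    if hits:
        ax.scatter([c for _, c in hits], [r for r, _ in hits], marker="o", s=400,
                   facecolors="none", edgecolors="orange", label="hits")
    ax.legend(loc="upper right", fontsize="small")
    return fig, ax


fig, ax = render_grid(4, 5, sensors=[(0, 0), (3, 4)], target=(2, 2), hits=[(2, 2)])
```

Calling fig.savefig(...) on each frame and stitching the images together is one simple way to animate an episode.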
If you use this environment in your experiments or find it helpful, please consider citing the following paper:
@ARTICLE{kayaalp2023_policy, author={Kayaalp, Mert and Ghadieh, Fatima and Sayed, Ali H.}, journal={IEEE Open Journal of Control Systems}, title={Policy Evaluation in Decentralized POMDPs With Belief Sharing}, year={2023}, volume={2}, number={}, pages={125-145}, doi={10.1109/OJCSYS.2023.3277760}}