Multi-agent grid world environment with partial observability and a discrete action space. In this environment, agents (e.g., radar sensors) exploit their communication network in order to track a moving target (e.g., an intruder drone). The target moves randomly according to a pre-defined transition model that takes the actions (i.e., hits) of agents into account.
Used in the paper Policy Evaluation in Decentralized POMDPs with Belief Sharing:
M. Kayaalp, F. Ghadieh and A. H. Sayed, "Policy Evaluation in Decentralized POMDPs With Belief Sharing," in IEEE Open Journal of Control Systems, vol. 2, pp. 125-145, 2023, doi: 10.1109/OJCSYS.2023.3277760.
- Known dependencies: Python (3.7.4), NumPy (1.14.5), CuPy (10.2), Matplotlib (3.1.1), CUDA Toolkit, NetworkX
- `./code/Gridworld.py`: contains code for initializing the grid, assigning agent locations, simulating the environment, handling target transitions, rendering, and the step function.
- `./code/Agent.py`: contains the agent class, which handles per-agent functionality such as initialization, taking observations, executing individual policies, and forming beliefs.
Use the following constructor to create a new environment (a usage sketch follows the parameter list below):

`env = GridWorld(num, height, width, centralized, noisy, rho, phi, sparse, alpha, beliefvectors)`
- `num`: number of agents in the grid
- `height`: height of the grid (number of cells)
- `width`: width of the grid (number of cells)
- `centralized`: type of algorithm performed, as described in the paper
  - `centralized = 0`: Algorithm 1, Centralized policy evaluation under POMDPs
  - `centralized = 1`: Algorithm 2, Diffusion policy evaluation under POMDPs
  - `centralized = 2`: Algorithm 3, Centralized evaluation for decentralized execution
- `noisy`: noise level of the agents' observations
  - `noisy = 0`: low-noise observations
  - `noisy = 1`: mid-level noise observations
  - `noisy = 2`: high-noise observations
- `sparse`: sparsity of the network (boolean)
- `alpha`: learning rate
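For illustration, a minimal instantiation might look like the sketch below. The argument values are placeholder assumptions, and `rho`, `phi`, and `beliefvectors` are not documented above, so consult `Gridworld.py` for their exact semantics:

```python
from Gridworld import GridWorld  # assumes ./code is on the Python path

# All values below are illustrative assumptions, not recommended settings.
env = GridWorld(
    num=4,             # 4 agents (sensors)
    height=10,         # 10 x 10 grid
    width=10,
    centralized=1,     # Algorithm 2: diffusion policy evaluation
    noisy=0,           # low-noise observations
    rho=0.5,           # undocumented here; see Gridworld.py
    phi=0.5,           # undocumented here; see Gridworld.py
    sparse=False,      # densely connected communication network
    alpha=0.01,        # learning rate
    beliefvectors=[],  # undocumented here; see Gridworld.py
)
```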
To reset the environment while preserving the properties above, use the `reset(centralized)` method of the `GridWorld` class.
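For example (assuming the method mutates the environment in place):

```python
env.reset(centralized=1)  # re-initialize agents and target; grid settings are kept
```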
Use the `step(centralized)` method of the `GridWorld` class to advance the environment by one time step. The following operations are generally performed during each step (the exact sequence varies with the algorithm type):
- Observe
- Adapt
- Action
- Reward
- Evolve
- TD-Error (Temporal Difference)
- Target transition
- Network Agreement Error
- SBE (Squared Bellman Error)
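A minimal interaction loop might look like the following sketch; it assumes `step` advances the simulation in place and that any returned metrics (e.g., the TD error or SBE above) are defined in `Gridworld.py`:

```python
# Run one episode of T time steps; illustrative only.
T = 100
env.reset(centralized=1)
for t in range(T):
    env.step(centralized=1)  # observe, adapt, act, reward, evolve, target transition, ...
```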
Use the `Render()` method of the `GridWorld` class to visualize the agents' actions and the target's transitions in the environment. As rendered below, the agents are represented by sensors that aim to localize a spy drone as it moves around the grid. The sensors' actions are rendered as noisy signals intended to disrupt communication between the intruder drone and its owner.
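To watch an episode unfold, `Render` can be called after each step. This sketch assumes `Render` takes no arguments; check `Gridworld.py` for the actual signature:

```python
for t in range(100):  # continue from an initialized env, as above
    env.step(centralized=1)
    env.Render()  # draw agent positions, their emitted signals, and the target's cell
```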
If you used this environment for your experiments or found it helpful, consider citing the following paper:
    @ARTICLE{kayaalp2023_policy,
      author={Kayaalp, Mert and Ghadieh, Fatima and Sayed, Ali H.},
      journal={IEEE Open Journal of Control Systems},
      title={Policy Evaluation in Decentralized POMDPs With Belief Sharing},
      year={2023},
      volume={2},
      number={},
      pages={125-145},
      doi={10.1109/OJCSYS.2023.3277760}
    }