Adding Boltzmann Model and WolfSheep Model to Mesa_RL #197
Conversation
for more information, see https://pre-commit.ci
Updating Branch
- Visualization Script: Visualize the trained agent's behavior with Mesa's visualization tools, presenting agent movement and Gini values within the grid. You can run `server.py` file to test it with pre-trained model.

## Model Behaviour
The adgent as seen below learns to move towards a corner of the grid. These brings all the agents together allowing exchange of money between them resulting in reward maximization.
The result is a bit sus to me. How could all agents simultaneously decide to go to 1 corner? This implies they all use the same weights, which are biased toward the top left. They should instead have looked for nearby agents and sought to get closer to their neighbors, until they are all in the same cell.
Yes, all the weights are the same. Stable Baselines doesn't allow multiple weights. This example shows controlling multiple agents from a single weight. Is it not explicit from the README?
Nowhere in the README can I find any indication of such fine print. It needs an explicit disclaimer that the behavior is not what we should ideally expect: the agents seeking out the other agents.
adgent -- should be agent
Per this discussion I would reword the Model Behaviour block to
Model Behaviour
As stable baselines controls multiple agents with the same weights, the agents learn to move towards a corner of the grid. This brings all the agents together, allowing exchange of money between them and resulting in reward maximization.
Very cool addition with the .gif... nicely done!
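To make the shared-weights point concrete, here is a hypothetical stand-alone sketch (not the PR's code, and not using Stable Baselines) of why a single policy can send every agent to the same corner: when all agents map the same kind of observation through the same weights, any learned bias is applied identically to all of them.

```python
# Hypothetical illustration: one shared "policy" for every agent.
# The function below plays the role of a single trained network whose
# weights happen to be biased toward the (0, 0) corner.

def shared_policy(pos):
    """Same weights for every agent: always step toward (0, 0)."""
    x, y = pos
    return (max(x - 1, 0), max(y - 1, 0))

def run(agents, steps=20):
    """Apply the one shared policy to every agent each step."""
    for _ in range(steps):
        agents = [shared_policy(pos) for pos in agents]
    return agents

final = run([(3, 7), (9, 2), (5, 5)])
print(final)  # every agent ends up in the same corner: [(0, 0), (0, 0), (0, 0)]
```

Because the bias lives in the one shared set of weights rather than in any agent's observation of its neighbors, all agents converge on the same cell regardless of where they start.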
Good work @harshmahesheka I am looking forward to presenting this via George Mason. The big comment is please switch to the Solara visualization and there are some other questions and comments throughout.
# Agents can also give money to other agents in the same cell if they have greater wealth.
# The model is trained by a scientist who believes in an equal society and wants to minimize the Gini coefficient, which measures wealth inequality.
# The model is trained using the Proximal Policy Optimization (PPO) algorithm from the stable-baselines3 library.
# The trained model is saved as "ppo_money_model".
A bit nitpicky but can you change this to multi-line string (""") instead of single line string (#)
I would make some changes to this description to something similar to as follows
""" This code implements a multi-agent reinforcement learning (MARL) variation of the Boltzmann Wealth Model. The model observes the distribution of wealth among agents in a grid environment as they randomly exchange one unit of wealth with each other each time step. Each agent can move to neighboring cells and randomly gives money to other agents in the same cell if they have greater wealth. The goal of the agents in this model is to minimize the Gini coefficient, which measures wealth inequality. (A Gini coefficient of 1 means one agent has all the money; a Gini coefficient of 0 means all agents have exactly the same amount.) The model is trained using the Proximal Policy Optimization (PPO) algorithm from the stable-baselines3 library. The trained model is saved as "ppo_money_model". """
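Since the Gini coefficient is the training objective under discussion, a small hypothetical helper (not part of the PR) may help pin down the endpoints mentioned in the description. This uses the mean-absolute-difference form of the Gini coefficient; with `n` agents the maximum attainable value is `(n - 1) / n`, which approaches 1 as `n` grows.

```python
# Hypothetical sketch of the Gini coefficient the agents minimize.
# 0.0 -> all agents hold the same wealth; (n-1)/n -> one agent holds everything.

def gini(wealths):
    """Gini coefficient via mean absolute difference over all ordered pairs."""
    n = len(wealths)
    total = sum(wealths)
    if total == 0:
        return 0.0  # no wealth at all counts as perfectly equal
    diff_sum = sum(abs(a - b) for a in wealths for b in wealths)
    return diff_sum / (2 * n * total)

print(gini([1, 1, 1, 1]))  # 0.0  -> perfectly equal
print(gini([4, 0, 0, 0]))  # 0.75 -> one of four agents has everything, (n-1)/n
```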
import os

import mesa
from mesa.visualization.ModularVisualization import ModularServer
Why are you using the older visualization set up instead of Solara Viz?
    MoneyModelRL, [grid, chart], "Money Model", {"N": 10, "width": 10, "height": 10}
)
server.port = 8521  # The default
server.launch()
Please update to Solara visualization
if len(cellmates) > 1:
    # Choose a random agent from the cellmates
    other_agent = random.choice(cellmates)
    if other_agent.wealth > self.wealth:
This is more of a nice-to-have, but this seems to make the result deterministic regardless of the RL; it seems you would get there eventually after enough steps, and RL just makes it more efficient.
Is it possible/easy to make it so the agent gets to choose who it gives wealth to, so it "learns" to give wealth to someone with less money?
The idea was to keep these examples really simple and easy to train, hence the minimal action space. But if you want, I can change it.
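The alternative the reviewer suggests could be sketched roughly as below. This is hypothetical code, not from the PR: `give_random` mirrors the current random-cellmate behavior, while `give_to_poorest` stands in for what a policy could learn if the recipient were part of the action space.

```python
import random

# Hypothetical sketch of the two giving rules being discussed.

def give_random(wealths, rng):
    """Current behaviour: a random cellmate receives one unit of wealth."""
    i = rng.randrange(len(wealths))
    wealths[i] += 1
    return wealths

def give_to_poorest(wealths):
    """Learnable alternative: target the cellmate with the least wealth,
    which directly pushes the Gini coefficient down."""
    i = min(range(len(wealths)), key=lambda j: wealths[j])
    wealths[i] += 1
    return wealths

print(give_to_poorest([5, 1, 3]))  # [5, 2, 3]
```

Making the recipient part of the action space would enlarge the action space (one choice per cellmate) and likely slow training, which matches the author's stated trade-off of keeping the examples simple.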
""" | ||
Create a new WolfRL-Sheep model with the given parameters. | ||
""" | ||
super().__init__( |
Do you need to inherit all this from mesa_models? My concern is that any change to that model will break this model, and inheriting these parameters is not necessary, so it makes it unnecessarily brittle.
I inherited it to show how easily you can integrate your code with Mesa. Not inheriting the code block and rewriting everything would weaken the whole point. Also, if we change something in the main example, we need to change it here as well to keep it updated. So, I think we can keep this arrangement, and whenever some major change takes place in the original examples, we verify here as well. The code is relatively simple, so it shouldn't be a major task.
class SheepRL(Sheep):
    def step(self):
Do you need to inherit this from mesa_models? My concern is any change to that model will break this model and that inheriting the Sheep class is not necessary so it makes it unnecessarily brittle.
        self.model.schedule.add(lamb)


class WolfRL(Wolf):
Do you need to inherit this from mesa_models? My concern is that any change to that model will break this model, and inheriting these parameters is not necessary, so it makes it unnecessarily brittle.
import os

import mesa
import numpy as np
Please switch this to SolaraViz and not the old server
"policy_wolf": PolicySpec(config=PPOConfig.overrides(framework_str="torch")), | ||
}, | ||
"policy_mapping_fn": lambda agent_id, *args, **kwargs: "policy_sheep" | ||
if agent_id[0:5] == "sheep" |
I am genuinely curious here, why is this only 0 to 5?
Ah, this. Basically, agent_id is "Sheep[number]", and wolf is "Wolf[number]". So, we are checking the first 5 letters in Sheep to partition it. I will add a comment
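The prefix check can be reproduced outside RLlib with a tiny stand-alone function (the exact agent-id format in the PR may differ; the ids below are illustrative). The point is simply that "sheep" is five characters long, so slicing `agent_id[0:5]` cleanly partitions sheep ids from wolf ids.

```python
# Minimal reproduction (not RLlib itself) of the prefix check under
# discussion: ids like "sheep_12" and "wolf_3" are split on the first
# 5 characters, the length of the word "sheep".

def policy_mapping_fn(agent_id, *args, **kwargs):
    return "policy_sheep" if agent_id[0:5] == "sheep" else "policy_wolf"

print(policy_mapping_fn("sheep_12"))  # policy_sheep
print(policy_mapping_fn("wolf_3"))    # policy_wolf
```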
Thanks for the appreciation. As discussed with @EwoutH and @rht, the visualization here is exactly the same as the one from mesa-examples. So, as soon as mesa-examples gets updated, we can very easily update here. Basically, as soon as we get #154 merged.
I have added the remaining two examples, the Boltzmann Model and the WolfSheep model, to the rlo folder. The remaining task would be to modify the main README.md to include a description of mesa_rl. Any suggestions on it are welcome.
Currently, I have kept things similar to the previous pull request. After merging this, we can open an issue and discuss potential changes/improvements that were left behind.