
Adding Boltzmann Model and WolfSheep Model to Mesa_RL #197

Open · wants to merge 15 commits into main

Conversation

@harshmahesheka (Contributor) commented Sep 6, 2024

I have added the remaining two examples, the Boltzmann Model and the WolfSheep model, to the rl folder. The remaining task is to modify the main README.md to include a description of mesa_rl. Any suggestions on it are welcome.
Currently, I have kept things similar to the previous pull request. After merging this, we can open an issue and discuss the potential changes/improvements that were left out.

- Visualization Script: Visualize the trained agent's behavior with Mesa's visualization tools, presenting agent movement and Gini values within the grid. You can run `server.py` file to test it with pre-trained model.

## Model Behaviour
The adgent as seen below learns to move towards a corner of the grid. These brings all the agents together allowing exchange of money between them resulting in reward maximization.
Contributor

The result is a bit sus to me. How could all agents simultaneously decide to go to one corner? This implies they all use the same weights, which are biased toward the top left. They should instead have looked for nearby agents and sought to get closer to their neighbors, until they are all in the same cell.

Contributor Author

Yes, all the weights are the same. Stable Baselines doesn't allow multiple weights. This example shows controlling multiple agents from a single set of weights. Is that not explicit from the README?
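To make the "single set of weights" point concrete, here is a minimal, self-contained sketch (not the PR's code; the toy environment and its reward are purely illustrative assumptions). stable-baselines3 trains one policy, and that one policy emits a joint action for every agent each step:

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO

N_AGENTS, WIDTH = 3, 5


class ToyMultiAgentEnv(gym.Env):
    """Toy strip world: one shared policy chooses a move for every agent at once."""

    def __init__(self):
        self.observation_space = spaces.Box(0, WIDTH - 1, shape=(N_AGENTS,), dtype=np.float32)
        # One sub-action per agent (0 = stay, 1 = left, 2 = right), all produced
        # by the same network weights.
        self.action_space = spaces.MultiDiscrete([3] * N_AGENTS)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = self.np_random.integers(0, WIDTH, size=N_AGENTS).astype(np.float32)
        self.steps = 0
        return self.pos.copy(), {}

    def step(self, action):
        self.pos += np.where(action == 1, -1.0, np.where(action == 2, 1.0, 0.0)).astype(np.float32)
        self.pos = np.clip(self.pos, 0, WIDTH - 1)
        self.steps += 1
        reward = -float(self.pos.std())  # clustering together maximizes reward
        return self.pos.copy(), reward, self.steps >= 50, False, {}


model = PPO("MlpPolicy", ToyMultiAgentEnv(), verbose=0)
model.learn(total_timesteps=2_000)  # short demo run; the single policy moves all agents
```

Because every agent is driven by the same weights, identical behavior (e.g. everyone drifting to the same corner) is exactly what a README disclaimer could call out.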

Contributor

Nowhere in the README can I find any indication of such fine print. It needs an explicit disclaimer that the behavior is not what we should ideally expect: the agents seeking out other agents.

Member

adgent -- should be agent

Per this discussion I would reword the Model Behaviour block to

Model Behaviour

As Stable Baselines controls multiple agents with the same weights, the agents learn to move towards a corner of the grid. This brings all the agents together, allowing the exchange of money between them and resulting in reward maximization.

Very cool addition with the .gif... nicely done!

@tpike3 (Member) left a comment

Good work @harshmahesheka, I am looking forward to presenting this via George Mason. The big comment is: please switch to the Solara visualization; there are some other questions and comments throughout.


# Agents can also give money to other agents in the same cell if they have greater wealth.
# The model is trained by a scientist who believes in an equal society and wants to minimize the Gini coefficient, which measures wealth inequality.
# The model is trained using the Proximal Policy Optimization (PPO) algorithm from the stable-baselines3 library.
# The trained model is saved as "ppo_money_model".
Member

A bit nitpicky, but can you change this to a multi-line string (""") instead of single-line comments (#)?

Member

I would change this description to something like the following:

''' This code implements a multi-agent reinforcement learning (MARL) variation of the Boltzmann Wealth Model. The model observes the distribution of wealth among agents in a grid environment as they randomly exchange one unit of wealth with each other each time step. Each agent can move to neighboring cells and randomly gives money to other agents in the same cell if they have greater wealth. The goal of the agents in this model is to minimize the Gini coefficient, which measures wealth inequality. (A Gini coefficient of 1 means one agent has all the money; a Gini coefficient of 0 means all agents have exactly the same amount.) The model is trained using the Proximal Policy Optimization (PPO) algorithm from the stable-baselines3 library. The trained model is saved as "ppo_money_model". '''
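To make the Gini definition in that docstring concrete, a small helper along the lines of the one in Mesa's Boltzmann wealth example could look like this (the attribute names `model.schedule.agents` and `agent.wealth` are assumptions matching the code quoted in this PR):

```python
def compute_gini(model):
    """Gini of agent wealth: 0 = perfectly equal, approaching 1 = one agent owns everything."""
    wealths = sorted(agent.wealth for agent in model.schedule.agents)
    n, total = len(wealths), sum(wealths)
    if total == 0:
        return 0.0
    # Standard discrete formula over ascending-sorted wealths
    cumulative = sum((i + 1) * w for i, w in enumerate(wealths))
    return (2 * cumulative) / (n * total) - (n + 1) / n
```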

import os

import mesa
from mesa.visualization.ModularVisualization import ModularServer
Member

Why are you using the older visualization setup instead of SolaraViz?

MoneyModelRL, [grid, chart], "Money Model", {"N": 10, "width": 10, "height": 10}
)
server.port = 8521 # The default
server.launch()
Member

Please update to the Solara visualization.
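For reference, a rough sketch of what the Solara-based setup could look like, assuming Mesa 3's SolaraViz API (component names and signatures vary between Mesa versions, so treat this as an outline rather than a drop-in replacement); the model class and parameters mirror the server setup quoted above:

```python
from mesa.visualization import SolaraViz, make_plot_component, make_space_component


def agent_portrayal(agent):
    # Minimal portrayal: draw every agent the same way
    return {"color": "tab:blue", "size": 30}


money_model = MoneyModelRL(N=10, width=10, height=10)  # the model class from this PR

page = SolaraViz(
    money_model,
    components=[make_space_component(agent_portrayal), make_plot_component("Gini")],
    name="Money Model",
)
page  # rendered when the file is launched with `solara run`
```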

if len(cellmates) > 1:
    # Choose a random agent from the cellmates
    other_agent = random.choice(cellmates)
    if other_agent.wealth > self.wealth:
Member

This is more of a nice-to-have, but this seems to make the result deterministic regardless of the RL; it seems you would get there eventually after enough steps, and RL just makes it more efficient.

Is it possible/easy to make it so the agent gets to choose who it gives wealth to, so it "learns" to give wealth to someone with less money?

Contributor Author

The idea was to keep these examples really simple and easy to train, hence the minimal action space. But if you want, I can change it.
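If that change were wanted, one hypothetical way to extend the action space (not the PR's code; MAX_CELLMATES and the space layout are illustrative assumptions) would be:

```python
import numpy as np
from gymnasium import spaces

MAX_CELLMATES = 5  # pad/truncate the cellmate list to a fixed length

# action[0]: movement (4 neighbours + stay); action[1]: which cellmate receives one unit of wealth
action_space = spaces.MultiDiscrete([5, MAX_CELLMATES])

# The observation would then need to expose cellmate wealth so the policy can
# learn to pick the poorest one, e.g.:
observation_space = spaces.Dict(
    {
        "position": spaces.MultiDiscrete([10, 10]),
        "cellmate_wealth": spaces.Box(low=0.0, high=np.inf, shape=(MAX_CELLMATES,), dtype=np.float32),
    }
)
```

A Dict observation like this would need stable-baselines3's "MultiInputPolicy" rather than "MlpPolicy".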

"""
Create a new WolfRL-Sheep model with the given parameters.
"""
super().__init__(
Member

Do you need to inherit all this from mesa_models? My concern is that any change to that model will break this model, and inheriting these parameters is not necessary, so it makes this unnecessarily brittle.

Contributor Author

I inherited it to show how easily you can integrate your code with Mesa. Not inheriting the code block and rewriting everything would weaken the whole point. Also, if we change something in the main example, we need to change it here as well to keep it updated. So I think we can keep this arrangement, and whenever some major change takes place in the original examples, we verify it here as well. The code is relatively simple, so it shouldn't be a major task.



class SheepRL(Sheep):
    def step(self):
Member

Do you need to inherit this from mesa_models? My concern is that any change to that model will break this model, and inheriting the Sheep class is not necessary, so it makes this unnecessarily brittle.

self.model.schedule.add(lamb)


class WolfRL(Wolf):
Member

Do you need to inherit this from mesa_models? My concern is that any change to that model will break this model, and inheriting these parameters is not necessary, so it makes this unnecessarily brittle.

import os

import mesa
import numpy as np
Member

Please switch this to SolaraViz instead of the old server.

"policy_wolf": PolicySpec(config=PPOConfig.overrides(framework_str="torch")),
},
"policy_mapping_fn": lambda agent_id, *args, **kwargs: "policy_sheep"
if agent_id[0:5] == "sheep"
Member

I am genuinely curious here: why is this only 0 to 5?

Contributor Author

Ah, this. Basically, agent_id is "Sheep[number]" and the wolf's is "Wolf[number]", so we are checking the first 5 letters to tell whether the agent is a sheep. I will add a comment.
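Since the author offers to add a comment, a standalone version of that mapping with the comment in place might look like this (the exact agent-id strings are assumptions based on the explanation above):

```python
def policy_mapping_fn(agent_id, *args, **kwargs):
    # Agent ids start with the species name followed by a number, so the first
    # five characters are enough to tell sheep apart from wolves.
    return "policy_sheep" if agent_id[0:5] == "sheep" else "policy_wolf"
```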

@harshmahesheka (Contributor Author) commented Sep 18, 2024

Good work @harshmahesheka, I am looking forward to presenting this via George Mason. The big comment is: please switch to the Solara visualization; there are some other questions and comments throughout.

Thanks for the appreciation. As discussed with @EwoutH and @rht, the visualization here is exactly the same as the one from mesa-examples. So, as soon as mesa-examples gets updated, we can very easily update it here. Basically, as soon as we get #154 merged.
