enhance AgentSet.do to accept a callable #2210

quaquel · 2024-08-14T11:15:34Z

This PR enhances AgentSet.do to take a callable or str. Currently, AgentSet.do takes a str which maps to a method on the agents in the set. This PR makes it possible to use a callable instead. This callable will be called with the agent as the first argument.

picks up on an idea from projectmesa#1944, see projectmesa#1944 (comment)
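
The dispatch described above can be sketched roughly as follows. This is a minimal illustration using a stand-in `MiniAgentSet` class, not Mesa's actual `AgentSet` implementation; `Walker`, `MiniAgentSet`, and `move_by` are hypothetical names introduced only for this example.

```python
class Walker:
    """Toy agent with a position and a step method (hypothetical)."""

    def __init__(self, unique_id):
        self.unique_id = unique_id
        self.pos = 0

    def step(self, distance=1):
        self.pos += distance


class MiniAgentSet:
    """Stand-in for an agent collection; NOT Mesa's AgentSet."""

    def __init__(self, agents):
        self._agents = list(agents)

    def __iter__(self):
        return iter(self._agents)

    def do(self, method, *args, **kwargs):
        """Accept either a method name (str) or a callable."""
        if isinstance(method, str):
            # str: look the method up on each agent and call it
            for agent in self._agents:
                getattr(agent, method)(*args, **kwargs)
        else:
            # callable: call it with the agent as the first argument
            for agent in self._agents:
                method(agent, *args, **kwargs)
        return self


def move_by(agent, distance):
    """Free function that receives the agent as its first argument."""
    agent.pos += distance


agents = MiniAgentSet(Walker(i) for i in range(3))
agents.do("step")      # existing behaviour: method name as str
agents.do(move_by, 2)  # proposed behaviour: callable
positions = [a.pos for a in agents]
```

Returning `self` from `do` is an assumption made here so that calls can be chained; the thread does not specify the return value.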

github-actions · 2024-08-14T11:20:32Z

Performance benchmarks:

Model	Size	Init time [95% CI]	Run time [95% CI]
Schelling	small	🔵 +0.2% [-0.2%, +0.6%]	🔵 -0.3% [-0.5%, -0.1%]
Schelling	large	🔵 +0.2% [-0.3%, +0.8%]	🔵 -0.1% [-0.8%, +0.6%]
WolfSheep	small	🔵 -0.0% [-1.2%, +1.1%]	🔵 +0.3% [-0.0%, +0.5%]
WolfSheep	large	🔵 +0.2% [-0.1%, +0.4%]	🔵 +0.4% [-0.5%, +1.4%]
BoidFlockers	small	🔵 -0.9% [-1.5%, -0.3%]	🔵 -1.9% [-2.7%, -1.2%]
BoidFlockers	large	🔵 -0.5% [-1.0%, -0.1%]	🔵 +0.3% [-0.2%, +0.9%]

EwoutH · 2024-08-14T11:48:57Z

Sounds useful!

Since the “agent” and “agentset” applies are two fully separate code paths, I’m thinking about making them two separate methods. Something like apply_to_agents and apply_to_agentset would be the most verbose, but maybe something shorter is possible as well.

quaquel · 2024-08-14T12:06:12Z

> Sounds useful!
>
> Since the “agent” and “agentset” applies are two fully separate code paths, I’m thinking about making them two separate methods. Something like apply_to_agents and apply_to_agentset would be the most verbose, but maybe something shorter is possible as well.

Not sure about this. The current design is analogous to pandas DataFrame.apply. This is a familiar API making it easier for users new to MESA and easier for more experienced users to remember.

EwoutH · 2024-08-14T12:13:54Z

A few thoughts:

- I think applying a function over all agents in the dataset will be by far the most common use, so that needs to be easy to use and easy to remember.
- Do we also want to allow functions in place?
- We need to clearly document when it's best practice to make something an agent method (and call it with .do()) and when to apply a callable.
  - Considering PEP 20: "There should be one-- and preferably only one --obvious way to do it."
- A dedicated aggregate method might also be useful (could be a separate PR).

I think it helps if we create a list of possible use cases and write some possible API examples for them.

It would be nice if this could take over some of the heavy lifting from the datacollector (and we have to be careful not to duplicate functionality).

Edit: One more:

For Pandas etc., something like axis makes a lot of sense, since rows and columns can mean different things at different times, especially with multi-indexes. For us, we always have an AgentSet which contains a set of Agents, so we don't need the same degrees of freedom and thus the same complexity.

quaquel · 2024-08-14T12:38:20Z

> I think applying a function over all agents in the dataset will be by far the most common use, so that needs to be easy to use and easy to remember.

I agree, and this is easily done by changing the default to be axis=agentset.

> Do we also want to allow functions in place?

I am inclined not to do this. It overlaps with AgentSet.do (as you anticipate), and it is also not allowed by DataFrame.apply.

> We need to clearly document when it's best practice to make something an agent method (and call it with .do()) and when to apply a callable.

The main use case for AgentSet.apply seems to be gathering data over the agentset. An example would be calculating gini. Of course, the question then becomes whether this is redundant with calculate_gini(AgentSet.get("wealth")).

> A dedicated aggregate method might also be useful (could be a separate PR).

How would that differ from agentset.apply?

> For Pandas etc., something like axis makes a lot of sense, since rows and columns can mean different things at different times,

Yes. I am also not sure whether axis is the best name for the keyword argument, but I needed to start somewhere.

creating agentsets is expensive, so this makes it possible to avoid creating them

EwoutH · 2024-08-14T14:07:18Z

> How would that differ from agentset.apply?

I imagined it to be a bit more narrowly scoped: where apply is very flexible, aggregate is more focused. agg() could just take a variable name and an aggregation function, like this:

total_wealth = agents.agg("Wealth", "sum")
mean_energy = agents.agg("Energy", np.mean)

> Yes. I am also not sure whether axis is the best name for the keyword argument, but I needed to start somewhere.

Of course, Cunningham's Law in full force!

quaquel · 2024-08-14T14:25:08Z

> I imagined it to be a bit more narrowly scoped: where apply is very flexible, aggregate is more focused. agg() could just take a variable name and an aggregation function,

Ok, but that would give us 2 ways to achieve roughly the same thing:

np.sum(agentset.get("wealth"))
agentset.agg("wealth", np.sum)

Note that doing this through agentset.apply is a bit more tricky and would involve functools.partial.
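
To make the comparison concrete, here is a rough sketch of the three routes using a stand-in class. `MiniAgentSet`, its `agg` method, and the `aggregate` helper are hypothetical and only illustrate the proposals in this thread; `apply` mirrors the agentset-level code path under discussion, and the built-in `sum` stands in for `np.sum` to keep the example dependency-free.

```python
from functools import partial
from statistics import mean


class Agent:
    def __init__(self, wealth):
        self.wealth = wealth


class MiniAgentSet:
    """Stand-in for an agent collection; NOT Mesa's AgentSet."""

    def __init__(self, agents):
        self._agents = list(agents)

    def get(self, attr_name):
        # collect one attribute from every agent
        return [getattr(agent, attr_name) for agent in self._agents]

    def agg(self, attr_name, func):
        # hypothetical: aggregate a single attribute with one function
        return func(self.get(attr_name))

    def apply(self, func, *args, **kwargs):
        # agentset-level apply: the callable receives the whole set
        return func(self, *args, **kwargs)


def aggregate(agentset, attr_name, func):
    """Helper whose extra arguments must be bound before passing it to apply."""
    return func(agentset.get(attr_name))


agentset = MiniAgentSet(Agent(w) for w in (1, 2, 3))

total_via_get = sum(agentset.get("wealth"))
total_via_agg = agentset.agg("wealth", sum)
# apply needs the attribute name and aggregation bound up front
total_via_apply = agentset.apply(partial(aggregate, attr_name="wealth", func=sum))
mean_via_agg = agentset.agg("wealth", mean)
```

All three routes produce the same total, which is the redundancy being pointed out; only the `apply` route needs `functools.partial`.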

EwoutH · 2024-08-14T14:27:22Z

I'm going to try to compile a large list of use cases tomorrow, so we can test our ideas against a test set.

rht · 2024-08-14T15:30:11Z

Given that the AgentSet will have a Rust backend anyway, what about using Polars as a backend instead? This way, most of the DF-like operations will be available out of the box. @adamamer20 can fill in more about the nuances, or possibly integrate the AgentSet part of mesa-frames into core Mesa.

EwoutH · 2024-08-14T16:30:01Z

I'm not against that idea, but it will be a huge overhaul and I’m not sure we have the maintenance capacity to facilitate that.

Maybe we can look more seriously into Rust for Mesa 4.0.

rht · 2024-08-14T17:46:19Z

I said this on the Matrix chat already, but I will point it out again here: there is a critical performance issue. The time elapsed for the Boltzmann wealth model steps is quadratic in the number of agents: projectmesa/mesa-frames#25. mesa-frames addresses this by caching the active_agents (shouldn't this be selected_agents?), which you can use as a view instead of creating a new list of agents every time. Rewriting AgentSet in Polars might make this performance fix easier.

quaquel · 2024-08-14T18:16:29Z

I concur with @EwoutH. I don't think we presently have the capacity to reimplement AgentSet. I also strongly doubt it is desirable to implement it on top of some data frame style data structure because it will break the object-oriented nature of the Agent class itself. I would very much, at some future point, like to port the core of MESA to rust for performance reasons as discussed before. But in the meantime, why not flesh out the API first?

Also, looking at @rht's graph here, I don't see quadratic scaling of MESA itself. Am I missing something?

rht · 2024-08-14T18:24:06Z

But it turns out that AgentSet has quite a number of DF-like operations. Polars is already based on Rust, and can be optimized further by using its native expression.

> Also, looking at @rht's graph projectmesa/mesa-frames#25 (comment), I don't see quadratic scaling of MESA itself. Am I missing something?

This is current Mesa's performance: [graph]. The graph you mentioned is from when I replaced the AgentSet with just a list of agents.

quaquel · 2024-08-14T18:32:26Z

> But it turns out that AgentSet has quite a number of DF-like operations.

Yes, the API has various operations inspired by the pandas API. From this, however, it does not follow that the best data structure for implementation is a data frame. Again, I contend it breaks the OO nature of the underlying Agent class, and we don't have the capacity at present anyway. I would prefer to keep this PR focused on adding functionality to the current AgentSet class rather than having it turn into a discussion about reimplementing the entire class itself. Happy to have this conversation, but preferably as a discussion/ideas topic.

> This is current Mesa's performance: [graph]. The projectmesa/mesa-frames#25 (comment) is from when I replaced the AgentSet with just a list of agents.

Ok, so there is still a remaining performance overhead of the agent set class that should be addressed. Which version of the wealth model was used for this graph?

adamamer20 · 2024-08-14T19:18:34Z

I'd like to add my 2 cents on mesa's performance.
When I initially considered ways to speed up mesa, I thought about rewriting the Mesa backend in Cython (or Rust). However, there's a main problem with that approach.
ABMs typically rely on custom functions and logic. Even if the entire current mesa backend were rewritten in Cython/Rust, when a user needs to write the step function for an agent (e.g., adding or subtracting a value from an agent's attribute), using a Python native function would still result in slow execution as everything would happen in Python.
To truly benefit from a refactored Mesa, a modeler would need to either:
(A) Write the function in Cython/Rust, or
(B) Use pre-built mesa functions written in Cython/Rust.
If mesa were to provide all the operations one might need within steps, it would essentially create a DataFrame-like API, allowing for group_by, combine, mathematical, logical, and string operations. Of course, mesa operations would also include more specific functionalities like agent movement, but they essentially rely on the same data manipulation methods.
If the intention is for modelers to continue using Python when developing models, I don't think refactoring mesa in Rust would achieve significant performance gains. If the goal is to create a Rust library, that might be a different story, but then one might ask why not use Agents.jl instead.
It's worth noting that DataFrame methods are often algorithmically optimized with various "tricks," making it difficult to achieve similar performance when writing methods from scratch.
The main issue with using DataFrames, as @rht and @EwoutH pointed out, would be maintaining sequentiality. The concept of rolling windows (especially expanding windows, where agents have a full view of previous agents) could potentially address the problem, but I still need to play around with it. Polars also offers the ability to write custom user-defined functions (which can be sequential) that can be optimized either through NumPy Universal functions or Numba.
Regarding Object-Oriented Programming (OOP), I believe it can still be used with DataFrames, and mesa-frames is an example of this, though it may require a bit more effort in the development phase.

TLDR: If Python remains the target language, I'm not sure if reimplementing the backend in Rust makes sense. For performance improvements, it might be better to focus on mesa-frames for now and potentially consider rewriting mesa in Mojo when it becomes available. In the meantime, adding DataFrame-like operations to AgentSet is a good idea, as it will facilitate compatibility with mesa-frames and possibly enable "linting" the model to vectorize it if operations are DataFrame-like.

> Ok, so there is still a remaining performance overhead of the agent set class that should be addressed. Which version of the wealth model was used for this graph?

I used the Boltzmann wealth model from the old mesa-examples: projectmesa/mesa-frames@a87adda.
I haven't tested performance with the current version.

quaquel · 2024-08-15T07:07:50Z

> When I initially considered ways to speed up mesa, I thought about rewriting the Mesa backend in Cython (or Rust). However, there's a main problem with that approach. ABMs typically rely on custom functions and logic. Even if the entire current mesa backend were rewritten in Cython/Rust, when a user needs to write the step function for an agent (e.g., adding or subtracting a value from an agent's attribute), using a Python native function would still result in slow execution as everything would happen in Python.

I completely agree with this. However, from this, it does not follow that we should not try to make the core of MESA as fast as possible while maintaining the current ease of use. That is, the bottleneck in terms of performance should not be in MESA itself. I don't see MESA as a workhorse library for heavy number-crunching ABMs. Rather, I see its niche as being an alternative to NetLogo for training students while being able to build small to medium-sized ABMs that are quick to develop and can be run within a reasonable amount of time.

rht · 2024-08-15T07:45:26Z

mesa/agent.py

+            # TODO:: this is a good idea, but tricky because you don't know all column names
+            return [func(agent, *args, **kwargs) for agent in self]
+        elif axis == "agentset":
+            return func(self, *args, **kwargs)


@EwoutH is there a way to turn off the codecov warning comments? It's harder to review this way.

codecov/codecov-action#135

We could add require_changes: true to the comment part of the YML. I think that might help? Or is that only about the main PR comment?

require_changes: true seems to be better, so that the comments don't show up in everyone's browsers.

Also, it turns out that whenever I refresh the browser tab, the annotation is back on. So require_changes: true is the permanent solution.

Ah I have to use the button, just putting a screenshot here for findability:

rht · 2024-08-15T07:51:03Z

mesa/agent.py

+        else:
+            raise ValueError(f"axis should be `agent` or `agentset` not {axis}")
+
+    def group_by(


This is the naming choice of Polars as well.

rht · 2024-08-15T08:18:58Z

I agree with the concept and API of this PR. No strong opinion on the axis argument, as I'm fine with it. If I'm forced to come up with an alternative, maybe it would be target?

Regarding performance, at the very least, the model step elapsed time shouldn't be quadratic in the number of agents. This fix can happen on the Python layer, without any Polars gimmick.

For speeding up Mesa further with Rust/Cython, I disagree with @quaquel and @Corvince (they said something similar, but I couldn't find the comment). Mojo is a good example that it is possible to have something that is both simple to write and fast. A reminder that rewriting the Mesa backend in Rust would allow it to be cross-language. If one of the concerns is not having enough time to develop the features, then we can at least apply for a NumFOCUS grant for funding (@adamamer20 has shown that he could write production code in his spare time lol, but maybe you prefer to do AI stuff instead, who knows). We should continue the discussion at #2042 (Rust) or #1610 (Cython).

quaquel · 2024-08-15T08:37:32Z

> Regarding performance, at the very least, the model step elapsed time shouldn't be quadratic in the number of agents. This fix can happen on the Python layer, without any Polars gimmick.

I am investigating this issue as we speak and will open a separate PR/issue once I complete my diagnosis. As indicated by @Tortar, the problem is the copying done in the scheduler. That is, self.model.scheduler.agents copies the agents. It's thus not an AgentSet performance issue per se but just how the scheduler returns the agents. I even think that the pre-AgentSet code in the scheduler would produce a similar issue because it too iterated over all agents, if I recall correctly.

> For speeding up Mesa further with Rust/Cython, I disagree with @quaquel and @Corvince (they said something similar, but I couldn't find the comment).

Yes, we discussed this before. I, however, don't think we really disagree. User-written pure Python for the present time will remain a bottleneck for the performance of MESA models. Of course, if the user moves part of their model code to, say, Cython or Rust or C or whatever, their models will run faster. The key is, and I think here we agree, to make the core of MESA not the bottleneck.
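
The copying mechanism mentioned above can be illustrated with a toy model. This is a sketch of the general effect only, not Mesa's scheduler code: `CountingScheduler`, `WealthAgent`, and `copies_per_step` are hypothetical names, and the `agents` property is assumed to return a fresh list on every access.

```python
import random


class CountingScheduler:
    """Toy scheduler whose `agents` property copies the agent list on each access."""

    def __init__(self):
        self._agents = []
        self.elements_copied = 0

    def add(self, agent):
        self._agents.append(agent)

    @property
    def agents(self):
        # every access copies all n agents, so it costs O(n)
        self.elements_copied += len(self._agents)
        return list(self._agents)

    def step(self):
        for agent in list(self._agents):
            agent.step()


class WealthAgent:
    """Boltzmann-wealth-style agent that picks a random other agent each step."""

    def __init__(self, scheduler, rng):
        self.scheduler = scheduler
        self.rng = rng
        self.wealth = 1

    def step(self):
        # one O(n) copy per agent per step -> O(n^2) per model step
        other = self.rng.choice(self.scheduler.agents)
        if self.wealth > 0:
            other.wealth += 1
            self.wealth -= 1


def copies_per_step(n):
    scheduler = CountingScheduler()
    rng = random.Random(42)
    for _ in range(n):
        scheduler.add(WealthAgent(scheduler, rng))
    scheduler.step()
    return scheduler.elements_copied


small = copies_per_step(100)
large = copies_per_step(200)
```

Doubling the number of agents quadruples the number of copied elements per model step, which is the quadratic behaviour discussed in this thread. Returning a cached view instead of a copy would remove that term.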

EwoutH · 2024-08-15T11:05:35Z

Some use cases:

- Get a metric (total, mean, etc.) from one attribute (wealth, energy, etc.) of all agents
- Get multiple metrics from multiple attributes over all agents
- Combine two existing metrics
- Get both the raw agent data (per agent) and one or more
- Filter for agent types and properties

I think the discussion from #348 (comment) already describes most cases, together with #1944 of course.

The biggest question is: What do we want to handle in the AgentSet and what in the DataCollector?

EwoutH · 2024-08-15T11:13:39Z

So we have a few capabilities needed for the datacollector:

- Select/filter on type and on properties. I think that can be perfectly handled in the AgentSet with .select, so we don't need to duplicate that in the DataCollector.
- From that selected AgentSet, we want to get one or multiple variables. Each can be a function or a property. If it's a function, .apply() might be useful.
- For each variable, we might want to aggregate in one or multiple ways. If we were doing that directly from the AgentSet, such a function might be helpful, but we've already applied/gathered the variables into a dict or something, so we might not need that directly in the AgentSet.

EwoutH · 2024-08-15T11:40:37Z

For some related thoughts on the DataCollector, see #348 (comment)

EwoutH · 2024-08-15T12:04:06Z

> For some related thoughts on the DataCollector, see #348 (comment)

Okay, so what if we make that whole thing an AgentSet function that returns a (tuple of) attributes and a (tuple of) aggregates? Then you can either save those or chain further.

quaquel · 2024-08-15T14:58:08Z

@EwoutH and I had a good discussion about this PR. In short, apply can easily be made redundant by slightly expanding do. Presently, do takes a string referring to a method name on the Agent class. We can add a callable option to this, in which case the callable will be called with the agent as first argument. The axis="agentset" functionality is redundant because that can always be done by just func(some_agentset), so there is no need to have this in the code base.

Also, I'll expand the groupby stuff a bit to make the following possible.

# e.g., in the Epstein civil violence model
self.model.agents.groupby("condition").apply(len)

This will return a dictionary with the number of agents for each condition (e.g., arrested, quiet, protesting).

# another API example, to be fine-tuned; basically this is random activation by type
groups = self.model.agents.groupby(type, as_agentset=True)
for group_name, group in groups.items():
    group.shuffle().do("step")

I'll separate the update to do and the adding of groupby into 2 separate PRs.
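
The first groupby example above can be approximated with a small stand-alone sketch. `Citizen`, `MiniGroupBy`, and the free function `groupby` are hypothetical stand-ins, not the API that ended up in Mesa, and plain lists are used for the groups instead of agentsets.

```python
from collections import defaultdict


class Citizen:
    def __init__(self, condition):
        self.condition = condition


class MiniGroupBy:
    """Hypothetical helper holding groups keyed by group name."""

    def __init__(self, groups):
        self.groups = groups

    def apply(self, func, *args, **kwargs):
        # call func once per group and collect the results in a dict
        return {
            name: func(members, *args, **kwargs)
            for name, members in self.groups.items()
        }


def groupby(agents, by):
    """Group agents by an attribute name (str) or by a callable applied to each agent."""
    groups = defaultdict(list)
    for agent in agents:
        key = getattr(agent, by) if isinstance(by, str) else by(agent)
        groups[key].append(agent)
    return MiniGroupBy(dict(groups))


agents = [Citizen(c) for c in ("quiet", "protesting", "quiet", "arrested")]
counts = groupby(agents, "condition").apply(len)
by_type = groupby(agents, type).apply(len)
```

Here `counts` maps each condition to the number of agents in it, and passing `type` as the key groups agents by class, which is the basis for the random-activation-by-type example.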

EwoutH · 2024-08-17T11:27:48Z

Thanks! Can you split off the .do() callable part into a separate PR, and add tests? Then we can discuss the group_by later.

(I can also do all of this stuff, if preferred)

quaquel · 2024-08-17T15:24:31Z

> Thanks! Can you split off the .do() callable part into a separate PR, and add tests? Then we can discuss the group_by later.

I have split them, so this is closed.

quaquel added the 2 - WIP and enhancement labels Aug 14, 2024

add agentset.apply

a617d04

picks up on an idea from projectmesa#1944, see projectmesa#1944 (comment)

quaquel force-pushed the agentset_apply branch from d89bd77 to a617d04 Compare August 14, 2024 11:17

[pre-commit.ci] auto fixes from pre-commit.com hooks

5746a82

for more information, see https://pre-commit.ci

Update agent.py

5688ac4

quaquel and others added 2 commits August 14, 2024 14:01

add group_by method to AgentSet

5890947

[pre-commit.ci] auto fixes from pre-commit.com hooks

de8024d

for more information, see https://pre-commit.ci

quaquel changed the title ~~add agentset.apply~~ add AgentSet.apply and AgentSet.group_by Aug 14, 2024

quaquel and others added 2 commits August 14, 2024 14:48

control returntype of groupby

433c659

creating agentsets is expensive, so this makes it possible to avoid creating them

[pre-commit.ci] auto fixes from pre-commit.com hooks

195875e

for more information, see https://pre-commit.ci

rht reviewed Aug 15, 2024

quaquel and others added 2 commits August 15, 2024 16:02

integrate apply into do

6c61665

[pre-commit.ci] auto fixes from pre-commit.com hooks

214b8f9

for more information, see https://pre-commit.ci

quaquel and others added 4 commits August 17, 2024 16:49

adds GroupBy helper class to enable method chaining

0c5db2b

Update agent.py

e685f1e

[pre-commit.ci] auto fixes from pre-commit.com hooks

bfa8987

for more information, see https://pre-commit.ci

Update agent.py

cddfe4a

quaquel changed the title ~~add AgentSet.apply and AgentSet.group_by~~ enhance AgentSet.do to accept a callable Aug 17, 2024

quaquel closed this Aug 17, 2024

quaquel deleted the agentset_apply branch November 4, 2024 19:37
