-
Notifications
You must be signed in to change notification settings - Fork 927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aggegrated agent metric in DataCollection, graph in ChartModule #1145
base: main
Are you sure you want to change the base?
Conversation
- Implements get a single aggegrated value from an agent variable - Allows the ChartModule to plot agent-level variables
Codecov Report
@@ Coverage Diff @@
## main #1145 +/- ##
==========================================
- Coverage 89.30% 88.13% -1.18%
==========================================
Files 19 19
Lines 1253 1289 +36
Branches 256 259 +3
==========================================
+ Hits 1119 1136 +17
- Misses 98 116 +18
- Partials 36 37 +1
Continue to review full report at Codecov.
|
mesa/datacollection.py
Outdated
@@ -100,6 +101,8 @@ class attributes of model | |||
|
|||
self.model_vars = {} | |||
self._agent_records = {} | |||
self.agent_attr_index = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be self.agent_attr_indexes
, to make it consistent with existing dict attribute names, e.g. self.model_vars
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of it as an index between reporter names and their position in the _agent_records
. So a single index.
Maybe agent_records_index
might be even clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, the usage of the term "an index" as a collection of the mapping is inconsistent with its later use:
# Get the index of the reporter
attr_index = self.agent_attr_index[reporter]
Where this is a single index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a suggestion for a better name?
|
Thanks for spotting @rht!
Good catch, can't believe I missed that. I found it already weird that there wasn't such a dictionary, but there was. I fixed it in 0d1aeef.
The dictionary is created to keep track of which metric is collected where in the list of Anyways I don't think it makes a big performance impact and it does make the code a bit more resilient if agent_reporters are defined in a weird way. But if you suggest an other implementation I'm open to incorporate it! |
I wasn't referring to |
@ewout, generically, I think this is a good idea, but it is a hard how to implement 1st a unhelpful philosophical rabbit hole: 2nd some thoughts to hopefully be helpful: To a specific question, the testing would go in the data_collector Hope this helps. |
Note that if model_vars and agent_vars would be the same variable, a datacollector with an agent_reporter and model_reporter with an identical variable name would not function correctly.
Looked a bit more into it. They are indeed identical, but currently I renamed it to So with my current skill set, I think this is the best implementation I can do. Then the question this, is this good enough in terms of performance and maintainability? If so, I can add tests and update the docs further. If not, @rht would you be open to re-implementing this functionality from a clean sheet? |
Any aggregate metric, by definition, is a model-level variable. The examples you showed in #1145 (comment) can be put You should stick to the existing API whenever possible. Adding more machinery will cause the library to be more complex and harder to learn. I would do something like this: def get_neighbors_min(model):
neighbors = model.datacollector.get_last_agent_report("Neighbors")
return min(neighbors)
# Later on in the agent reporter initialization
model_reporters={"neighbors_min": get_neighbors_min, ...} This way, the user is the one responsible for naming the model-level var, and there is no key collision at all. |
Thanks for your comment, I now understand your issue. The current architecture is as follow:
What you suggesting is merging step 1 and 2, if I understand correctly. While this has the advantage it can simplify code and reduce the amount of information stored, it does throw away a lot of data that could be analysed afterwards. |
No data are thrown away. See my example. I took the agent-level vars from an existing, separately-defined agent reporter. |
@rht @tpike3 @jackiekazil Maybe we could give this PR/idea another spin. I think the main questions are:
|
Following our discussion in the dev meeting earlier today, you might need to consider the case where different types of agents may have different attributes. As discussed, the implemented interface could be used as a default where all types of agents are assumed to share a common attribute. |
Another major concern that I had is how to differentiate Without this PR what I'll do would probably look very much like what was mentioned in #1145 (comment). As an alternative yet similar example: def get_min_neighbors(model):
return min([getattr(agent, "neighbors") for agent in model.schedule.agents]) # or model.grid.agents or other similar places and use this in The question is, is this generic enough to be provided as an API to the users? For instance we can have: def get_agent_metric(model, attr_name, metric="mean"):
values = [getattr(agent, attr_name) for agent in model.schedule.agents]
if metric in ["min", "max", "sum", "len"]:
# similar to what was implemented in this PR
result = ...
else:
result = ...
return result so that the users can do something like: from functools import partial
model_reporters={
"neighbors_min": partial(get_agent_metric, attr_name="neighbors", metric="min"),
...
} Personally I don't think this is really needed, since the users can fairly easily define their own functions. How about the self.data_collector = DataCollector(
model_reporters={"Agents": lambda m: m.schedule.get_agent_count()},
agent_reporters={"Neighbours": "neighbours"},
) vs. self.data_collector = DataCollector(
model_reporters={
"Agents": lambda m: m.schedule.get_agent_count(),
"Neighbours": get_min_neighbors,
}
) Again I don't really see the need to introduce |
On a second thought, it might be useful when the users need to define lots of similar functions, such as: self.data_collector = DataCollector(
model_reporters={
"Agents": lambda m: m.schedule.get_agent_count(),
"Min Neighbours": get_min_neighbors,
"Mean Neighbours": get_mean_neighbors,
"Max Neighbours": get_max_neighbors,
}
) In this case it could be easier for the users to have a common interface such as |
This is exactly what I see students (and myself, sometimes) do all the time. The main use case of this feature I see, is that you want to collect all the agent data for later proper statistical analysis, but you also want some quick values for eye-ball validation and visualisation. If I want to do that with the current datacollector possibilities, I have to define both agent and model reporters, or write custom code to transform the agent data to the thing I want. Also, I think that there should be a really easy way to plot a general statistic like the mean of an agent variable with real time visualisation. Heck, NetLogo does this for 20+ years. Maybe some of the Solara stuff leapfrogs this, but those use cases should be included in my opinion:
(both while still collecting full agent data for proper analysis) |
With merged, a very small piece of this PR is finally in Mesa! model1.agents.agg("Neighbours", min) |
Aggegrated agent variable
Currently it's not possible to quickly get a aggegrated metric of an agent variable. This PR adds a method to the
DataCollector
class calledget_agent_metric
that allows to quickly get a single value that describes the agent variable based on a stastistic.By default, it takes the mean of the value of all agent's values for that variable. It always reports the variable in the current time step. The function supports all of statistics functions, as well as the built-in
min()
,max()
,sum()
andlen()
functions.To support this:
statistics
is importedagent_attr_index
dictionary, which list the place of eachreporter
in the_agent_records
dictionaryself.agent_name_index
, which can be used to lookup thereporter
for each input variablename
Example
A model called
model1
is created, with agents that have anagent_reporter
indatacollector
variable called"Neighbours"
The new
get_agent_metric()
function can now be used to get an aggerate level statistic of the number of neighbours of the agents:Plotting agent variables
The
ChartModule
is also updated to support displaying agent variables. If it can't find a variable in the model variables, it checks if it is present in the agent variables, and if so, adds it to the chart.Example
In a Game of Life model I build the DataCollector looks like this:
The server contains two charts, one with the
Agents
, which is a model variable, and one withNeighbours
, an agent variable.On the main branch, only the first chart is displayed correctly. On the second, both are.
@tpike3, @rht and others, I would love your feedback on this PR! Please consider performance, the naming of variables and functions and API stability. Also please let me know if (and where) tests and documentation should be added.