Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Agent DataCollection Can't Handle Different Attributes in ActivationByType #1419

Closed
tpike3 opened this issue Sep 24, 2022 · 18 comments · Fixed by #1702
Closed

Comments

@tpike3
Copy link
Member

tpike3 commented Sep 24, 2022

Describe the Feature
Receive attribute error when running an agent reporter through datacollector when using RandomActivationByType

Desired behavior
I can collect against different attributes based on agent type

To Reproduce
Create a model with different agent types and attributes and try to collect against that agent attribute

Additional context
Feature would need to update at line 170 of datacollection.py agent_records = map(get_reports, model.schedule.agents)

@jackiekazil
Copy link
Member

@tpike3 I am unable to recreate the bug -- it looks like sugarscape_cg uses RandomActivationByType. Can you add it to that example & recreate it?

@tpike3
Copy link
Member Author

tpike3 commented Sep 24, 2022

If you use this colab you will get the error https://github.com/edgeofchaos42/ComplexityExplorer/blob/main/Session_18_Data_Collector_agent.ipynb Attached is the landscape you will need to load into your google drive. or update the sugar_distribution = line

Per the dev session this morning I should be able to add a custom function to filter out non-traders or at least return None, but so far unsuccessful on that code.

sugar-map.txt

@rht
Copy link
Contributor

rht commented Sep 25, 2022

I think a temporary workaround is to put the trade partners in the model reporter instead.

lambda m: {k: list(a.trade_partners) for k, a in m.schedule.agents_by_type[Trader].items()}

I used list() to copy the a.trade_partners.

@tpike3 tpike3 changed the title Agent DataCollection Can't Handle Different Attributes in ActivationByType Feature Request: Agent DataCollection Can't Handle Different Attributes in ActivationByType Oct 2, 2022
@EwoutH
Copy link
Member

EwoutH commented Oct 7, 2022

This is indeed a (in my opinion big) limitation. I wrote a proof of concept implementation for it in #1142, and used it in a personal project. If anyone ones to built on it, feel free to use it in any way!

@rht
Copy link
Contributor

rht commented Oct 9, 2022

agent_reporters = {Cat: {"Weight": "weight", "Size": "size"}, Dog: {"Weight": "weight"}}

While this looks more concise for RandomActivationByType, it is a pattern that only works for this scheduler. We can't just modify DataCollector to expect more different types of schedulers. DataCollector will become more bloated. And this is not much more concise than the workaround I specified in my earlier comment. The lack of the feature is not a blocker in @tpike3's situation.

Either:

We should look at how other ABM libraries choose their trade-off here.

@EwoutH
Copy link
Member

EwoutH commented Oct 18, 2022

Could we create a new switch for the DataCollector, multiple_agent_types, for example, which is False by default but can be set to True if you want to collect data from multiple agent types? That way it's still contained to a single class, which can reduce the maintenance effort.

@rht
Copy link
Contributor

rht commented Oct 19, 2022

The maintenance effort is about the same with DataCollectorRandomActivationByType, since it will inherit from DataCollector, and only _record_agents needs to be replaced to handle agents by type. Adding an extra switch and if clause inside DataCollector doesn't scale, because what if there is a scheduler with 2-level type hierarchy (Dict[str, Dict[str, Dict[int, Agent]]], see Multi-level Mesa)? The switch and the if's keep growing.

@EwoutH
Copy link
Member

EwoutH commented Oct 19, 2022

That's a good point.

Would you be interested in creating an (draft) implementation?

@rht
Copy link
Contributor

rht commented Oct 19, 2022

My plate is rather full right now.

@EwoutH
Copy link
Member

EwoutH commented Oct 19, 2022

Thanks for the honest reply, let's discuss at the next development meeting how we can move this project forward.

@EwoutH
Copy link
Member

EwoutH commented Oct 20, 2022

What if you would restructure the DataCollector in such way that it's split in an agent-data and model-data part?

Maybe a DataCollector superclass with a ModelDataCollector and a AgentDataCollector subclasses? In that case the ModelDataCollector always expects a single model as input, and a AgentDataCollector a single agent type. If you want to collect data for more that one agent type, you simply call the AgentDataCollector multiple times.

Many of the code is now duplicate such as _new_model_reporter and _new_agent_reporter, so why not split those? And everything that's shared can sit in the DataCollector superclass.

@jacob-thrackle
Copy link

jacob-thrackle commented Oct 27, 2022

I'm also running into this issue with respect to the agent attributes. After reading through the code, I wonder if it would be possible to just pass a specific schedule to the DataCollector constructor instead of assuming (hard-coding) that there exists an object in the model called schedule. That would seem to limit the bloat of DataCollector and remove one of the implicit assumptions in the class.

So something like this in the Model:

self.datacollector_my_agents = mesa.DataCollector(
    schedule = self.schedule_my_agents, # arbitrary schedule with an .agents attribute that returns a list of agents
    agent_reporters = {...},
    model_reporters = {...}
)

self.datacollector_your_agents = mesa.DataCollector(
    schedule = self.schedule_your_agents, # arbitrary schedule with an .agents attribute that returns a list of agents
    agent_reporters = {...},
    model_reporters = {...}
)

Then to get the data:

my_agent_data = model.datacollector_my_agents.get_agent_vars_dataframe()
your_agent_data = model.datacollector_your_agents.get_agent_vars_dataframe()

@rht
Copy link
Contributor

rht commented Oct 27, 2022

Maybe a DataCollector superclass with a ModelDataCollector and a AgentDataCollector subclasses?
...
If you want to collect data for more that one agent type, you simply call the AgentDataCollector multiple times.

This definitely could provide ingredients for people who want to build their own data collector class. But I don't see it to be cleaner than just inheriting from the current DataCollector class and modifying the method you want to change as needed.

I think for now, for trading Sugarscape use case, we should implement the solution within the examples folder. Then wait for a while for it to become a part of the library.

@rht
Copy link
Contributor

rht commented Oct 27, 2022

@jacob-thrackle your solution could work. Though we also want to make sure the >80% of the use case stays simple and convenient, where the scheduler is assigned to the model attribute schedule anyway. So that users don't have to specify the scheduler object every time.

@jacob-thrackle
Copy link

jacob-thrackle commented Oct 28, 2022

Absolutely. I started hacking around with it last night but the following section makes it difficult because of the string type for model.schedule.steps in prefix (line 158):

mesa/mesa/datacollection.py

Lines 156 to 166 in 373637e

rep_funcs = self.agent_reporters.values()
if all(hasattr(rep, "attribute_name") for rep in rep_funcs):
prefix = ["model.schedule.steps", "unique_id"]
attributes = [func.attribute_name for func in rep_funcs]
get_reports = attrgetter(*prefix + attributes)
else:
def get_reports(agent):
_prefix = (agent.model.schedule.steps, agent.unique_id)
reports = tuple(rep(agent) for rep in rep_funcs)
return _prefix + reports

If I'm understanding the code correctly, the condition at line 157 is only True if all the rep_funcs are partial which is only true if the reporter is a str:

mesa/mesa/datacollection.py

Lines 137 to 141 in 373637e

if type(reporter) is str:
attribute_name = reporter
reporter = partial(self._getattr, reporter)
reporter.attribute_name = attribute_name
self.agent_reporters[name] = reporter

So having a reporter be a lambda or some other arbitrary function would make this not true and evaluate the else statement at line 161. I assume that it is done this way so that get_reports = attrgetter(*prefix + attributes) is really fast if the reporters are just attributes of the agents but this gain is nullified once you use anything other than a string in the reporter dictionary in the DataCollector itself. Since the DataCollector.collect method is typically only called once per model-step, and unless there are a huge number of agents in which one is only interested in getting attributes, it may be worthwhile to consider removing the condition and expression in lines 157-160 and just using the function at 163.

Using the function has the added advantage of calling self._getattr() which has a default value of None, meaning that attributes which do not exist return None instead of throwing an AttributeError. The code in lines 158-160 never call self._getattr() and just use the attribute_name which is passed to attrgetter().

This is all to say that I was able to make my suggestion work by simply removing lines 157-161, only using the function get_reports at line 163 with some minor adjustments, including a condition for backwards compatibility where the scheduler is assigned to the model attribute schedule:

if self.schedule:
    schedule = self.schedule # new behavior
else:
    schedule = model.schedule # old behavior, self.schedule defaults to None in the constructor

This conditional has to be placed, unfortunately, in both _record_agents and collect, although one could limit it to one call and just pass in the schedule to _record_agents which is called from inside collect. Thoughts?

@rht
Copy link
Contributor

rht commented Oct 29, 2022

That L157-161 seems to be written by @Corvince in aaa4c54. The original commit contains this comment for the attrgetter

# Fast path if all reporters are attribute collecters

but had since been removed in the subsequent iterations.

But the original PR that did the specialization of the code when all the reporters are string-based, is #576. #575 contains some benchmark results

@Corvince
Copy link
Contributor

Thanks for digging that up @rht

And yes, @jacob-thrackle you understood exactly right how it works.
Basically, attrgetter gives the best possible performance, but only works if we have only string collectors.
This is a pure performance optimization. But, as you figured out, does not allow a default value.

Is is probably better to give up a tiny bit of performance by using getattr instead of attrgetter to have default values

woolgathering added a commit to woolgathering/mesa that referenced this issue Oct 29, 2022
…eudule and to return a default value of None if no such attribute exists.
woolgathering pushed a commit to woolgathering/mesa that referenced this issue Oct 29, 2022
…creation (defaults to model.schedule otherwise) and will return None if an attribute is not found instead of throwing an AttributeError.
@woolgathering
Copy link

woolgathering commented Oct 29, 2022

@Corvince Cool, I opened a PR from my personal account (this is @jacob-thrackle). Let me know when you guys get around to reviewing it and what changes, if any, you'd like me to make. I forgot to link it in the commit but this also fixes #976.

I made another version that kept the optimization but it only happened if two things were true: that self.schedule == None and if all the reporters in the dictionary are strings. I'm honestly not sure how often people use that (a lot?) but I can throw it back in if it's wanted. I removed it since the behavior would have been different if one had passed in model.schedule versus letting self.schedule be None. Felt inconsistent.

@tpike3 tpike3 added this to the v1.2.0 Taylor milestone Oct 31, 2022
woolgathering added a commit to woolgathering/mesa that referenced this issue Nov 2, 2022
…() now accepts a scheudle instead of a model. Attributes that do not exist return None instead of an AttributeError.
@tpike3 tpike3 modified the milestones: v1.2.0 Taylor, Mesa 2.0 Dec 8, 2022
@tpike3 tpike3 modified the milestones: Mesa 2.0, Milestone 1.2 Jan 8, 2023
@tpike3 tpike3 removed this from the Milestone 1.2 milestone Feb 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants