Core: Replace generator creation/iteration in CollectionState methods #4587

Mysteryem · 2025-01-31T18:09:14Z

What is this fixing or adding?

Using generators in these functions incurs overhead to create the new generator instance, call the any/all/sum function and have the any/all/sum function iterate the generator, which in turn iterates the iterable.

Replacing the use of generators with for loops is faster.

Getting self.prog_items[player] once in advance also improves performance of iterating longer iterables.

How was this tested?

I generated multiworlds with template yamls from A Hat in Time to Meritous alphabetically, --skip_output --seed 1 and progression balancing. I ran 10 generations before this PR and 10 generations after this PR and averaged the results in each case:

Before	PR	Reduction
19.08	18.68	-2.11%

The placement of all items was observed to be identical before and after this PR with the same seed.

I used timeit to pick implementations. From using timeit, it is clear that most implementations not using generators are faster for both large and small iterables.

The has_ implementations are rather simple and there are not many alternatives with similar performance.
The count_ implementations have more possibilities because some possible implementations can be better/worse with more/fewer of the counted items being in the state.

The count_ implementations I picked performed best for me on Python 3.12 up to at least 6-element iterables. For larger iterables, there are usually options that scale slightly better. The precise performance and scaling of similar options will probably change between Python versions, so it is probably not too important which implementation is picked when they perform very similarly.

While looking at logic performance for #4458, I pre-calculated some logic and updated some functions to use has_all() instead of chaining 2 or 3 has() calls together. Without this PR, making that change actually made the world take longer to generate (the has_all() calls lost more time than the time saved by pre-calculating some of the logic). With this PR, the changes make the world generate faster as I had expected.

Using generators in these functions incurs overhead to create the new generator instance, call the `any`/`all`/`sum` function and have the `any`/`all`/`sum` function iterate the generator, which in turn iterates the iterable. Replacing the use of generators with for loops is faster. Getting `self.prog_items[player]` once in advance also improves performance of iterating longer iterables.

Mysteryem · 2025-01-31T18:11:15Z

BaseClasses.py

+            if player_prog_items[item_name] > 0:
+                total += 1


This implementation could go either way with

for item_name in items: total += (player_prog_items[item_name] > 0)

black-sliver · 2025-01-31T19:54:44Z

When you posted the other result i already had the feeling that we can optimize this. I did however expect either a bigger or a smaller gain. Bigger because dicts are pretty fast, smaller because the optimizer could in theory do this (but seems like it doesn't).

Mysteryem · 2025-01-31T21:10:13Z

I think it is likely that the reason there isn't much visible change here is that the average world's performance is more affected by other areas, maybe simply that these functions are not used enough for this PR to make much of a difference on average.

I did not make a PR for these changes earlier because when I tried generations with 1 of every supported game template yaml in a single multiworld, I did not observe a noticeable difference.

For the Wind Waker PR, where I've been looking at making the logic faster, I optimised the rules of 49 locations and added use of has_all to them. Doing so made the world generate slower.

Generating with 10 TWW yamls with every location enabled generates in 1.94s without this PR, but 1.49s with this PR, for a 23.3% reduction in generation duration.

Before my most recent local change that added the has_all usage in those 49 rules, those same 10 yamls would generate in 1.90s without this PR and 1.51s with this PR. So my optimisation made the multiworld generate 0.04s slower without this PR, but 0.02s faster with this PR.

Some of the games with higher numbers of text occurrences of has_any( and has_all( in *.py, generated with 10 copies of their template yaml with and without this PR:

Game	`has_any(` text occurrences	`has_all(` text occurrences	10 template yamls before	10 template yamls with PR	Reduction
The Wind Waker (local WIP)	50	35	0.75s	0.60s	-20.0%
Starcraft 2	101	38	4.6s	3.9s	-15.2%
TUNIC	45	12	0.214s	0.210s	-1.9%
Yoshi's Island	28	256	0.23s	0.21s	- 8.7%

This is not a particularly useful count because games can define a rule function once and then reference it repeatedly, which won't be counted here, or the has_any(/has_all( rules might not be used with default options.

black-sliver · 2025-02-01T02:10:44Z

The SC2 and WW gain is now higher than I would have expected :-D
Your tables are missing used python version and architecture, by the way.

Hotness of individual rules also depends on which locations have those rules and if the seed ends up in swap or not, so actual impact is always impossible to tell just by counting occurrences.

As a small note why "dicts are pretty fast", the hash used for dict access with string keys is cached in the str object and the dict implementation in python is amazingly fast.

I have not played around with this yet, but the code looks correct and I think we should merge this asap if someone finds the time to validate the improvement and compatibility.

BadMagic100

As a general comment, documenting this in code might be good, I have a feeling that 3 months down the line someone will come along and wonder why the heck we aren't using any/all/sum and try to change it back and with any luck whoever reviews it will have forgotten as well

qwint

cherrypicked changes for in dev world work that heavily uses has_all_counts and it cut ~1s off the (unoptimized) ~7.5s gen time consistently
also saw similar improvements in other similar generator > for loop changes elsewhere

black-sliver · 2025-02-02T13:45:43Z

Also did some testing, on py3.12 and py3.13, amd64, Linux, prog balancing on, spoiler and output off.
(Full spoiler with playthrough may also be impacted positively by this, but it probably adds more noise.)

For small games, improvements are easily reproducible by taking the avg of the better half of a larger number of identical gen runs. The numbers I personally looked at were close to noise, but reproducibly better.
E.g. 2 SoE + 1 Ts results in 1.2% - 1.6% faster gen on 3.13 and 3.12 even though SoE does not use any of the modified functions.

For huge multis, with the set of games I picked, the noise level is way beyond the improvement and it's impossible to run enough gens in a reasonable amount of time to de-noise it.

black-sliver

Thanks!

Gonna merge now so any other performance improvements can rely on state's any/all/sum being fast.

Performance improvements are reproducible and I looked over the code again to see if the new implementations should behave the same as the old ones, and I don't see any errors.

github-actions bot added affects: core Issues/PRs that touch core and may need additional validation. waiting-on: peer-review Issue/PR has not been reviewed by enough people yet. labels Jan 31, 2025

Mysteryem commented Jan 31, 2025

View reviewed changes

Exempt-Medic added the is: enhancement Issues requesting new features or pull requests implementing new features. label Jan 31, 2025

BadMagic100 reviewed Feb 1, 2025

View reviewed changes

Add comment on the choice of for loops instead of any()/all()/sum()

e3d6203

qwint approved these changes Feb 2, 2025

View reviewed changes

black-sliver approved these changes Feb 2, 2025

View reviewed changes

black-sliver merged commit f28aff6 into ArchipelagoMW:main Feb 2, 2025
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Core: Replace generator creation/iteration in CollectionState methods #4587

Core: Replace generator creation/iteration in CollectionState methods #4587

Mysteryem commented Jan 31, 2025

Mysteryem Jan 31, 2025

black-sliver commented Jan 31, 2025

Mysteryem commented Jan 31, 2025 •

edited

Loading

black-sliver commented Feb 1, 2025 •

edited

Loading

BadMagic100 left a comment

qwint left a comment

black-sliver commented Feb 2, 2025

black-sliver left a comment

Core: Replace generator creation/iteration in CollectionState methods #4587

Core: Replace generator creation/iteration in CollectionState methods #4587

Conversation

Mysteryem commented Jan 31, 2025

What is this fixing or adding?

How was this tested?

Mysteryem Jan 31, 2025

Choose a reason for hiding this comment

black-sliver commented Jan 31, 2025

Mysteryem commented Jan 31, 2025 • edited Loading

black-sliver commented Feb 1, 2025 • edited Loading

BadMagic100 left a comment

Choose a reason for hiding this comment

qwint left a comment

Choose a reason for hiding this comment

black-sliver commented Feb 2, 2025

black-sliver left a comment

Choose a reason for hiding this comment

Mysteryem commented Jan 31, 2025 •

edited

Loading

black-sliver commented Feb 1, 2025 •

edited

Loading