Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Replace generator creation/iteration in CollectionState methods #4587

Merged

Conversation

Mysteryem
Copy link
Contributor

What is this fixing or adding?

Using generators in these functions incurs overhead to create the new generator instance, call the any/all/sum function and have the any/all/sum function iterate the generator, which in turn iterates the iterable.

Replacing the use of generators with for loops is faster.

Getting self.prog_items[player] once in advance also improves performance of iterating longer iterables.

How was this tested?

I generated multiworlds with template yamls from A Hat in Time to Meritous alphabetically, --skip_output --seed 1 and progression balancing. I ran 10 generations before this PR and 10 generations after this PR and averaged the results in each case:

Before PR Reduction
19.08 18.68 -2.11%

The placement of all items was observed to be identical before and after this PR with the same seed.


I used timeit to pick implementations. From using timeit, it is clear that most implementations not using generators are faster for both large and small iterables.

The has_ implementations are rather simple and there are not many alternatives with similar performance.
The count_ implementations have more possibilities because some possible implementations can be better/worse with more/fewer of the counted items being in the state.

The count_ implementations I picked performed best for me on Python 3.12 up to at least 6-element iterables. For larger iterables, there are usually options that scale slightly better. The precise performance and scaling of similar options will probably change between Python versions, so it is probably not too important which implementation is picked when they perform very similarly.


While looking at logic performance for #4458, I pre-calculated some logic and updated some functions to use has_all() instead of chaining 2 or 3 has() calls together. Without this PR, making that change actually made the world take longer to generate (the has_all() calls lost more time than the time saved by pre-calculating some of the logic). With this PR, the changes make the world generate faster as I had expected.

Using generators in these functions incurs overhead to create the new
generator instance, call the `any`/`all`/`sum` function and have the
`any`/`all`/`sum` function iterate the generator, which in turn iterates
the iterable.

Replacing the use of generators with for loops is faster.

Getting `self.prog_items[player]` once in advance also improves
performance of iterating longer iterables.
@github-actions github-actions bot added affects: core Issues/PRs that touch core and may need additional validation. waiting-on: peer-review Issue/PR has not been reviewed by enough people yet. labels Jan 31, 2025
Comment on lines +941 to +942
if player_prog_items[item_name] > 0:
total += 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation could go either way with

        for item_name in items:
            total += (player_prog_items[item_name] > 0)

@black-sliver
Copy link
Member

When you posted the other result i already had the feeling that we can optimize this. I did however expect either a bigger or a smaller gain. Bigger because dicts are pretty fast, smaller because the optimizer could in theory do this (but seems like it doesn't).

@Mysteryem
Copy link
Contributor Author

Mysteryem commented Jan 31, 2025

I think it is likely that the reason there isn't much visible change here is that the average world's performance is more affected by other areas, maybe simply that these functions are not used enough for this PR to make much of a difference on average.

I did not make a PR for these changes earlier because when I tried generations with 1 of every supported game template yaml in a single multiworld, I did not observe a noticeable difference.

For the Wind Waker PR, where I've been looking at making the logic faster, I optimised the rules of 49 locations and added use of has_all to them. Doing so made the world generate slower.

Generating with 10 TWW yamls with every location enabled generates in 1.94s without this PR, but 1.49s with this PR, for a 23.3% reduction in generation duration.

Before my most recent local change that added the has_all usage in those 49 rules, those same 10 yamls would generate in 1.90s without this PR and 1.51s with this PR. So my optimisation made the multiworld generate 0.04s slower without this PR, but 0.02s faster with this PR.

Some of the games with higher numbers of text occurrences of has_any( and has_all( in *.py, generated with 10 copies of their template yaml with and without this PR:

Game has_any( text occurrences has_all( text occurrences 10 template yamls before 10 template yamls with PR Reduction
The Wind Waker (local WIP) 50 35 0.75s 0.60s -20.0%
Starcraft 2 101 38 4.6s 3.9s -15.2%
TUNIC 45 12 0.214s 0.210s -1.9%
Yoshi's Island 28 256 0.23s 0.21s - 8.7%

This is not a particularly useful count because games can define a rule function once and then reference it repeatedly, which won't be counted here, or the has_any(/has_all( rules might not be used with default options.

image image

@Exempt-Medic Exempt-Medic added the is: enhancement Issues requesting new features or pull requests implementing new features. label Jan 31, 2025
@black-sliver
Copy link
Member

black-sliver commented Feb 1, 2025

The SC2 and WW gain is now higher than I would have expected :-D
Your tables are missing used python version and architecture, by the way.

Hotness of individual rules also depends on which locations have those rules and if the seed ends up in swap or not, so actual impact is always impossible to tell just by counting occurrences.

As a small note why "dicts are pretty fast", the hash used for dict access with string keys is cached in the str object and the dict implementation in python is amazingly fast.

I have not played around with this yet, but the code looks correct and I think we should merge this asap if someone finds the time to validate the improvement and compatibility.

Copy link
Collaborator

@BadMagic100 BadMagic100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general comment, documenting this in code might be good, I have a feeling that 3 months down the line someone will come along and wonder why the heck we aren't using any/all/sum and try to change it back and with any luck whoever reviews it will have forgotten as well

Copy link
Collaborator

@qwint qwint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cherrypicked changes for in dev world work that heavily uses has_all_counts and it cut ~1s off the (unoptimized) ~7.5s gen time consistently
also saw similar improvements in other similar generator > for loop changes elsewhere

@black-sliver
Copy link
Member

Also did some testing, on py3.12 and py3.13, amd64, Linux, prog balancing on, spoiler and output off.
(Full spoiler with playthrough may also be impacted positively by this, but it probably adds more noise.)

For small games, improvements are easily reproducible by taking the avg of the better half of a larger number of identical gen runs. The numbers I personally looked at were close to noise, but reproducibly better.
E.g. 2 SoE + 1 Ts results in 1.2% - 1.6% faster gen on 3.13 and 3.12 even though SoE does not use any of the modified functions.

For huge multis, with the set of games I picked, the noise level is way beyond the improvement and it's impossible to run enough gens in a reasonable amount of time to de-noise it.

Copy link
Member

@black-sliver black-sliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Gonna merge now so any other performance improvements can rely on state's any/all/sum being fast.

Performance improvements are reproducible and I looked over the code again to see if the new implementations should behave the same as the old ones, and I don't see any errors.

@black-sliver black-sliver merged commit f28aff6 into ArchipelagoMW:main Feb 2, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects: core Issues/PRs that touch core and may need additional validation. is: enhancement Issues requesting new features or pull requests implementing new features. waiting-on: peer-review Issue/PR has not been reviewed by enough people yet.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants