-
Notifications
You must be signed in to change notification settings - Fork 777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Replace generator creation/iteration in CollectionState methods #4587
Core: Replace generator creation/iteration in CollectionState methods #4587
Conversation
Using generators in these functions incurs overhead to create the new generator instance, call the `any`/`all`/`sum` function and have the `any`/`all`/`sum` function iterate the generator, which in turn iterates the iterable. Replacing the use of generators with for loops is faster. Getting `self.prog_items[player]` once in advance also improves performance of iterating longer iterables.
if player_prog_items[item_name] > 0: | ||
total += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation could go either way with
for item_name in items:
total += (player_prog_items[item_name] > 0)
When you posted the other result i already had the feeling that we can optimize this. I did however expect either a bigger or a smaller gain. Bigger because dicts are pretty fast, smaller because the optimizer could in theory do this (but seems like it doesn't). |
I think it is likely that the reason there isn't much visible change here is that the average world's performance is more affected by other areas, maybe simply that these functions are not used enough for this PR to make much of a difference on average. I did not make a PR for these changes earlier because when I tried generations with 1 of every supported game template yaml in a single multiworld, I did not observe a noticeable difference. For the Wind Waker PR, where I've been looking at making the logic faster, I optimised the rules of 49 locations and added use of Generating with 10 TWW yamls with every location enabled generates in 1.94s without this PR, but 1.49s with this PR, for a 23.3% reduction in generation duration. Before my most recent local change that added the Some of the games with higher numbers of text occurrences of
This is not a particularly useful count because games can define a rule function once and then reference it repeatedly, which won't be counted here, or the |
The SC2 and WW gain is now higher than I would have expected :-D Hotness of individual rules also depends on which locations have those rules and if the seed ends up in swap or not, so actual impact is always impossible to tell just by counting occurrences. As a small note why "dicts are pretty fast", the hash used for dict access with string keys is cached in the str object and the dict implementation in python is amazingly fast. I have not played around with this yet, but the code looks correct and I think we should merge this asap if someone finds the time to validate the improvement and compatibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general comment, documenting this in code might be good, I have a feeling that 3 months down the line someone will come along and wonder why the heck we aren't using any/all/sum and try to change it back and with any luck whoever reviews it will have forgotten as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cherrypicked changes for in dev world work that heavily uses has_all_counts and it cut ~1s off the (unoptimized) ~7.5s gen time consistently
also saw similar improvements in other similar generator > for loop changes elsewhere
Also did some testing, on py3.12 and py3.13, amd64, Linux, prog balancing on, spoiler and output off. For small games, improvements are easily reproducible by taking the avg of the better half of a larger number of identical gen runs. The numbers I personally looked at were close to noise, but reproducibly better. For huge multis, with the set of games I picked, the noise level is way beyond the improvement and it's impossible to run enough gens in a reasonable amount of time to de-noise it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Gonna merge now so any other performance improvements can rely on state's any/all/sum being fast.
Performance improvements are reproducible and I looked over the code again to see if the new implementations should behave the same as the old ones, and I don't see any errors.
What is this fixing or adding?
Using generators in these functions incurs overhead to create the new generator instance, call the
any
/all
/sum
function and have theany
/all
/sum
function iterate the generator, which in turn iterates the iterable.Replacing the use of generators with for loops is faster.
Getting
self.prog_items[player]
once in advance also improves performance of iterating longer iterables.How was this tested?
I generated multiworlds with template yamls from A Hat in Time to Meritous alphabetically,
--skip_output --seed 1
and progression balancing. I ran 10 generations before this PR and 10 generations after this PR and averaged the results in each case:The placement of all items was observed to be identical before and after this PR with the same seed.
I used
timeit
to pick implementations. From usingtimeit
, it is clear that most implementations not using generators are faster for both large and small iterables.The
has_
implementations are rather simple and there are not many alternatives with similar performance.The
count_
implementations have more possibilities because some possible implementations can be better/worse with more/fewer of the counted items being in the state.The
count_
implementations I picked performed best for me on Python 3.12 up to at least 6-element iterables. For larger iterables, there are usually options that scale slightly better. The precise performance and scaling of similar options will probably change between Python versions, so it is probably not too important which implementation is picked when they perform very similarly.While looking at logic performance for #4458, I pre-calculated some logic and updated some functions to use
has_all()
instead of chaining 2 or 3has()
calls together. Without this PR, making that change actually made the world take longer to generate (thehas_all()
calls lost more time than the time saved by pre-calculating some of the logic). With this PR, the changes make the world generate faster as I had expected.