-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements #101
Conversation
Just as an update, the first commit seems to be important for python3.12, it changes the run time of the tests from around 17 hours to normal times ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems fine to me, as long as we are testing the parallelised branch of squashoutput()
(see comment.
squash_params_list = [ | ||
(False, {}), | ||
(True, {}), | ||
(True, {"parallel": 2}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change this, are we testing parallelised squashing at all? The parallel version was quite tricky to implement at all, so I'd be very hesitant to remove test coverage. On the other hand, to save time it'd be fine to reduce the number of test cases for it. Maybe just test the parallel version for a single-null and double-null (as the most common/important examples)?
This was introduced here: I think I have found the issue in the mean time, for some reason some back traces from some exceptions in netCDF do not get garbage collected, and thus the gc.collect gets extremely slow as there are millions of objects. |
Having a look at this again and profiling the tests, we spend a good fraction of the time in |
Changes now in #107 |
The first commit is the most important one. For me it reduced the time from many, many hours to around 20 minutes.
2e095a2 reduces the combinatory explosion. I don't think we need for example check the different compression levels. Maybe we can remove a bit more, or less. Anyway, that should make the unit tests faster. Beyond reducing, I think we would ne some fake datafile interface, but that would probably be more effort then it is worth.
18b4d4b removes the
--cov
flag - which makes it slightly faster. Mostly I assume most people don't care that much about coverage. If at all, it make sense to compare coverage before and after a change, but then one actually needs to look at the output ...I think we should merge the first commit, happy to discuss and/or revert the other two ...