Performance improvements #101

dschwoerer · 2023-11-13T12:47:40Z

The first commit is the most important one. For me it reduced the time from many, many hours to around 20 minutes.

2e095a2 reduces the combinatorial explosion. I don't think we need to check, for example, the different compression levels. Maybe we can remove a bit more, or less. Either way, that should make the unit tests faster. Beyond reducing the number of combinations, I think we would need some fake datafile interface, but that would probably be more effort than it is worth.

18b4d4b removes the --cov flag, which makes it slightly faster. I assume most people don't care that much about coverage. If at all, it makes sense to compare coverage before and after a change, but then one actually needs to look at the output ...

I think we should merge the first commit, happy to discuss and/or revert the other two ...

dschwoerer · 2023-11-30T10:01:43Z

Just as an update: the first commit seems to be important for Python 3.12, where it changes the run time of the tests from around 17 hours to normal times ...

johnomotani

Seems fine to me, as long as we are testing the parallelised branch of squashoutput() (see comment).

johnomotani · 2023-11-30T14:37:35Z

boutdata/tests/test_collect.py

 squash_params_list = [
    (False, {}),
    (True, {}),
-    (True, {"parallel": 2}),


If we change this, are we testing parallelised squashing at all? The parallel version was quite tricky to implement, so I'd be very hesitant to remove test coverage. On the other hand, to save time it'd be fine to reduce the number of test cases for it. Maybe just test the parallel version for a single-null and a double-null case (as the most common/important examples)?
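A minimal sketch of that suggestion, assuming the test cases are built as plain tuples: keep the serial cases for every geometry, and add the `{"parallel": 2}` case only for single-null and double-null. The geometry labels and the `squash_cases` helper are hypothetical and are not taken from `test_collect.py`.

```python
from itertools import product

# Hypothetical geometry labels; the real test suite may name them differently.
GEOMETRIES = ["core", "sol", "single-null", "double-null"]
# Only these topologies keep the (slower) parallel squash test.
PARALLEL_GEOMETRIES = {"single-null", "double-null"}


def squash_cases():
    """Build (geometry, squash, squash_kwargs) combinations."""
    serial = [(False, {}), (True, {})]
    cases = [
        (geometry, squash, kwargs)
        for geometry, (squash, kwargs) in product(GEOMETRIES, serial)
    ]
    # Keep the parallel branch covered, but only for the key topologies.
    cases += [
        (geometry, True, {"parallel": 2})
        for geometry in GEOMETRIES
        if geometry in PARALLEL_GEOMETRIES
    ]
    return cases
```

A list like this could be passed to `pytest.mark.parametrize`, so the parallel branch of squashing stays tested while the total number of cases grows by two rather than by one per geometry.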

boutdata/squashoutput.py

dschwoerer · 2024-03-08T00:55:35Z

This was introduced here:
boutproject/BOUT-dev#1241

I think I have found the issue in the meantime: for some reason, the backtraces from some exceptions raised in netCDF do not get garbage collected, so gc.collect gets extremely slow because there are millions of objects.
So the original code is fine, and there is a different issue. If others are not seeing the slow-down, I am fine with keeping this as a Fedora-only patch and trying to find the real issue ...
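One way to check for this kind of build-up, as a rough diagnostic sketch using only the standard-library `gc` module: count the objects the collector currently tracks, grouped by type name. The helper names are hypothetical, and the example exception only stands in for whatever netCDF raises.

```python
import gc
from collections import Counter


def tracked_type_counts():
    """Count the objects currently tracked by the garbage collector, by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def keep_exception():
    """Return a caught exception; holding it also keeps its traceback alive."""
    try:
        raise ValueError("stand-in for an error raised inside a library")
    except ValueError as err:
        return err


held = keep_exception()
counts = tracked_type_counts()
# A very large "traceback" or "frame" count would point at exceptions
# that are being kept alive somewhere.
print(counts["traceback"], counts["frame"])
print(counts.most_common(5))
```

Calling `tracked_type_counts()` before and after a batch of tests and comparing the two counters would show which object types are accumulating.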

ZedThree · 2024-07-25T11:10:41Z

Having another look at this and profiling the tests: we spend a good fraction of the time in create_dump_files. We should be able to pull this out into a fixture and cache the files for each geometry. I'll have a stab at that next week.
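A rough sketch of the caching idea, assuming the expensive step depends only on the geometry. It uses `functools.lru_cache` rather than a pytest fixture so that it runs on its own; `create_dump_files_stub` and `cached_dump_files` are hypothetical stand-ins, not the real `create_dump_files` helper.

```python
import functools
import tempfile
from pathlib import Path

CALLS = []  # records how often the expensive step actually runs


def create_dump_files_stub(geometry, directory):
    """Hypothetical stand-in for the real, expensive file creation."""
    CALLS.append(geometry)
    path = Path(directory) / f"BOUT.dmp.{geometry}.txt"
    path.write_text(f"fake data for {geometry}\n")
    return path


@functools.lru_cache(maxsize=None)
def cached_dump_files(geometry):
    """Create the files for a geometry once and reuse them afterwards."""
    directory = tempfile.mkdtemp(prefix=f"dump-{geometry}-")
    return create_dump_files_stub(geometry, directory)


first = cached_dump_files("single-null")
second = cached_dump_files("single-null")
print(first == second, CALLS.count("single-null"))
```

In a pytest suite the same effect would more likely come from a session-scoped fixture. Either way, tests that modify the files would need to copy them first, and the temporary directories created here are not cleaned up.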

ZedThree · 2024-08-01T15:17:48Z

Changes now in #107

dschwoerer added 4 commits November 13, 2023 11:17

Make squashing faster

2c4500e

Reduce amount of combinations we test

2e095a2

Do not force --cov for pytest

18b4d4b

Remove last reference of gc

1fb9a62

dschwoerer mentioned this pull request Nov 13, 2023

Unit tests are slow #68

Closed

dschwoerer requested a review from johnomotani November 30, 2023 10:01

johnomotani approved these changes Nov 30, 2023

bendudson reviewed Dec 20, 2023

ZedThree mentioned this pull request Jul 31, 2024

Performance improvements and collect test improvements #107

Merged

ZedThree closed this Aug 1, 2024
