Unit tests are slow #68

Closed · ZedThree opened this issue Apr 8, 2022 · 8 comments · Fixed by #107

Comments

@ZedThree (Member) commented Apr 8, 2022

The unit tests take about 30 minutes to complete; most of that time is spent in collect. Maybe we could mock out the netCDF calls with something much faster?
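
For illustration, a minimal sketch of the mocking idea using unittest.mock — the patch target ("boutdata.collect.DataFile") and the FakeDataFile interface are assumptions, not boutdata's actual internals:

```python
# Hypothetical sketch of mocking out the netCDF layer. The patch target and
# the FakeDataFile interface are assumptions for illustration only.
from unittest import mock

import numpy as np


class FakeDataFile:
    """In-memory stand-in for the parts of DataFile a test exercises."""

    def __init__(self, filename, **kwargs):
        self._vars = {"t_array": np.linspace(0.0, 1.0, 11)}

    def read(self, name):
        return self._vars[name]

    def list(self):
        return list(self._vars)

    def close(self):
        pass


def test_collect_with_fake_datafile():
    # Patch wherever collect() actually looks DataFile up.
    with mock.patch("boutdata.collect.DataFile", FakeDataFile):
        ...  # call collect(...) here; it now never touches netCDF on disk
```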

@d7919 (Member) commented Apr 8, 2022

Is it possible to create the test data in RAM instead of on disk?

@johnomotani (Contributor) commented Apr 9, 2022

It seems like it should be possible. One possibility (?):

  • Refactor boututils.datafile.DataFile.__init__() so that it looks the implementation up in a dict that we can then monkey-patch a new implementation into.
  • Within the tests, write an in-memory implementation of boututils.datafile.DataFile and monkey-patch it into that dict (see the sketch below).
  • Use the in-memory version for most of the tests, leaving just a few writing to disk so we still test that NetCDF works.

Edit: The part I haven't figured out with this: how to pass the 'file' to collect... That could actually be a show-stopper.
Edit2: The way we got around this for xbout was to handle a list of xarray.Dataset passed to the datapath argument of open_boutdataset(). We could possibly do the same for collect by allowing a list of DataFile to be passed - not sure how much of collect()'s logic this would bypass...
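
A minimal sketch of the dict-dispatch idea (all names below are hypothetical, chosen only to illustrate the monkey-patching pattern, not boutdata's real API):

```python
# Hypothetical sketch of the dict-based dispatch; none of these names are
# boutdata's real API, they only illustrate the monkey-patching pattern.


class NetCDFImpl:
    """Stand-in for the existing on-disk backend (assumed name)."""

    def __init__(self, filename, **kwargs):
        self.filename = filename


# Registry that DataFile consults at call time, so tests can swap backends.
_IMPLEMENTATIONS = {"netcdf": NetCDFImpl}


class DataFile:
    def __init__(self, filename, format="netcdf", **kwargs):
        impl_cls = _IMPLEMENTATIONS[format]  # looked up per call
        self.impl = impl_cls(filename, **kwargs)


# In the test suite: register an in-memory backend as the default.
class InMemoryImpl:
    _store = {}  # module-level dict standing in for files on disk

    def __init__(self, filename, **kwargs):
        self.data = self._store.setdefault(filename, {})


_IMPLEMENTATIONS["netcdf"] = InMemoryImpl  # the monkey-patch step
```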

@johnomotani (Contributor) commented

There's this package, https://pypi.org/project/memory-tempfile/, which might allow doing file I/O on a RAM-disk (on Linux; it seems possible to fall back to disk-based I/O on other OSes).
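
Roughly how that might look, going by the package's README (it mirrors the standard tempfile API; the fallback keyword is my reading of its docs, so treat it as an assumption):

```python
# Sketch based on memory-tempfile's documented usage: it mirrors the standard
# tempfile API but puts files on a tmpfs/ramfs mount where one exists.
# fallback=True (use the normal temp dir when no RAM filesystem is found)
# is my reading of the docs - treat it as an assumption.
from memory_tempfile import MemoryTempfile

tempfile = MemoryTempfile(fallback=True)

with tempfile.TemporaryDirectory() as tmpdir:
    # Point the fixtures that write BOUT++ dump files at tmpdir; on Linux
    # this directory lives in RAM, so the netCDF I/O never hits the disk.
    print(tmpdir)
```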

@dschwoerer (Contributor) commented

For me they are even slower:
====================== 591 passed in 62823.58s (17:27:03) ======================

I got it to run faster locally on my laptop with everything on a RAM disk; there it finished in 10 hours. But that might be down to some failures and the run not finishing:

============================= test session starts ==============================
platform linux -- Python 3.12.0, pytest-7.4.2, pluggy-1.3.0
rootdir: /tmp/boutdata
plugins: anyio-3.7.1
collected 591 items

boutdata/tests/test_boutoptions.py ................                      [  2%]
boutdata/tests/test_collect.py ......................................... [  9%]
........................................................................ [ 21%]
........................................................................ [ 34%]
........................................................................ [ 46%]
........................................................................ [ 58%]
........................................................................ [ 70%]
...............................................F..F..F..F..F..F..F..F..F [ 82%]
..F..F..
real  648m38.772s
user  499m23.204s
sys  101m39.977s

This is on Fedora Rawhide; on Fedora 38 they are still "fast" and finish within a reasonable time frame ...

@dschwoerer (Contributor) commented Nov 13, 2023

Removing gc.collect() from squashing makes it much faster ...
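
For context on why that matters, a generic timing sketch (not boutdata code) — a full gc.collect() traverses every tracked object, so calling it once per iteration swamps the real work:

```python
# Generic timing sketch (not boutdata code): gc.collect() traverses every
# tracked object, so calling it once per iteration dominates the runtime.
import gc
import timeit


def work():
    return [str(i) for i in range(1000)]


plain = timeit.timeit(work, number=1000)
with_gc = timeit.timeit(lambda: (work(), gc.collect()), number=1000)
print(f"plain:   {plain:.2f}s")
print(f"with gc: {with_gc:.2f}s")  # typically many times slower
```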

@ZedThree (Member, Author) commented

Is that there as a workaround for xarray's aggressive caching?

@dschwoerer (Contributor) commented

boutdata should not be using xarray?

I think it was put there as a (premature?) optimisation to minimise memory usage ...

Coming back to the original thread: I doubt moving I/O to a ramdisk or similar would help; we spend 30 minutes in userspace and only 1 minute in sys:

real	31m17.721s
user	30m35.586s
sys	1m3.874s
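
One way to see where that userspace time actually goes is to profile the run with the standard-library cProfile (the pytest invocation below is just an assumption about the local setup):

```python
# Profile the suite and inspect the hot call sites. Run the tests with e.g.
#   python -m cProfile -o tests.prof -m pytest boutdata/tests
# then load the dump:
import pstats

stats = pstats.Stats("tests.prof")
stats.sort_stats("cumulative").print_stats(20)  # top 20 by cumulative time
```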

@dschwoerer (Contributor) commented

#101 helps with this, but it is still slow, partly because we have dozens of tests where a few should probably do the job ...
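
For instance (hypothetical test and variable names), a session-scoped fixture plus pytest.mark.parametrize would pay the data-creation cost once and fold dozens of near-identical test functions into one:

```python
# Hypothetical sketch: build the expensive test data once per session and
# collapse near-duplicate tests into one parametrized function.
import pytest


@pytest.fixture(scope="session")
def dump_dir(tmp_path_factory):
    path = tmp_path_factory.mktemp("data")
    # ... write the BOUT++ dump files into `path` here (once) ...
    return path


@pytest.mark.parametrize("varname", ["n", "T", "phi"])  # made-up names
def test_collect_variable(dump_dir, varname):
    ...  # collect(varname, path=dump_dir) and compare with expected values
```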
