Skip to content

BUG: Re-evaluate BatchSimulate backends #955

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
4 tasks
asoplata opened this issue Nov 26, 2024 · 6 comments · May be fixed by #1009
Open
4 tasks

BUG: Re-evaluate BatchSimulate backends #955

asoplata opened this issue Nov 26, 2024 · 6 comments · May be fixed by #1009
Labels
parallelization Parallel processing testing
Milestone

Comments

@asoplata
Copy link
Collaborator

asoplata commented Nov 26, 2024

Update:

It turns out that BatchSimulate does not currently work in its entirety, for most of its listed available backends. This situation needs to be addressed before we improve BatchSimulate's testing (the original point of this issue). On my machine, if I execute our BatchSimulate example ( https://github.com/jonescompneurolab/hnn-core/blob/master/examples/howto/plot_batch_simulate.py ) but change the backends, I get the following:

  • backend='loky': Succeeds. The default, and the only backend that works out-of-the-box.
  • backend='dask': Partially succeeds, but needs considerable code changes in order to do so (which are currently undocumented). The usage in this PR ( https://github.com/jonescompneurolab/hnn-core/pull/1009/files#diff-257f4367bd17cccc84c603209300a970cc3f42f629fb2e3048e7d88e4afaf6fbR357-R384 ) works for me, but:
    1. it requires a pip install "dask[distributed]" because we have no setup.py recipe for installing Dask,
    2. it requires a different usage of the API (calling different functions) than in our Example script, and
    3. it requires wrapping the actual simulation call with a setup Dask.Client object.
  • backend='multiprocessing': Fails. I get an endless series of RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase and it does not appear to start the simulations at all. You can easily test this one by running ( https://github.com/jonescompneurolab/hnn-core/blob/master/examples/howto/plot_batch_simulate.py ) and merely changing the backend.
  • backend='threading': Fails. At least one simulation is successfully begun, and I'm able to see some Trial 1: 30.0 ms... etc. output, but the program quickly crashes with a segfault. I'm unfamiliar with the threading module of the standard-library, including whether it depends on hardware-threading, which I would assume it does not. I think use of this library should still be usable on my Apple M4 chip (which does not support simultaneous multi-threading aka hardware-threading, as opposed to software-threading), but I could be wrong about that.

In the meantime, it may be prudent to change the documentation for BatchSimulate such that it indicates that only loky is currently fully supported. We could then re-add documentation about the other backends in future versions once we know they work and test. This is relevant for our public API, which BatchSimulate also needs to be added to, which it wasn't when it was added: it is listed in our Examples ( https://jonescompneurolab.github.io/hnn-core/stable/auto_examples/howto/plot_batch_simulate.html#sphx-glr-auto-examples-howto-plot-batch-simulate-py ) but currently NOT listed in our public API ( https://jonescompneurolab.github.io/hnn-core/stable/api.html ).

Original Issue follows:


Currently, testing of BatchSimulate is very incomplete.

  • Importantly, as @ntolley pointed out here ENH: Refactor BatchSimulate Example and Improve Documentation #857 (comment) , we need a way to test that the loky backend not only creates multiple workers, but actually uses the multiple parallel workers, in parallel. I think we may have to actually use loky directly in our tests for this (see ENH: Refactor BatchSimulate Example and Improve Documentation #857 (comment) ).

  • The threading backend only has a single test, the multiprocessing and dask backends are not tested at all.

  • Much of the testing is parameter TypeError testing, which is not testing BatchSimulate's functionality so much as it is testing _validate_type and _check_option. I hate to say it, but I think many of these tests can be removed. Relatedly, I will make an issue for introducing typing and type-checking for the codebase as a whole. I'm fine with leaving these tests in until we get more formal type-checking implemented, however.

  • Relatedly, we should consider dropping the 'multiprocessing' backend as an option, since it has quirks in Joblib and, judging from joblib's documentation, is effectively replaced by loky (see https://joblib.readthedocs.io/en/stable/parallel.html#old-multiprocessing-backend ).

@samadpls
Copy link
Contributor

Hi @asoplata , I agree with everything you've mentioned regarding the testing and using loky. I didn't add the Dask backend for testing, as this repo doesn't currently include it as a dependency, but if needed, it can be installed separately

@asoplata
Copy link
Collaborator Author

It's arguably hacky, but we could also try using psutil to track processes with specific names in our testing, ensuring that the appropriate number are created, etc.

@asoplata
Copy link
Collaborator Author

Hi @asoplata , I agree with everything you've mentioned regarding the testing and using loky. I didn't add the Dask backend for testing, as this repo doesn't currently include it as a dependency, but if needed, it can be installed separately

Thanks @samadpls , yes most of these BatchSimulation test refactors are TODO tasks that existed before your PR. We definitely appreciate your contributions!

@samadpls
Copy link
Contributor

Thank you for the clarification! I will ensure it is useful and maintainable for future development. If you need any changes, please let me know. I'm happy to improve the structure and functionality.

@asoplata asoplata added the good first issue Good for newcomers label Jan 14, 2025
@Freedisch
Copy link

Freedisch commented Mar 5, 2025

This sound like a good issue to start with, I'll look into

@Freedisch Freedisch linked a pull request Mar 12, 2025 that will close this issue
@asoplata asoplata removed the good first issue Good for newcomers label Mar 14, 2025
@asoplata asoplata changed the title TEST: Improve BatchSimulate testing BUG: Re-evaluate BatchSimulate backends Mar 14, 2025
@asoplata
Copy link
Collaborator Author

asoplata commented Mar 14, 2025

@ntolley @dylansdaniels I've updated the body of this issue to reflect that there are currently undocumented issues and bugs with the BatchSimulate backends. We should discuss this on Monday, including changing the current BatchSimulate documentation such that, for the time being, only the loky backend is mentioned, since it's the only one that works out of the box and is tested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parallelization Parallel processing testing
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants