Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable parallel testing using pytest-xdist #190

Merged
merged 8 commits into from
Dec 19, 2023

Conversation

CunliangGeng
Copy link
Member

@CunliangGeng CunliangGeng commented Nov 24, 2023

This PR adds the pytest-xdist to enable parallel testing to speed up the testing.

To make parallel testing work, some session-scoped pytest fixtures that generate data or files used by multiple test files were updated (see this example).

Results:

  • sequential testing: 380s
  • parallel testing (8cores): 85s

Using pytest --durations=0 can detect the most time consuming tests. It shows that tests on downloaders are most expensive, and ignoring them can reduce testing time from 85s to 15s!

Copy link
Contributor

@gcroci2 gcroci2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for me to understand, reading from the docs, pytest-xdist can be used in cases a fixture needs to execute exactly once per test session, for example specifying scope="session" in the fixture and by reading from a FileLock to produce the fixture data only once. I don't see here any .lock file though. So I wonder how the tests can run in parallel with the current edits.

tests/conftest.py Show resolved Hide resolved
@CunliangGeng
Copy link
Member Author

CunliangGeng commented Dec 14, 2023

Just for me to understand, reading from the docs, pytest-xdist can be used in cases a fixture needs to execute exactly once per test session, for example specifying scope="session" in the fixture and by reading from a FileLock to produce the fixture data only once. I don't see here any .lock file though. So I wonder how the tests can run in parallel with the current edits.

We solved the issue in a different way. See the example below:

We used scope="session", it cannot make sure that the data is generated once, but it ensure that the data is generated once in one process (or one cpu core).
We also used tmp_path_factory (it's a session-level fixture provided by pytest, unlike the function-level tmp_path). It will make sure the data generated in different sessions will be stored in different temporary paths.

So for the example below,

  1. If the data is generated once for all processes, very good then
  2. if the data is generated more than once for all processes, then our setting can make sure the different processes have their own copy of data in a different temporary path. Say, process 1's data is in path1, while process 2's data is in path2, making sure process 1 and process 2 are independent tests.
class TestMibigBGCLoader:
    @pytest.fixture(scope="session") 
    def data_dir(self, tmp_path_factory): 
        # get the temp directory shared by all workers
        download_root = tmp_path_factory.mktemp("download")
        extract_path = tmp_path_factory.mktemp("metadata")
        download_and_extract_mibig_metadata(download_root, extract_path)
        yield str(extract_path)

Base automatically changed from 11-22-refactor_GCF to dev December 19, 2023 13:17
Copy link
Member Author

CunliangGeng commented Dec 19, 2023

Merge activity

@CunliangGeng CunliangGeng merged commit 095dcec into dev Dec 19, 2023
4 of 6 checks passed
@CunliangGeng CunliangGeng deleted the 11-22-enable_pytest_xdist branch December 19, 2023 13:25
@hechth
Copy link
Collaborator

hechth commented Jan 17, 2024

@florian-huber we could consider using this also for matchms to speed up test execution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants