Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make setup async to get rid of setup() #167

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open

Conversation

olsaarik
Copy link
Contributor

@olsaarik olsaarik commented Aug 31, 2023

This PR changes the Bootstrap API to make recv calls asynchronous (they return a std::future<void>) and implements this change in TcpBootstrap by moving receives to background threads. For each pair of source rank and tag there is a thread that performs the receives in the order they were issued.

recv is marked as [[nodiscard]] to avoid ignoring the asyncness of the function. However, I didn't see a warning from that in my build and I'm not sure why.

The NonblockingFuture class is now gone in favor of std::future everywhere. BaseSemaphore has a std::shared_future because there are multiple calls to get in its current form and std::future turns invalid after the first call to get. It might make sense to redesign the semaphores API to return semaphores in a std::future, so that after initialization the .get() indirection is avoided.

Alternatively, we could make everything std::shared_future to make the APIs overall easier to use, although that might feel a bit prescriptive. std::future can be a bit surprising especially on the Python side where calling .get() multiple times doesn't feel like it should be an error.

As an unrelated change this PR renames getRank to rank and getNRanks to size to conform to C++ and MPI naming.

Finally, instead of fixing the existing Python examples I moved the bootstrap example into a Jupyter notebook in the docs/ directory. The other one didn't seem like it's worth keeping anymore since we have the Python tests now.

Base automatically changed from olli/conn-wo-boot to main September 6, 2023 05:10
@olsaarik olsaarik marked this pull request as ready for review September 12, 2023 21:36
Copy link
Contributor

@chhwang chhwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Copy link
Contributor

@Binyang2014 Binyang2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@saeedmaleki saeedmaleki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great but is there any potential for slowdowns?

@chhwang
Copy link
Contributor

chhwang commented Oct 8, 2023

@olsaarik I have made a few updates to remove setup() in Python tests: 1773207, but some tests such as test_connection_write are still failing. Could you please have a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants