Non-blocking PUT in CHPL_COMM=ofi #25977

Draft: wants to merge 16 commits into main

jhh67 (Contributor) commented Sep 23, 2024

TODO: flesh this out

Proper implementation of non-blocking PUT in CHPL_COMM=ofi.

Signed-off-by: John H. Hartman <jhh67@users.noreply.github.com>

Previously, non-blocking PUTs were implemented as blocking PUTs, which could
severely limit performance. Prior to 2.0, small PUTs invoked fi_inject_write,
which effectively made them non-blocking, but chpl_comm_put returned as if the
PUT had completed. This could cause memory consistency model (MCM) violations,
as well as hangs because the network stack was not progressed properly. These
deficiencies were fixed in 2.0, but the fix caused a performance regression.
This commit implements non-blocking PUTs properly, so that the chpl_comm_*nb*
functions behave as intended. This should restore 1.32.0 performance while
avoiding MCM violations and hangs.
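
The pre-2.0 failure mode can be illustrated with a toy model. It is not libfabric code. inject_put, progress, and remote_word are hypothetical names. The model assumes only what the text above states: an inject-style write lets the caller reuse its source buffer as soon as the call returns, but that does not mean the data has reached the target yet.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical, not libfabric): an injected write copies the
 * source data into a transmit queue, so the caller may reuse its buffer
 * immediately, but target memory changes only when progress() runs. */

#define QUEUE_LEN 16

struct pending_write {
    long *dst;   /* target location ("remote" memory) */
    long  val;   /* private copy of the source data */
};

static struct pending_write txq[QUEUE_LEN];
static size_t txq_len = 0;

/* Returns as soon as the data is queued, like an inject-style write. */
static void inject_put(long *dst, long val) {
    assert(txq_len < QUEUE_LEN);
    txq[txq_len].dst = dst;
    txq[txq_len].val = val;
    txq_len++;
}

/* Drives the "network": applies every queued write to target memory. */
static void progress(void) {
    for (size_t i = 0; i < txq_len; i++) {
        *txq[i].dst = txq[i].val;
    }
    txq_len = 0;
}

/* Stands in for a word of memory on a remote locale. */
static long remote_word = 0;
```

If a caller treats the return from inject_put as a completed PUT, a later read of remote_word can still observe the old value, which is an MCM violation. If nothing ever calls progress(), the write never lands, and anyone waiting for it hangs.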

Rewrote PUT logic so that low-level functions are non-blocking, and a blocking
PUT is implemented by initiating a non-blocking PUT and waiting for it to
complete. This simplifies the implementation and avoids code duplication.
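
A minimal sketch of this layering follows. It assumes a toy transport in which progress() completes one outstanding operation per call, standing in for reaping a completion from a completion queue. The names put_nb, test_nb, wait_nb, put, and progress are hypothetical and do not correspond to the actual chpl_comm_* or libfabric functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy transport: each non-blocking PUT gets a handle, and
 * progress() completes at most one outstanding PUT per call. */

#define MAX_OPS 16

struct put_op {
    long *dst;
    long  val;
    bool  done;
};

static struct put_op ops[MAX_OPS];
static size_t num_ops = 0;
static size_t next_to_complete = 0;

typedef size_t put_handle;

/* Initiate a PUT and return immediately with a handle. */
static put_handle put_nb(long *dst, long val) {
    assert(num_ops < MAX_OPS);
    ops[num_ops] = (struct put_op){ .dst = dst, .val = val, .done = false };
    return num_ops++;
}

/* Complete the oldest outstanding PUT, if there is one. */
static void progress(void) {
    if (next_to_complete < num_ops) {
        struct put_op *op = &ops[next_to_complete++];
        *op->dst = op->val;
        op->done = true;
    }
}

/* Has the PUT identified by h completed? */
static bool test_nb(put_handle h) {
    return ops[h].done;
}

/* Drive progress until the given PUT has completed. */
static void wait_nb(put_handle h) {
    while (!test_nb(h)) {
        progress();
    }
}

/* Blocking PUT layered on the non-blocking one: initiate, then wait. */
static void put(long *dst, long val) {
    wait_nb(put_nb(dst, val));
}
```

Because only the non-blocking path touches the transport, the blocking path cannot drift out of sync with it. Waiting on one PUT also drives earlier PUTs to completion.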

Allow specifying the maximum message size and the maximum number of endpoints.
These are intended primarily for testing.
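
The PR does not say how a reduced maximum message size is exercised. One plausible use, which is our assumption, is that a small limit forces large transfers through code that splits them into pieces no larger than the limit. A minimal sketch of that splitting follows. put_chunked and xfer are hypothetical names, and xfer stands in for a single network write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: xfer stands in for one network write; here it
 * just copies memory and counts how many times it was called. */
static size_t xfer_calls = 0;

static void xfer(char *dst, const char *src, size_t n) {
    memcpy(dst, src, n);
    xfer_calls++;
}

/* Split one PUT of len bytes into writes of at most max_msg_size bytes. */
static void put_chunked(char *dst, const char *src, size_t len,
                        size_t max_msg_size) {
    assert(max_msg_size > 0);
    while (len > 0) {
        size_t n = len < max_msg_size ? len : max_msg_size;
        xfer(dst, src, n);
        dst += n;
        src += n;
        len -= n;
    }
}
```

For example, a 10-byte PUT with a 4-byte limit becomes three writes of 4, 4, and 2 bytes.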

Also some code cleanup.

We are now using this function to force visibility when an unbound endpoint is
released, so it needs to work on unbound endpoints.

Operations to force visibility are deferred until the endpoint is released,
which requires the visibility bitmaps.
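
One way to picture deferring visibility work with a bitmap is sketched below. The PR does not describe the bitmap layout. This sketch assumes one bit per target node per endpoint. The names struct endpoint, note_put, release_endpoint, and force_visibility are hypothetical, and force_visibility stands in for whatever operation actually makes earlier PUTs to a node visible.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: one bit per target node records that PUTs were
 * sent there without forcing visibility; the work is deferred until the
 * endpoint is released. */

#define MAX_NODES 128
#define BITS_PER_WORD (sizeof(uint64_t) * CHAR_BIT)
#define NUM_WORDS ((MAX_NODES + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct endpoint {
    uint64_t put_bitmap[NUM_WORDS];  /* bit n set: node n needs forcing */
};

/* Stand-in for the real visibility-forcing operation; it only counts. */
static int force_count = 0;
static void force_visibility(size_t node) {
    (void)node;
    force_count++;
}

/* Record that a PUT to node was issued without waiting for visibility. */
static void note_put(struct endpoint *ep, size_t node) {
    assert(node < MAX_NODES);
    ep->put_bitmap[node / BITS_PER_WORD] |=
        (uint64_t)1 << (node % BITS_PER_WORD);
}

/* On release, force visibility once per node that has a bit set. */
static void release_endpoint(struct endpoint *ep) {
    for (size_t w = 0; w < NUM_WORDS; w++) {
        uint64_t bits = ep->put_bitmap[w];
        while (bits != 0) {
            size_t bit = 0;
            while (((bits >> bit) & 1) == 0) {
                bit++;               /* find the lowest set bit */
            }
            force_visibility(w * BITS_PER_WORD + bit);
            bits &= bits - 1;        /* clear the lowest set bit */
        }
        ep->put_bitmap[w] = 0;
    }
}
```

Deferring the work this way handles each node once per release, no matter how many PUTs were sent to it.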

Fixed how the number of transmit contexts needed is computed, and added some
comments.

Change type of numTxCtxs and numRxCtxs to size_t to match type of
info->domain_attr->ep_cnt.
