feat: add DMA-BUF support #618
Merged
Commits on Sep 25, 2024
use cudart static instead of dlopen'ing libcuda.so

Check at build time for at least CUDA 11.7 and, if present, enable build-time support for the CUDA side of DMA-BUF detection. Also check for CUDA 11.3 at build time for gdr flush attribute support. Prefer cudart functions where possible for simplicity.

Drop the cuda_check functional test: its intention was to ensure that CUDA was not linked, but CUDA is now linked when CUDA is enabled.

This change also required fixing nvtx's m4 autodetection, and our GitHub Actions needed to use the actual CUDA repos (previously they were using ancient 11.x toolkits from the Ubuntu universe repos).

Signed-off-by: Nicholas Sielicki <nslick@amazon.com>
Commit: a1e2188
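The build-time version gates described in this commit can be pictured with the version macro that the CUDA runtime headers export. The following is a minimal sketch, not the plugin's actual configure logic: `CUDART_VERSION` is the real macro from `<cuda_runtime_api.h>` (encoded as major * 1000 + minor * 10), while every `SKETCH_`/`sketch_` name is hypothetical, and the fallback definition exists only so the snippet compiles without a CUDA toolkit installed.

```c
#include <stdbool.h>

/*
 * CUDART_VERSION normally comes from <cuda_runtime_api.h>, encoded as
 * major * 1000 + minor * 10 (so CUDA 11.7 is 11070). The fallback below is
 * an assumption made only so this sketch builds without the toolkit.
 */
#ifndef CUDART_VERSION
#define CUDART_VERSION 0
#endif

/* Hypothetical feature gates; the real build derives its own in configure. */
#define SKETCH_HAVE_CUDA_DMABUF    (CUDART_VERSION >= 11070) /* CUDA >= 11.7 */
#define SKETCH_HAVE_CUDA_GDR_FLUSH (CUDART_VERSION >= 11030) /* CUDA >= 11.3 */

/* Decode the major component of an encoded CUDA version. */
static inline int sketch_cuda_major(int version)
{
    return version / 1000;
}

/* Decode the minor component of an encoded CUDA version. */
static inline int sketch_cuda_minor(int version)
{
    return (version % 1000) / 10;
}

/* True when the encoded version is at least major.minor. */
static inline bool sketch_cuda_at_least(int version, int major, int minor)
{
    return version >= major * 1000 + minor * 10;
}
```

With a real toolkit on the include path, the two `SKETCH_HAVE_*` macros would evaluate against the installed version; without one, both evaluate to false and the CUDA DMA-BUF path would simply be compiled out.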
refactor(api): delete previous DMA-BUF stub support
At the interface with NCCL, continue to return -ENOTSUP. In the internal API, remove the dmabuf function pointers from the communicators and delete the implementations they pointed at.

Signed-off-by: Nicholas Sielicki <nslick@amazon.com>
Commit: a906c5f
refactor(mr): add tagged union as mr cache key
The MR cache needs to be capable of handling queries of triplets of {base, offset, len} in addition to the current arguments of {base, len}. Add a tagged union with a functional interface that can represent this generically. The tagged union members are struct iovec and struct fi_mr_dmabuf, which matches the union within struct fi_mr_attr.

Signed-off-by: Nicholas Sielicki <nslick@amazon.com>
Commit: f7df56c
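The tagged union described in this commit could look roughly like the sketch below. It is an illustration under stated assumptions, not the plugin's code: all `sketch_` names are hypothetical, and `struct sketch_mr_dmabuf` is a local stand-in that mirrors the fields of libfabric's `struct fi_mr_dmabuf` so the snippet builds without libfabric headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h> /* struct iovec */

/* Stand-in for libfabric's struct fi_mr_dmabuf (assumed field layout). */
struct sketch_mr_dmabuf {
    int fd;          /* dmabuf file descriptor */
    uint64_t offset; /* offset of the region within the dmabuf */
    size_t len;      /* length of the region */
    void *base_addr; /* starting virtual address of the dmabuf */
};

enum sketch_ckey_type { SKETCH_CKEY_IOVEC, SKETCH_CKEY_DMABUF };

/* Tagged union: the tag selects which member of the union is valid. */
struct sketch_ckey {
    enum sketch_ckey_type type;
    union {
        struct iovec iovec;
        struct sketch_mr_dmabuf dmabuf;
    } u;
};

/* Build a key from a plain {base, len} pair. */
static inline struct sketch_ckey sketch_ckey_from_iovec(void *base, size_t len)
{
    struct sketch_ckey k = { .type = SKETCH_CKEY_IOVEC };
    k.u.iovec.iov_base = base;
    k.u.iovec.iov_len = len;
    return k;
}

/* Build a key from a dmabuf {fd, base, offset, len} description. */
static inline struct sketch_ckey sketch_ckey_from_dmabuf(int fd, void *base,
                                                         uint64_t offset, size_t len)
{
    struct sketch_ckey k = { .type = SKETCH_CKEY_DMABUF };
    k.u.dmabuf.fd = fd;
    k.u.dmabuf.offset = offset;
    k.u.dmabuf.len = len;
    k.u.dmabuf.base_addr = base;
    return k;
}

static inline bool sketch_ckey_is_dmabuf(const struct sketch_ckey *k)
{
    assert(k != NULL);
    return k->type == SKETCH_CKEY_DMABUF;
}

/* Start address of the registered region, whichever member is active. */
static inline uintptr_t sketch_ckey_addr(const struct sketch_ckey *k)
{
    assert(k != NULL);
    if (k->type == SKETCH_CKEY_DMABUF)
        return (uintptr_t)k->u.dmabuf.base_addr + (uintptr_t)k->u.dmabuf.offset;
    return (uintptr_t)k->u.iovec.iov_base;
}

/* Length of the registered region, whichever member is active. */
static inline size_t sketch_ckey_len(const struct sketch_ckey *k)
{
    assert(k != NULL);
    return k->type == SKETCH_CKEY_DMABUF ? k->u.dmabuf.len : k->u.iovec.iov_len;
}
```

Callers that only need an address and a length go through the accessor functions and never inspect the tag, which is what lets one cache handle both {base, len} and {base, offset, len} queries.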
refactor(api): replace addr+len pairs with cachekeys
Immediately on any top-level NCCL call, construct an immutable nccl_ofi_mr_cachekey_t on the stack, then pass that to communicator regmr implementations.

Add a flags argument to internal regmr functions such that the input can be inspected and may add FI_MR_DMABUF if the input arguments correspond to a file descriptor.

Implement top-level nccl_net_ofi_regMr in terms of nccl_net_ofi_regMrDmaBuf, simply forwarding arguments alongside an invalid file descriptor (-1) and a zero offset.

DMA-BUF remains unsupported as of this commit, but only due to not advertising support back to NCCL/nccom.

Signed-off-by: Nicholas Sielicki <nslick@amazon.com>
Commit: 4255abb
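The forwarding pattern in this commit (plain registration expressed as the DMA-BUF entry point with an invalid descriptor and zero offset) can be sketched as follows. The signatures are simplified and every name is hypothetical; in particular `SKETCH_MR_DMABUF` is only a stand-in for libfabric's `FI_MR_DMABUF`, with an illustrative value, and the functions record their inputs instead of performing a real registration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for libfabric's FI_MR_DMABUF flag; the value is illustrative. */
#define SKETCH_MR_DMABUF (1ULL << 40)

/* What a registration call would hand down to the communicator layer. */
struct sketch_reg_request {
    void *addr;
    size_t len;
    uint64_t offset;
    int fd;
    uint64_t flags;
};

/*
 * DMA-BUF-capable entry point. A non-negative fd means the caller supplied
 * a dmabuf, so the dmabuf flag is added; fd == -1 selects the legacy path.
 */
static int sketch_reg_mr_dmabuf(void *addr, size_t len, uint64_t offset, int fd,
                                struct sketch_reg_request *out)
{
    assert(out != NULL);
    out->addr = addr;
    out->len = len;
    out->offset = offset;
    out->fd = fd;
    out->flags = (fd >= 0) ? SKETCH_MR_DMABUF : 0;
    return 0;
}

/* Legacy entry point: forward with an invalid fd (-1) and a zero offset. */
static int sketch_reg_mr(void *addr, size_t len, struct sketch_reg_request *out)
{
    return sketch_reg_mr_dmabuf(addr, len, 0, -1, out);
}
```

Keeping a single implementation behind both entry points means the legacy path and the DMA-BUF path cannot drift apart; the only difference is whether the descriptor is valid.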
refactor(util): cleanup version tracking
The info already carries all the version information we need, so don't store it globally and don't pass it explicitly to functions that already take an info.

Signed-off-by: Nicholas Sielicki <nslick@amazon.com>
Commit: b548268
This adds dmabuf support to ofi-nccl, with the following requirements:

* ofi-nccl must be built against libfabric >=1.20
* libfabric must be built against libibverbs >=34
* FI_HMEM must be supported by the provider.
* linux 5.12+ is required for rdma dmabuf import ioctls. The plugin will automatically disable dmabuf support when running under older kernels. NCCL_DEBUG=TRACE will report this condition.
* Accelerators:
  CUDA:
    1. CUDA Toolkit >=11.7 must be available at build time of ofi-nccl
    2. open-source nvidia drivers >=515 must be used at runtime.
    3. CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED must return true at runtime.
  Neuron: no specific NRT version check is performed at build time or at runtime.

libfabric presently provides no hints at initialization that allow ofi-nccl to differentiate between a provider that has FI_HMEM support and one that has dmabuf support. In the case that the plugin is built against libfabric >=1.20 and all other conditions for dmabuf are met, the plugin will optimistically assume support by the provider and may provide dmabufs for memory registration.

For providers where this fails, OFI_NCCL_DISABLE_DMABUF=1 may be set to force the legacy path. When set, dmabuf support is not advertised to NCCL and this ensures that the plugin remains in the legacy path.

Testing: Various combinations of
+ OFI_NCCL_DISABLE_DMABUF=0/1
+ OFI_NCCL_PROTOCOL=RDMA/SENDRECV
+ FI_HMEM_CUDA_USE_GDRCOPY=0/1

Signed-off-by: Nicholas Sielicki <nslick@amazon.com>
Commit: 5dcaa87
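Two of the runtime conditions listed in this commit, the Linux 5.12+ requirement and the OFI_NCCL_DISABLE_DMABUF override, lend themselves to a short illustration. The sketch below is an assumption about how such a gate could be written, not the plugin's implementation: the `sketch_` functions are hypothetical, the kernel check parses the `uname` release string, and the environment variable is treated as set only when its value is exactly "1".

```c
#define _POSIX_C_SOURCE 200809L

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

/* True when a release string such as "5.15.0-generic" is 5.12 or newer. */
static bool sketch_kernel_supports_dmabuf(const char *release)
{
    int major = 0, minor = 0;

    if (release == NULL || sscanf(release, "%d.%d", &major, &minor) != 2)
        return false;
    return major > 5 || (major == 5 && minor >= 12);
}

/* Assumption: only the exact value "1" disables dmabuf in this sketch. */
static bool sketch_dmabuf_disabled_by_env(void)
{
    const char *value = getenv("OFI_NCCL_DISABLE_DMABUF");

    return value != NULL && strcmp(value, "1") == 0;
}

/* Combine both checks; any failure falls back to the legacy path. */
static bool sketch_dmabuf_viable(void)
{
    struct utsname u;

    if (sketch_dmabuf_disabled_by_env())
        return false;
    if (uname(&u) != 0)
        return false;
    return sketch_kernel_supports_dmabuf(u.release);
}
```

A real gate would also need the accelerator-side checks from the list above (for CUDA, querying CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED), which are left out here so the sketch stays free of CUDA dependencies.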