
## Using the system CUDA runtime library

The CUDA runtime library is quite a large download, so using the copy already installed on the cluster can save significant overhead. For CUDA.jl 4 or later, set the CUDA_Runtime_jll.jl preference `version = "local"`. See Julia - Preferences.
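As an illustration, this preference can be recorded from the Julia REPL with Preferences.jl. The sketch below writes `version = "local"` to your project's LocalPreferences.toml; the UUID is CUDA_Runtime_jll's package UUID (verify it against your Manifest.toml), and recent CUDA.jl versions also provide `CUDA.set_runtime_version!` to do this for you.

```julia
# Sketch: make CUDA.jl use the cluster's CUDA runtime instead of downloading
# its own artifact (CUDA.jl 4 or later). Run once in the project environment;
# this writes LocalPreferences.toml next to the active Project.toml.
using Preferences, UUIDs

# Package UUID of CUDA_Runtime_jll (check against your Manifest.toml).
cuda_runtime_jll = UUID("76a88914-d11a-5bdc-97e0-2f5a05c973a2")

set_preferences!(cuda_runtime_jll, "version" => "local"; force=true)
```

Restart Julia after changing the preference so that CUDA.jl is re-precompiled against the local runtime.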

## CUDA-aware MPI

### Configuration

Use the following modules:

```
cuda/11.2 ucx/1.13.1_cuda-11.2 openmpi/4.1.5_cuda-11.2
```
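With these modules loaded, MPI.jl also needs to be pointed at the system (CUDA-aware) Open MPI rather than a bundled MPI. A minimal sketch using MPIPreferences (assuming MPI.jl 0.20 or later; run once per project, with the modules loaded):

```julia
# Sketch: record a preference telling MPI.jl to use the system MPI library
# (the CUDA-aware Open MPI from the modules above) instead of the JLL-provided
# one. Assumes the MPIPreferences package is installed in the project.
using MPIPreferences
MPIPreferences.use_system_binary()
```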

In addition, you may need to set the following environment variables:

```yaml
env:
  JULIA_CUDA_MEMORY_POOL: none
  OMPI_MCA_opal_warn_on_missing_libcuda: 0
```

- The first disables the CUDA.jl memory pool: see the MPI.jl known issues.
- The second prevents a warning from being displayed if CUDA is not available (e.g. if you're running MPI on a regular CPU node).
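If you are unsure whether these variables actually reached the job environment, a quick (purely illustrative) check from within Julia:

```julia
# Sketch: print the relevant environment variables. Note that
# JULIA_CUDA_MEMORY_POOL must be set before CUDA.jl initializes to take effect.
for var in ("JULIA_CUDA_MEMORY_POOL", "OMPI_MCA_opal_warn_on_missing_libcuda")
    @info "$var = $(get(ENV, var, "<unset>"))"
end
```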

### Check that it is using GPU-to-GPU direct communication

Profile a run (e.g. with Nsight Systems) and check that MPI communication is not being staged through host memory, i.e. the profile should not show DtoH/HtoD memory operations around the MPI calls.
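Before profiling, a quick functional smoke test can confirm that device buffers can be handed to MPI directly. This is an illustrative sketch (the file name and launch command are examples), assuming MPI.jl and CUDA.jl are configured as described above; `MPI.has_cuda()` reports whether the MPI library advertises CUDA support.

```julia
# Sketch: CUDA-aware MPI smoke test. Launch with the system MPI, e.g.:
#   mpiexec -n 2 julia --project cuda_mpi_check.jl
using MPI, CUDA

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Does the MPI library advertise CUDA support?
rank == 0 && @info "MPI.has_cuda() = $(MPI.has_cuda())"

# Reduce a device array directly; with CUDA-aware MPI this works without
# first copying the data to the host.
sendbuf = CUDA.fill(Float32(rank), 1024)
recvbuf = MPI.Allreduce(sendbuf, +, comm)

rank == 0 && @info "Allreduce on a CuArray succeeded" first(Array(recvbuf))
MPI.Finalize()
```

If this crashes inside the MPI library when given device buffers, the MPI build is most likely not CUDA-aware.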
