Aluminum provides a generic interface to high-performance communication libraries, with a focus on allreduce algorithms. Blocking and non-blocking algorithms and GPU-aware algorithms are supported. Aluminum also contains custom implementations of select algorithms to optimize for certain situations.
- Blocking and non-blocking algorithms
- GPU-aware algorithms
- Implementations/interfaces:
  - MPI: MPI and custom algorithms implemented on top of MPI
  - NCCL: Interface to Nvidia's NCCL 2 library
  - MPI-CUDA: Custom GPU-aware algorithms
- A compiler that supports at least C++11
- MPI (at least MPI 3.0)
- CUDA (at least 9.0, optional if no GPU support is needed)
- NCCL2 (optional if no NCCL support is needed)
CMake 3.9 or newer is required. An out-of-source build is required:
mkdir build && cd build
cmake <options> /path/to/aluminum/source
The required packages are MPI, OpenMP, and HWLOC. MPI and OpenMP use the standard CMake packages and can be manipulated in the standard way. HWLOC, if installed in a nonstandard location, may require HWLOC_DIR to be set to the appropriate installation prefix.
The CUDA-based backends assume CUDA is a first-class language in CMake. An alternative CUDA compiler can be selected using:
-DCMAKE_CUDA_COMPILER=/path/to/my/nvcc
If the NCCL backend is used, the NCCL_DIR variable may be used to point CMake to a nonstandard installation prefix.
For the NCCL backend:
-DALUMINUM_ENABLE_NCCL=ON
For the MPI-CUDA backend:
-DALUMINUM_ENABLE_MPI_CUDA=ON
The NCCL and MPI-CUDA backends can be combined.
Here is a complete example:
CMAKE_PREFIX_PATH=/path/to/your/MPI:$CMAKE_PREFIX_PATH cmake -D ALUMINUM_ENABLE_NCCL=YES -D ALUMINUM_ENABLE_MPI_CUDA=YES -D NCCL_DIR=/path/to/NCCL ..
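The options above can also be assembled by a small helper, which is convenient when the set of backends changes between builds. The following is a minimal sketch, not part of Aluminum: the function name `aluminum_cmake_command` and the environment variables `NCCL_PREFIX`, `HWLOC_PREFIX`, and `NVCC_PATH` are hypothetical names chosen for this example. Only the CMake options themselves come from the text above. The helper prints the command rather than running it, so it can be inspected first.

```shell
#!/usr/bin/env bash
# Sketch only: print a cmake command line for the requested backends.
# aluminum_cmake_command, NCCL_PREFIX, HWLOC_PREFIX and NVCC_PATH are
# hypothetical names for this example; they are not part of Aluminum.

aluminum_cmake_command() {
    # $1 is the path to the Aluminum source; remaining arguments are backends.
    src="$1"; shift
    cmd="cmake"
    for backend in "$@"; do
        case "$backend" in
            NCCL)     cmd="$cmd -D ALUMINUM_ENABLE_NCCL=ON" ;;
            MPI-CUDA) cmd="$cmd -D ALUMINUM_ENABLE_MPI_CUDA=ON" ;;
            *) echo "unknown backend: $backend" >&2; return 1 ;;
        esac
    done
    # Optional settings, added only when the corresponding variable is set.
    if [ -n "${NCCL_PREFIX:-}" ]; then
        cmd="$cmd -D NCCL_DIR=$NCCL_PREFIX"
    fi
    if [ -n "${HWLOC_PREFIX:-}" ]; then
        cmd="$cmd -D HWLOC_DIR=$HWLOC_PREFIX"
    fi
    if [ -n "${NVCC_PATH:-}" ]; then
        cmd="$cmd -D CMAKE_CUDA_COMPILER=$NVCC_PATH"
    fi
    echo "$cmd $src"
}

# Print the command for a build directory located inside the source tree.
aluminum_cmake_command .. NCCL MPI-CUDA
```

The sketch joins everything into one string, so it assumes that none of the paths contain spaces.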
The test_correctness binary will check the correctness of Aluminum's allreduce implementations. The usage is:
test_correctness [Al backend: MPI, NCCL, MPI-CUDA]
For example, to test the MPI backend:
mpirun -n 128 ./test_correctness
To test the NCCL backend instead:
mpirun -n 128 ./test_correctness NCCL
The benchmark_allreduce benchmark can be run similarly, and will report runtimes for different allreduce algorithms.
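When several backends are built, the correctness runs shown above can be generated in a loop. This is a minimal sketch under a few assumptions: the function name `correctness_commands` is hypothetical, `mpirun` is assumed to be the MPI launcher (as in the examples above), and the backend name is always passed explicitly, following the usage line. The function prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch only: print one mpirun command per backend.
# correctness_commands is a hypothetical helper, not part of Aluminum.

correctness_commands() {
    # $1 is the number of MPI processes; remaining arguments are backends.
    nprocs="$1"; shift
    for backend in "$@"; do
        echo "mpirun -n $nprocs ./test_correctness $backend"
    done
}

# Print the commands for all three backends on 128 processes.
correctness_commands 128 MPI NCCL MPI-CUDA
```

Piping the output into `sh` from the build directory would run the tests one after another. Only backends that were enabled at configure time should be listed.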
Coming soon...
- Nikoli Dryden
- Naoya Maruyama
- Andy Yoo
- Tom Benson
See also contributors.
Aluminum is licensed under the Apache License 2.0. See LICENSE for details.