Aluminum

Aluminum provides a generic interface to high-performance communication libraries, with a focus on allreduce algorithms. Blocking and non-blocking algorithms and GPU-aware algorithms are supported. Aluminum also contains custom implementations of select algorithms to optimize for certain situations.
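As an illustration of the operation Aluminum focuses on, the sketch below simulates the semantics of a sum-allreduce in a single process. Each simulated rank contributes a buffer, and afterwards every rank holds the elementwise sum. The function name `reference_allreduce_sum` is hypothetical and is not part of Aluminum's API. Real backends such as MPI or NCCL compute the same result across processes or GPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference semantics of a sum-allreduce, simulated in one process.
// ranks[r] is the input buffer held by rank r; all buffers must have equal length.
// After the call, every rank's buffer holds the elementwise sum of all inputs.
void reference_allreduce_sum(std::vector<std::vector<float>>& ranks) {
  if (ranks.empty()) return;
  const std::size_t n = ranks[0].size();
  std::vector<float> total(n, 0.0f);
  for (const auto& buf : ranks) {
    assert(buf.size() == n);  // allreduce requires equal-sized buffers
    for (std::size_t i = 0; i < n; ++i) total[i] += buf[i];
  }
  for (auto& buf : ranks) buf = total;  // the "all" part: every rank gets the result
}
```

For example, three ranks holding {1, 2}, {3, 4}, and {5, 6} all end up holding {9, 12}.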

Features

- Blocking and non-blocking algorithms
- GPU-aware algorithms
- Implementations/interfaces:
  - MPI: MPI and custom algorithms implemented on top of MPI
  - NCCL: Interface to NVIDIA's NCCL 2 library
  - MPI-CUDA: Custom GPU-aware algorithms
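The difference between blocking and non-blocking algorithms can be sketched with a textbook ring allreduce (a reduce-scatter followed by an allgather) over simulated ranks. The request object advances one communication round per `test()` call, loosely in the style of `MPI_Test` polling, so a caller could do other work between rounds; a blocking call simply polls until completion. `RingAllreduceRequest` and `ring_allreduce_sum` are hypothetical names chosen for illustration. They are not Aluminum's API, and this is not a description of how Aluminum's own algorithms are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical non-blocking request for a sum-allreduce over simulated ranks,
// using a textbook ring: p-1 reduce-scatter rounds, then p-1 allgather rounds.
// bufs[r] is rank r's buffer; all buffers must have the same length.
class RingAllreduceRequest {
 public:
  explicit RingAllreduceRequest(std::vector<std::vector<float>>& bufs)
      : bufs_(bufs), p_(bufs.size()), round_(0) {
    for (std::size_t r = 0; r < p_; ++r) assert(bufs_[r].size() == bufs_[0].size());
  }

  // True once every rank holds the full sum.
  bool done() const { return p_ < 2 || round_ == 2 * (p_ - 1); }

  // Advance by one communication round, then report completion (polling style).
  bool test() {
    if (done()) return true;
    const bool reducing = round_ < p_ - 1;
    const std::size_t s = reducing ? round_ : round_ - (p_ - 1);
    // Each rank r sends one chunk to its ring neighbor (r + 1) % p.
    // Snapshot all outgoing chunks first so the sends in a round are simultaneous.
    std::vector<std::size_t> chunk(p_);
    std::vector<std::vector<float>> outgoing(p_);
    for (std::size_t r = 0; r < p_; ++r) {
      chunk[r] = reducing ? (r + p_ - s) % p_ : (r + 1 + p_ - s) % p_;
      const std::pair<std::size_t, std::size_t> b = bounds(chunk[r]);
      outgoing[r].assign(bufs_[r].begin() + b.first, bufs_[r].begin() + b.second);
    }
    for (std::size_t r = 0; r < p_; ++r) {
      std::vector<float>& dst = bufs_[(r + 1) % p_];
      const std::pair<std::size_t, std::size_t> b = bounds(chunk[r]);
      for (std::size_t i = b.first; i < b.second; ++i) {
        if (reducing) dst[i] += outgoing[r][i - b.first];  // accumulate partial sum
        else dst[i] = outgoing[r][i - b.first];            // forward a finished chunk
      }
    }
    ++round_;
    return done();
  }

  // A blocking allreduce is just "start, then poll until done".
  void wait() { while (!test()) {} }

 private:
  // Chunk k covers elements [k*n/p, (k+1)*n/p), which also handles n not divisible by p.
  std::pair<std::size_t, std::size_t> bounds(std::size_t k) const {
    const std::size_t n = bufs_[0].size();
    return std::make_pair(k * n / p_, (k + 1) * n / p_);
  }

  std::vector<std::vector<float>>& bufs_;
  std::size_t p_;
  std::size_t round_;
};

// Hypothetical blocking wrapper around the non-blocking request.
void ring_allreduce_sum(std::vector<std::vector<float>>& bufs) {
  RingAllreduceRequest req(bufs);
  req.wait();
}
```

With p ranks, the ring takes 2(p - 1) rounds, and each rank sends about 2(p - 1)/p of its buffer in total, which is why ring-style algorithms are a common choice for large allreduces.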

Getting started

Prerequisites

- A compiler that supports at least C++11
- CMake 3.9 or newer
- MPI (at least MPI 3.0)
- OpenMP
- HWLOC
- CUDA (at least 9.0; optional if no GPU support is needed)
- NCCL 2 (optional if no NCCL support is needed)

Building

CMake 3.9 or newer is required, and the build must be out-of-source:

mkdir build && cd build
cmake <options> /path/to/aluminum/source

The required packages are MPI, OpenMP, and HWLOC. MPI and OpenMP are located through CMake's standard find modules and can be configured in the usual way. If HWLOC is installed in a nonstandard location, you may need to set HWLOC_DIR to its installation prefix.

The CUDA-based backends rely on CMake's first-class CUDA language support. A different CUDA compiler can be selected with:

-DCMAKE_CUDA_COMPILER=/path/to/my/nvcc

To enable the NCCL backend:

-DALUMINUM_ENABLE_NCCL=ON

If NCCL is installed in a nonstandard location, set NCCL_DIR to its installation prefix.

To enable the MPI-CUDA backend:

-DALUMINUM_ENABLE_MPI_CUDA=ON

The NCCL and MPI-CUDA backends can be enabled together.

Here is a complete example that enables both GPU backends, run from the build directory:

CMAKE_PREFIX_PATH=/path/to/your/MPI:$CMAKE_PREFIX_PATH cmake -DALUMINUM_ENABLE_NCCL=ON -DALUMINUM_ENABLE_MPI_CUDA=ON -DNCCL_DIR=/path/to/NCCL ..

Tests and benchmarks

The test_correctness binary checks the correctness of Aluminum's allreduce implementations. Usage:

test_correctness [backend]

where backend is MPI (the default), NCCL, or MPI-CUDA. For example, to test the MPI backend:

mpirun -n 128 ./test_correctness

To test the NCCL backend instead:

mpirun -n 128 ./test_correctness NCCL
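A correctness test for an allreduce typically compares each algorithm's output against a reference result. Because different algorithms add the same values in different orders, floating-point results can legitimately differ in the last bits, so a tolerance-based comparison is more robust than exact equality. The helper below is a hypothetical sketch of such a comparison; the name `allclose` and its default tolerances are assumptions, and this is not the logic of Aluminum's test_correctness.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: element i passes if |got - expected| <= atol + rtol * |expected|.
// Written as !(diff <= limit) so that NaN values are reported as mismatches.
bool allclose(const std::vector<float>& got, const std::vector<float>& expected,
              float rtol = 1e-5f, float atol = 1e-6f) {
  assert(rtol >= 0.0f && atol >= 0.0f);  // tolerances must be non-negative
  if (got.size() != expected.size()) return false;
  for (std::size_t i = 0; i < got.size(); ++i) {
    const float diff = std::fabs(got[i] - expected[i]);
    if (!(diff <= atol + rtol * std::fabs(expected[i]))) return false;
  }
  return true;
}
```

For instance, `(0.1f + 0.2f) + 0.3f` and `0.1f + (0.2f + 0.3f)` are accepted as matching regardless of how rounding falls out, while 1.0 against 1.1 is rejected.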

The benchmark_allreduce binary is run the same way and reports runtimes for the different allreduce algorithms.
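The kind of measurement a collective benchmark reports can be sketched as: warm up, time repeated runs, and report a robust statistic. `median_runtime_us` and its default iteration counts are hypothetical; this is not how benchmark_allreduce is implemented, and a real multi-process benchmark would also synchronize ranks (for example with `MPI_Barrier`) before each timed run.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical timing helper: run `op` `warmup` times untimed (first calls often
// pay one-time setup costs), then return the median of `iters` timed runs in
// microseconds (the upper median when `iters` is even).
double median_runtime_us(const std::function<void()>& op, int warmup = 3, int iters = 11) {
  assert(warmup >= 0 && iters > 0);
  for (int i = 0; i < warmup; ++i) op();
  std::vector<double> samples;
  samples.reserve(iters);
  for (int i = 0; i < iters; ++i) {
    const auto start = std::chrono::steady_clock::now();
    op();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double, std::micro>(stop - start).count());
  }
  // The median is less sensitive than the mean to occasional slow outliers.
  std::nth_element(samples.begin(), samples.begin() + iters / 2, samples.end());
  return samples[iters / 2];
}
```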

API overview

Coming soon...

Authors

Nikoli Dryden
Naoya Maruyama
Andy Yoo
Tom Benson

License

Aluminum is licensed under the Apache License 2.0. See LICENSE for details.
