
Testing and Benchmarking

Amedeo Sapio edited this page Aug 7, 2024 · 3 revisions

Unit tests

Basic unit tests can be executed via the following command:

$ make check

which runs the unit test suite in the same environment where make was invoked and prints a summary when it finishes:

PASS: deque
PASS: freelist
PASS: msgbuff
PASS: show_tuner_decisions
PASS: scheduler
PASS: idpool
PASS: ep_addr_list
PASS: mr
============================================================================
Testsuite summary for aws-ofi-nccl GitHub-dev
============================================================================
# TOTAL: 8
# PASS:  8
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================
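If you capture the output of make check, for example in a CI log, you can check the counters in the summary above mechanically. make check itself should already exit with a non-zero status when tests fail, so this is only a convenience. The sketch below defines a hypothetical check_summary shell function, which is not part of the repository. It sums the FAIL, ERROR and XPASS counters across every summary block in its input and returns 1 if any of them is non-zero.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with aws-ofi-nccl): read automake
# "Testsuite summary" output on stdin, print the failure-type counters,
# and return 1 if any test failed, errored, or unexpectedly passed.
check_summary() {
  awk '
    # Counter lines look like "# FAIL:  0"; field 3 is the count.
    # Per-test lines such as "PASS: deque" have no leading "#" and are ignored.
    # Counts are summed in case several directories print a summary.
    /^# FAIL:/  { fail  += $3 }
    /^# ERROR:/ { error += $3 }
    /^# XPASS:/ { xpass += $3 }
    END {
      printf "FAIL=%d ERROR=%d XPASS=%d\n", fail, error, xpass
      if (fail + error + xpass > 0) exit 1
    }'
}

# Example:
#   make check 2>&1 | tee check.log
#   check_summary < check.log
```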

Functional tests

Running the plugin's functional tests requires a working MPI installation and an MPI setup across the communicating hosts. To install MPI, you can use the standard packages provided by your Linux distribution. Once MPI is set up, you can run any test of your choice with a command like the one below:

mpirun -n 2 --host <host-1>,<host-2> $INSTALL_PREFIX/bin/nccl_message_transfer

Note: all tests except ring.c require exactly 2 MPI ranks.
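To run several of the 2-rank tests in one go, you could wrap the mpirun command above in a small loop. The sketch below is an assumption, not a script shipped with the plugin. The functions run_functional_test and run_functional_tests are hypothetical names, MPIRUN is a hypothetical override for the launcher (it defaults to mpirun), and nccl_message_transfer is the only test name taken from this page. The ring test is deliberately left out because it is the exception to the 2-rank rule.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with aws-ofi-nccl) for running the
# 2-rank functional tests across two hosts.

# Run one 2-rank functional test from $INSTALL_PREFIX/bin.
run_functional_test() {
  local host1=$1 host2=$2 test_name=$3
  if [ -z "${INSTALL_PREFIX:-}" ]; then
    echo "INSTALL_PREFIX is not set" >&2
    return 2
  fi
  # MPIRUN can point at a specific launcher; plain mpirun otherwise.
  "${MPIRUN:-mpirun}" -n 2 --host "${host1},${host2}" \
    "${INSTALL_PREFIX}/bin/${test_name}"
}

# Run each named test and print one PASS/FAIL line per test.
# Returns 1 if any test failed. Do not pass the ring test here.
run_functional_tests() {
  local host1=$1 host2=$2
  shift 2
  local failed=0 t
  for t in "$@"; do
    if run_functional_test "$host1" "$host2" "$t"; then
      echo "PASS: $t"
    else
      echo "FAIL: $t"
      failed=1
    fi
  done
  return "$failed"
}

# Example:
#   run_functional_tests host-1 host-2 nccl_message_transfer
```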

Benchmarking with nccl-tests

To run collective benchmark tests with the aws-ofi-nccl plugin, you can follow the instructions below.

  1. Clone the repository:
git clone https://github.com/NVIDIA/nccl-tests.git
  2. Build the tests:
cd nccl-tests/
make MPI=1 MPI_HOME=/path/to/mpi CUDA_HOME=/path/to/cuda NCCL_HOME=/path/to/nccl
  3. Run the performance tests:
NCCL_DEBUG=INFO mpirun -np 2 --bind-to none build/all_reduce_perf -b 8 -f 2 -e 32M -c 1 -g 1
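Step 3 runs only all_reduce_perf, but nccl-tests builds a binary for each collective. The sketch below runs a few of them with the same flags. In nccl-tests, -b and -e set the minimum and maximum message size, -f is the size multiplier between steps, -c 1 enables result checking, and -g 1 uses one GPU per process. The function run_nccl_sweep and the MPIRUN override are hypothetical. The choice of four collectives is only an example. The sketch assumes it is run from the nccl-tests/ directory after the build step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run several nccl-tests collectives with the flags
# used in step 3. Assumes the current directory is the nccl-tests checkout.
run_nccl_sweep() {
  local np=${1:-2}   # number of MPI ranks, 2 by default
  local status=0 op
  for op in all_reduce all_gather reduce_scatter broadcast; do
    # -b/-e: min/max message size, -f: size multiplier per step,
    # -c 1: check results, -g 1: one GPU per process.
    NCCL_DEBUG=${NCCL_DEBUG:-INFO} "${MPIRUN:-mpirun}" -np "$np" --bind-to none \
      "build/${op}_perf" -b 8 -f 2 -e 32M -c 1 -g 1 || status=1
  done
  return "$status"
}

# Example:
#   cd nccl-tests && run_nccl_sweep 2
```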

If you installed the aws-ofi-nccl plugin under a custom prefix, ensure LD_LIBRARY_PATH includes the plugin's library directory under that prefix so that NCCL can find the plugin when the perf test binaries run.
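The sketch below updates LD_LIBRARY_PATH without adding duplicate entries. It assumes the libraries sit in the lib subdirectory of the install prefix, which is typical for autotools builds but not guaranteed. The function prepend_ld_library_path and the PLUGIN_PREFIX placeholder are hypothetical. Exporting the variable only affects ranks launched on the local host. With Open MPI, mpirun's -x option forwards an environment variable to remote ranks. Other MPI implementations have their own equivalents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH unless it is
# already listed, then export the result.
prepend_ld_library_path() {
  local dir=$1
  case ":${LD_LIBRARY_PATH:-}:" in
    *":${dir}:"*) ;;  # already on the path, leave it unchanged
    *) LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
  esac
  export LD_LIBRARY_PATH
}

# Example (PLUGIN_PREFIX is a placeholder for your install prefix; -x is Open MPI):
#   prepend_ld_library_path "$PLUGIN_PREFIX/lib"
#   mpirun -x LD_LIBRARY_PATH -np 2 --bind-to none build/all_reduce_perf -b 8 -f 2 -e 32M -c 1 -g 1
```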
