Running fabtests with the GNI provider

Sung-Eun Choi edited this page Sep 9, 2016 · 3 revisions

The fabtests suite is a collection of simple examples for using libfabric. We run these for every PR merge via our CI (Jenkins) testing. The tests are written and run using a client-server model, so they require a little more work to launch on a Cray system.

The fabtests suite includes a script called runfabtests.sh. To run this script out of the box, we have to use Cray's Cluster Compatibility Mode (CCM). You can also run the individual tests by hand.

CCM launch

We've written a script in our fab-utils repository that wraps runfabtests.sh. The script, called ccm_runfabtests.sh, loads and sets up CCM mode and then calls runfabtests.sh with the appropriate arguments. Depending on your system, you may need to get a node allocation before running the script. Note that this only works on systems that have CCM available, which currently does not include any systems running native SLURM.

Manual

For either type of system, you can obtain the IP addresses of a set of nodes in your allocation (obtained via qsub or salloc) and then pass these to the individual tests. There are a number of ways to do this, but the easiest is probably to run /sbin/ifconfig on your nodes, and look for the interface called ipogif0.
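As a minimal sketch of that lookup, the address can be pulled out of the ifconfig output with a small helper. The function name ipogif_addr is hypothetical, and the parsing assumes the older net-tools format ("inet addr:10.128.0.13"), with a fallback for the newer "inet 10.128.0.13" format; adjust it if your system prints something else.

```shell
#!/bin/bash
# Hypothetical helper (not part of fabtests or fab-utils):
# reads `ifconfig ipogif0` output on stdin and prints the first IPv4 address.
ipogif_addr() {
    awk '
        # Older net-tools format: "inet addr:10.128.0.13  Mask:..."
        /inet addr:/ { sub(/.*inet addr:/, ""); print $1; exit }
        # Newer format: "inet 10.128.0.13  netmask ..."
        /inet /      { print $2; exit }
    '
}

# Example use inside an allocation (assumes the ipogif0 interface exists):
#   srun -N1 /sbin/ifconfig ipogif0 | ipogif_addr
```

Reading from stdin keeps the helper independent of how the ifconfig output is collected, so the same function works with srun, aprun, or a saved log.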

On a SLURM system, create a file with contents similar to the following:

0 ./fi_pingpong -f gni -e rdm
1 ./fi_pingpong -f gni -e rdm 10.128.0.13

where fi_pingpong is the name of the fabtest you are trying to run, and 10.128.0.13 is the IP address of one of the nodes in your allocation (the node running task 0, which acts as the server). Then use srun's multi-prog launch option with this file:

% srun -n2 -N2 --multi-prog ./my_multi_prog_file
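For repeated runs, the two-line file can be generated instead of edited by hand. The following is only a sketch: the function name write_multi_prog and its argument order are hypothetical, while the fabtest flags are the ones used in the file above.

```shell
#!/bin/bash
# Hypothetical helper: write a two-task file for srun --multi-prog.
#   $1 = output file, $2 = fabtest binary, $3 = IP address of the server node
write_multi_prog() {
    local conf=$1 test=$2 server_ip=$3
    # Task 0 runs the server; task 1 runs the client and is given the
    # server's address. '>' overwrites any stale file from an earlier run.
    cat > "$conf" <<EOT
0 $test -f gni -e rdm
1 $test -f gni -e rdm $server_ip
EOT
}

# Example use (address taken from the example above):
#   write_multi_prog my_multi_prog_file ./fi_pingpong 10.128.0.13
#   srun -n2 -N2 --multi-prog ./my_multi_prog_file
```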

Here's an example of a simple script to run a single client-server fabtest on a single node (test.sh):

#!/bin/bash

TEST=$1

echo "running $TEST on $SLURM_NODELIST"

export IP=`srun -N1 /sbin/ifconfig ipogif0 | grep "inet addr" | awk -F: '{ print $2}' | awk '{print $1}'`
echo $IP

cat <<EOT > scalable.conf
0      $TEST -f gni -s $IP
1      $TEST -f gni $IP
EOT

srun -N1 --ntasks-per-node=2 -l --multi-prog scalable.conf
rm scalable.conf

Then to run, say, the scalable_ep test:

~/fabtests/bin $ salloc -N1
~/fabtests/bin $ ./test.sh ./fi_scalable_ep
~/fabtests/bin $ exit