Running fabtests with the GNI provider
The fabtests suite is a collection of simple examples for using libfabric. We run these on every PR merge as part of our CI (Jenkins) testing. The tests use a client-server model, which makes them a little more work to launch on a Cray system.
The fabtests suite includes a script called `runfabtests.sh`. To run this script out of the box, we have to use Cray's Cluster Compatibility Mode (CCM). You can also run the individual tests by hand.
We've written a script in our fab-utils repository that wraps `runfabtests.sh`. The script, called `ccm_runfabtests.sh`, loads and sets up CCM and then calls `runfabtests.sh` with the appropriate arguments. Depending on your system, you may need to get a node allocation before running the script. Note that this only works on systems that have CCM available, which currently excludes all systems running native SLURM.
On either type of system, you can instead obtain the IP addresses of the nodes in your allocation (obtained via `qsub` or `salloc`) and pass them to the individual tests. There are several ways to do this, but the easiest is probably to run `/sbin/ifconfig` on your nodes and look for the interface called `ipogif0`.
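A minimal sketch of that lookup, assuming the node's `ifconfig` prints either the older `inet addr:A.B.C.D` line or the newer `inet A.B.C.D` line. The helper name `ipv4_from_ifconfig` is hypothetical, not part of fabtests or fab-utils:

```bash
#!/bin/bash
# Hypothetical helper: read `ifconfig <iface>` output on stdin and print the
# interface's IPv4 address. Accepts both "inet addr:10.128.0.13 ..." (older
# net-tools) and "inet 10.128.0.13 ..." (newer net-tools). inet6 lines are
# skipped because their first field is "inet6", not "inet".
ipv4_from_ifconfig() {
  awk '$1 == "inet" { sub(/^addr:/, "", $2); print $2; exit }'
}

# Only query the interface if it exists on this node (e.g. a Cray compute node).
if [[ -e /sys/class/net/ipogif0 ]]; then
  /sbin/ifconfig ipogif0 | ipv4_from_ifconfig
fi
```

Matching on the first field instead of grepping for `inet addr` keeps the parsing working if a node's net-tools version prints the newer format.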
On a SLURM system, create a file with contents similar to the following:

```
0 ./fi_pingpong -f gni -e rdm
1 ./fi_pingpong -f gni -e rdm 10.128.0.13
```
where `fi_pingpong` is the name of the fabtest you are trying to run, and 10.128.0.13 is the IP address of the node in your allocation that runs task 0 (the server); task 1 is the client and connects to that address. Then use srun's `--multi-prog` launch option with this file:
```
% srun -n2 -N2 --multi-prog ./my_multi_prog_file
```
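The steps above (find the server node's address, write the multi-prog file, launch with `srun`) can be scripted. This is a sketch under stated assumptions: it runs inside a two-node `salloc` allocation, and SLURM's default block distribution places task 0 on the first node of `$SLURM_NODELIST`. The script name `run2.sh` and the function `write_multiprog` are hypothetical:

```bash
#!/bin/bash
# Hypothetical script run2.sh, used as: ./run2.sh ./fi_pingpong
# Runs a fabtest with the server (task 0) and client (task 1) on different nodes.

# Write a multi-prog file: task 0 is the server; task 1 connects to server_ip.
write_multiprog() {
  local fabtest=$1 server_ip=$2 conf=$3
  printf '0 %s -f gni -e rdm\n1 %s -f gni -e rdm %s\n' \
    "$fabtest" "$fabtest" "$server_ip" > "$conf"
}

main() {
  local fabtest=${1:?usage: $0 <fabtest binary>}
  local server_node server_ip conf="multiprog.$$.conf"

  # First node in the allocation; assumes block distribution puts task 0 there.
  server_node=$(scontrol show hostnames "$SLURM_NODELIST" | head -n1)

  # ipogif0 address of that node (accepts "inet addr:A.B.C.D" or "inet A.B.C.D").
  server_ip=$(srun -N1 -n1 -w "$server_node" /sbin/ifconfig ipogif0 \
    | awk '$1 == "inet" { sub(/^addr:/, "", $2); print $2; exit }')
  [[ -n $server_ip ]] || { echo "no ipogif0 address on $server_node" >&2; return 1; }

  write_multiprog "$fabtest" "$server_ip" "$conf"
  srun -n2 -N2 -l --multi-prog "$conf"   # -l labels each output line with its task number
  rm -f "$conf"
}

# Launch only inside a SLURM allocation.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  main "$@"
else
  echo "run this inside a two-node allocation, e.g. salloc -N2" >&2
fi
```

Typical use mirrors the single-node example: `salloc -N2`, then `./run2.sh ./fi_pingpong`, then `exit`.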
Example: running `fi_scalable_ep` on a single node with a script, `test.sh`:
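```bash
#!/bin/bash
# Usage: ./test.sh ./fi_scalable_ep   (run inside a one-node allocation)
TEST=${1:?usage: $0 <fabtest binary>}
echo "running $TEST on $SLURM_NODELIST"

# IPv4 address of the node's ipogif0 interface (older "inet addr:" ifconfig format).
IP=$(srun -N1 /sbin/ifconfig ipogif0 | grep "inet addr" | awk -F: '{ print $2 }' | awk '{ print $1 }')
echo "$IP"

# Task 0 is the server, bound to $IP with -s; task 1 is the client connecting to $IP.
# Use > rather than >> so a file left over from an earlier run is overwritten.
cat <<EOT > scalable.conf
0 $TEST -f gni -s $IP
1 $TEST -f gni $IP
EOT

# Run both tasks on the one node; -l prefixes each output line with its task number.
srun -N1 --ntasks-per-node=2 -l --multi-prog scalable.conf
rm scalable.conf
```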
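To use it, allocate a node, run the script with the test binary as its argument, and release the allocation:

```
~/fabtests/bin $ salloc -N1
~/fabtests/bin $ ./test.sh ./fi_scalable_ep
~/fabtests/bin $ exit
```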