Building and Running MPICH (CH3)

NOTE: THIS IS OUTDATED. Try the CH4 instructions first.
This page describes how to build and run MPICH (CH3 device) to test the libfabric GNI provider. It is assumed that you are building MPICH on a Cray XC system like jupiter or edison/cori, and that you have built and installed a copy of libfabric.
MPICH can be built to use the Cray PMI or SLURM PMI. Differences in the build procedure using the two PMIs are highlighted below.
Note this wiki describes building from the stable MPICH repo, not from devel.
First, if you don't already have a clone of MPICH, get one:
% git clone git://git.mpich.org/mpich.git
Next, configure and build/install MPICH. Some patching is required to get MPICH to work with Cray PMI. The goal is to build MPICH such that it uses libfabric and works with aprun or srun as the job launcher. Note that you will need libtool 2.4.4 or higher to keep MPICH's configury happy.
If you wish to use the Cray PMI library, get the Cray PMI patch file and apply it to the MPICH source:
% cd mpich
% git am 0001-netmod-ofi-changes-to-be-able-to-use-cray-pmi.patch
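You can verify that the patch was applied by checking the most recent commit in the log:
% git log -1 --oneline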
This patch step can be skipped if the SLURM PMI library will be used. Parts of the configuration procedure are the same whether using SLURM or Cray PMI:
% ./autogen.sh
% module load PrgEnv-gnu
% export MPID_NO_PMI=yes
% export MPID_NO_PM=yes
If using Cray PMI, also set the following environment variable so that MPICH will be configured to use the PMI2 API:
% export USE_PMI2_API=yes
For the Cray PMI, set the LDFLAGS and LIBS environment variables as follows:
% export LDFLAGS=`pkg-config --libs-only-L cray-pmi`
% export LIBS=`pkg-config --libs-only-l cray-pmi`
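If you want to see what these flags will expand to before exporting them, pkg-config can print them directly (the exact paths and library names vary by system and cray-pmi module version):
% pkg-config --libs cray-pmi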
For SLURM PMI on Cori, set these environment variables as follows:
% export LDFLAGS="-L/usr/lib64/slurmpmi"
% export LIBS="-lpmi"
In either case, the configure line needs to point at the base of your libfabric install and specify the OFI nemesis netmod:
% ./configure --prefix=mpich_install_dir --with-ofi=your_libfabric_install_dir --with-device=ch3:nemesis:ofi
% make -j 8 install
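After installation, a quick sanity check is to run the mpichversion tool that MPICH installs into its bin directory; among other things it reports the device the library was configured with, which should be ch3:nemesis:ofi:
% mpich_install_dir/bin/mpichversion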
Note that if you want to run multi-threaded MPI tests that use MPI_THREAD_MULTIPLE, you will need to configure MPICH as follows:
% ./configure --prefix=mpich_install_dir --with-ofi=your_libfabric_install_dir --enable-threads=multiple --with-device=ch3:nemesis:ofi
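As a quick sanity check of a threaded build, a minimal C program can request MPI_THREAD_MULTIPLE and report what level the library actually provides (a sketch; thread_check.c is just an illustrative name, compiled with mpicc as described below):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int provided;
    /* Ask for full thread support; the library reports what it can provide. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided == MPI_THREAD_MULTIPLE)
        printf("MPI_THREAD_MULTIPLE supported\n");
    else
        printf("requested MPI_THREAD_MULTIPLE, got level %d\n", provided);
    MPI_Finalize();
    return 0;
}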
To run a test, first build an MPI application using MPICH's compiler wrapper:
% export PATH=mpich_install_dir/bin:${PATH}
% mpicc -o my_app my_app.c
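If you don't have a test program handy, a minimal my_app.c could look like the following sketch:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}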
On Tiger and NERSC edison/cori, the application can be launched using srun:
% srun -n 2 -N 2 ./my_app
On systems using aprun, the application can be launched as follows:
% aprun -n 2 -N 1 ./my_app
IMPORTANT NOTE: If you are running on Cori and using the SLURM PMI library, you will need to set LD_LIBRARY_PATH (the MPICH compiler scripts apparently don't embed an rpath):
% export LD_LIBRARY_PATH=/usr/lib64/slurmpmi:$LD_LIBRARY_PATH
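To confirm that the dynamic linker can now resolve the PMI library, run ldd on the compiled binary and look for the libpmi entry:
% ldd ./my_app | grep libpmi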
If you'd like to double-check against the sockets provider, do the following:
% export MPIR_CVAR_OFI_USE_PROVIDER=sockets
% srun -n 2 -N 2 ./my_app
This will force the OFI netmod to use the sockets provider.
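If you want to see which provider libfabric actually selected, you can raise libfabric's log level with its standard FI_LOG_LEVEL environment variable before launching (the exact output format varies by libfabric version):
% export FI_LOG_LEVEL=info
% srun -n 2 -N 2 ./my_app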
OSU provides a relatively simple set of MPI benchmark tests which are useful for testing the GNI libfabric provider.
% wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.0.tar.gz
% tar -zxvf osu-micro-benchmarks-5.0.tar.gz
% cd osu-micro-benchmarks-5.0
% ./configure CC=mpicc
% make
In the mpi/pt2pt and mpi/collective subdirectories there are a number of tests. To test, for example, MPICH send/recv message latency, osu_latency can be used:
% cd mpi/pt2pt
% srun -n 2 -N 2 ./osu_latency
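The other point-to-point tests, for example the osu_bw bandwidth test, are run the same way:
% srun -n 2 -N 2 ./osu_bw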
The MPICH CH3 OFI netmod uses an unscalable shutdown algorithm in MPI_Finalize. For applications with limited inter-node communication patterns (nearest neighbor, etc.), this can be particularly problematic above 4K MPI processes.