Merge pull request #333 from glotzerlab/delta-openmpi-update
Updates for latest Delta software.
joaander authored Oct 12, 2023
2 parents 4a9911f + 716f8c3 commit 7325629
Showing 6 changed files with 47 additions and 24 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.rst
@@ -8,6 +8,14 @@ and optionally the day separated by periods or hyphens.
 2023
 ----
 
+2023-10-12
+++++++++++
+
+*Changed*
+
+* Require openmpi/4.1.4 on Delta.
+* Recommend ``export OMPI_MCA_btl=self`` and ``srun`` when launching MPI jobs on Delta.
+
 2023-09-22
 ++++++++++
 
29 changes: 20 additions & 9 deletions doc/clusters/delta.rst
@@ -37,24 +37,35 @@ container:
 
 Serial (or multithreaded) CPU jobs (``cpu`` partition)::
 
-    module load gcc/11.2.0 openmpi/4.1.2
-    mpirun -n 1 -x UCX_POSIX_USE_PROC_LINK=n singularity exec --bind /scratch /scratch/<your-account>/$USER/software.sif command arguments
+    module load gcc/11.2.0 openmpi/4.1.4
+    srun -n 1 singularity exec --bind /scratch /scratch/<your-account>/$USER/software.sif command arguments
 
 Single GPU jobs (``gpuA100x4`` and similar partitions)::
 
-    module load gcc/11.2.0 openmpi/4.1.2
-    mpirun -n 1 -x UCX_POSIX_USE_PROC_LINK=n singularity exec --nv --bind /scratch /scratch/<your-account>/$USER/software.sif command arguments
+    module load gcc/11.2.0 openmpi/4.1.4
+    srun -n 1 singularity exec --nv --bind /scratch /scratch/<your-account>/$USER/software.sif command arguments
 
 MPI parallel CPU jobs (``cpu`` partition with more than 1 core)::
 
-    module load gcc/11.2.0 openmpi/4.1.2
-    mpirun -x UCX_POSIX_USE_PROC_LINK=n singularity exec --bind /scratch /scratch/<your-account>/$USER/software.sif command arguments
+    module load gcc/11.2.0 openmpi/4.1.4
+    export OMPI_MCA_btl=self
+    srun singularity exec --bind /scratch /scratch/<your-account>/$USER/software.sif command arguments
 
 MPI parallel GPU jobs (``gpuA100x4`` and similar partitions with more than 1 GPU)::
 
-    module load gcc/11.2.0 openmpi/4.1.2
-    mpirun -x UCX_POSIX_USE_PROC_LINK=n singularity exec --nv --bind /scratch /scratch/<your-account>/$USER/software.sif command arguments
+    module load gcc/11.2.0 openmpi/4.1.4
+    export OMPI_MCA_btl=self
+    srun singularity exec --nv --bind /scratch /scratch/<your-account>/$USER/software.sif command arguments
 
+.. note::
+
+    Setting ``OMPI_MCA_btl=self`` prevents the warning::
+
+        UCX WARN IB: ibv_fork_init() was disabled or failed, yet a fork() has been issued.
+        UCX WARN IB: data corruption might occur when using registered memory.
+
+    On Delta, OpenMPI uses ``ucx`` for inter-node communication, so the ``btl`` layer is unused.
+
 .. tip::
 
-    You may use ``srun`` in place of ``mpirun`` on Delta.
+    You may use ``mpirun -x UCX_POSIX_USE_PROC_LINK=n`` in place of ``srun`` on Delta.
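
For reference, the updated commands combine into a batch script along these lines (a sketch only; the job name, node counts, and time limit are placeholder values, and ``<your-account>`` must be filled in as in the docs above)::

    #!/bin/bash
    #SBATCH --job-name="hoomd-mpi"
    #SBATCH --partition=cpu
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=1
    #SBATCH -t 0:30:00

    module load gcc/11.2.0 openmpi/4.1.4
    export OMPI_MCA_btl=self

    # Launch one container instance per rank; script.py is a placeholder.
    srun singularity exec --bind /scratch /scratch/<your-account>/$USER/software.sif python3 script.py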
13 changes: 7 additions & 6 deletions docker/delta/test/job-cpu.sh
@@ -2,21 +2,22 @@
 #SBATCH --job-name="test-cpu"
 #SBATCH --partition=cpu
 #SBATCH --nodes=2
-#SBATCH --ntasks=2
-#SBATCH --cpus-per-task=1
+#SBATCH --ntasks-per-node=1
 #SBATCH --export=ALL
 #SBATCH -t 0:10:00
 
-module load gcc/11.2.0 openmpi/4.1.2
+module load gcc/11.2.0 openmpi/4.1.4
 
 set -x
 
+export OMPI_MCA_btl=self
+
 singularity exec software.sif bash -c "set" | grep GLOTZERLAB
 
-mpirun -n 1 singularity exec software.sif python3 serial-cpu.py
+srun -n 1 singularity exec software.sif python3 serial-cpu.py
 
-mpirun --npernode 1 singularity exec software.sif python3 mpi-cpu.py
+srun singularity exec software.sif python3 mpi-cpu.py
 
-mpirun --npernode 1 singularity exec software.sif /opt/osu-micro-benchmarks/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bibw
+srun singularity exec software.sif /opt/osu-micro-benchmarks/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bibw
 
 echo "Tests complete."
10 changes: 6 additions & 4 deletions docker/delta/test/job-gpu.sh
@@ -7,16 +7,18 @@
 #SBATCH --export=ALL
 #SBATCH -t 0:10:00
 
-module load gcc/11.2.0 openmpi/4.1.2
+module load gcc/11.2.0 openmpi/4.1.4
 
 set -x
 
+export OMPI_MCA_btl=self
+
 singularity exec software.sif bash -c "set" | grep GLOTZERLAB
 
-singularity exec --nv software.sif python3 serial-gpu.py
+srun -n 1 singularity exec --nv software.sif python3 serial-gpu.py
 
-mpirun -v -x UCX_POSIX_USE_PROC_LINK=n singularity exec --nv software.sif python3 mpi-gpu.py
+srun singularity exec --nv software.sif python3 mpi-gpu.py
 
-mpirun -v -x UCX_POSIX_USE_PROC_LINK=n singularity exec software.sif /opt/osu-micro-benchmarks/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bibw
+srun singularity exec software.sif /opt/osu-micro-benchmarks/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bibw
 
 echo "Tests complete."
10 changes: 5 additions & 5 deletions make_dockerfiles.py
@@ -73,13 +73,13 @@ def write(fname, templates, **kwargs):
         openmpi_template,
         glotzerlab_software_template,
         finalize_template],
-    FROM='nvidia/cuda:11.6.2-devel-ubuntu20.04',
+    FROM='nvidia/cuda:11.7.1-devel-ubuntu20.04',
     system='delta',
-    CUDA_VERSION='11.6',
+    CUDA_VERSION='11.7',
     OPENMPI_VERSION='4.1',
-    OPENMPI_PATCHLEVEL='2',
-    UCX_VERSION='1.11.2',
-    PMIX_VERSION='3.2.3',
+    OPENMPI_PATCHLEVEL='4',
+    UCX_VERSION='1.12.1',
+    PMIX_VERSION='3.2.5',
     LIBFABRIC_VERSION='1.13.2',
     ENABLE_MPI='on',
     MAKEJOBS=multiprocessing.cpu_count()+2,
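
The body of ``write()`` lies outside this hunk; a minimal implementation consistent with the call site would render each Jinja template with the given keyword arguments and concatenate the results (a sketch, not the repository's actual code)::

    # Hypothetical sketch of write(): render templates into one Dockerfile.
    def write(fname, templates, **kwargs):
        with open(fname, 'w') as f:
            for template in templates:
                f.write(template.render(**kwargs))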
1 change: 1 addition & 0 deletions template/base.jinja
@@ -36,6 +36,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     python3.9-dev \
     python3.9-venv \
     python3.9-distutils \
+    strace \
     zlib1g-dev \
     ca-certificates \
     && rm -rf /var/lib/apt/lists/*
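
``strace`` is presumably added to aid debugging inside the container; a typical invocation might be (hypothetical usage)::

    # Trace file-open calls made during a Python import, following child processes.
    singularity exec software.sif strace -f -e trace=openat python3 -c "import hoomd"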
