QUDA Build With CMake
Starting with version 0.9, QUDA must be built using CMake. You need CMake 3.15 or later for QUDA 1.1, and 3.18 or later for the current develop branch. Try
cmake --version
to make sure you have CMake and that your version is recent enough. If you do not have CMake on your system, please follow the instructions below; otherwise, skip to the Building QUDA using CMake section.
For multi-GPU builds with OpenMPI, we recommend using at least version 4.0.x, and compiling OpenMPI with a recent version of UCX (at least 1.10) for CUDA-aware MPI support.
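As a rough sketch, building UCX and then OpenMPI with CUDA support might look like the following (install prefixes and the CUDA path are illustrative; consult the UCX and OpenMPI documentation for your system):

```shell
# Run from the UCX source tree (prefix and CUDA path are illustrative)
./configure --prefix=$HOME/ucx --with-cuda=/usr/local/cuda
make -j && make install

# Then from the OpenMPI source tree, point OpenMPI at the UCX install
./configure --prefix=$HOME/openmpi --with-ucx=$HOME/ucx --with-cuda=/usr/local/cuda
make -j && make install
```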
You are likely going to build QUDA on a remote machine with a module system. Try
module avail cmake
to see if the module loader provides a CMake option. If it does not, please ask the system administrator to add a module. In the meantime, you can download the CMake source code here. Once you've gone through the CMake build steps, prepend the install location to your PATH so that your environment can find the binaries.
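For example, if you built and installed CMake under your home directory, a hypothetical PATH update could look like:

```shell
# Assuming CMake was installed to $HOME/cmake (path is illustrative)
export PATH=$HOME/cmake/bin:$PATH
cmake --version   # should now report the freshly built version
```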
It is recommended to build QUDA in a separate folder (out-of-source). This has the advantage that you don't need different copies of the QUDA source code on disk to build separate configurations (e.g., for different GPU architectures), nor do you need to trigger a full rebuild in your local QUDA copy to build for a different architecture. For example, suppose you have a machine with two GPU partitions, one with NVIDIA P100s and the other with NVIDIA V100s. You can download one copy of the QUDA source code (typically named `quda`) and then have two build directories (say, `build_p100` and `build_v100`). The advantage here is that when the source code is updated or modified, you need only change it once, then update each build as required.
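The two-partition example above could be set up as follows (directory names are illustrative):

```shell
# One source checkout, two out-of-source build directories
git clone https://github.com/lattice/quda.git
mkdir build_p100 build_v100
(cd build_p100 && cmake ../quda -DQUDA_GPU_ARCH=sm_60)   # P100 (Pascal)
(cd build_v100 && cmake ../quda -DQUDA_GPU_ARCH=sm_70)   # V100 (Volta)
```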
After downloading QUDA from GitHub, create a build directory and cd into it (the name is arbitrary; here we use `build`):
mkdir build
cd build
There are two methods one can use to configure the build. The first is to use `ccmake`:
ccmake ../quda
NOTE: for this to work, you may first need to run
cmake [-DQUDA_TARGET_TYPE=<TARGET>] ../quda
and then launch `ccmake`.
This will bring up a text-based GUI for all the QUDA CMake options. If you take this route, note that pressing the `t` key in the GUI reveals extra CMAKE options. This can seem a little daunting at first, but the majority of the options you see are automatically populated. Options are grouped into two main parts, each prefixed accordingly: CMAKE options (revealed by hitting `t`) concern HOW to build QUDA, while QUDA options concern WHAT parts of QUDA to build.
The CMAKE options `CMAKE_CUDA_HOST_COMPILER`, `CMAKE_CXX_COMPILER`, and `CMAKE_C_COMPILER` dictate which CUDA host, C++, and C compilers to use, respectively. If you want to use a specific compiler, you must set these manually.
QUDA options such as `QUDA_DIRAC_CLOVER` instruct CMake to build the Wilson-clover kernels. If you wish to use them, you must set them to `ON`. If you do not wish to use a specific part of QUDA, it is strongly recommended that you turn the corresponding option `OFF`, so that QUDA does not compile unwanted parts of the code.
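For instance, a configure step that enables one operator and disables another might look like this (the particular choice of operators is purely illustrative):

```shell
# Illustrative: enable Wilson-clover kernels, skip domain-wall kernels
cmake ../quda -DQUDA_DIRAC_CLOVER=ON -DQUDA_DIRAC_DOMAIN_WALL=OFF
```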
After changing the options to your preferences, press `c` to configure. This forces CMake to find further tools and libraries (for example, locating MPI if you build with MPI). New variables may appear here, and you may need to configure multiple times. As soon as the `Press [g] to generate and exit` option is shown at the bottom of the screen, you may use it, and CMake will generate your configuration.
If using the text GUI is not to your liking, you can configure QUDA directly using `cmake`. For example,
cmake ../quda -DQUDA_MPI=ON
cmake .
This will configure QUDA with the default options, except that `QUDA_MPI` will be turned `ON`. Make sure you use the correct architecture for your GPUs in the first configuration step. The default architecture is `sm_70`, but you may want to specify a different one, such as `-DQUDA_GPU_ARCH=sm_60` for a Pascal GPU or `-DQUDA_GPU_ARCH=sm_80` for an A100. The second `cmake .` command (with no other arguments) is often required to ensure that all configuration is completed; without this second step, some configuration may not be complete (this is equivalent to `ccmake` requiring multiple configuration passes).
In either case, once QUDA has been configured, you can build with
make -j N
where N is the number of available CPU cores, or alternatively just `make -j` to oversubscribe the CPU cores. The latter approach typically gives the shortest compile time.
Due to QUDA's extensive use of templates, compiling QUDA can take a long time to complete. For this reason, we have provided a variety of CMake options to constrain the compilation trajectory, which can dramatically reduce compilation time.
First and foremost, only enable the Dirac operators that you intend to use. By default all Dirac operators are enabled, so you will need to disable the ones you don't need. The following is the list of Dirac operators present in QUDA. These can be disabled using `cmake` with `-D`, or set directly with `ccmake`; e.g., `-DQUDA_DIRAC_WILSON=OFF` would disable Wilson fermions.
- `QUDA_DIRAC_WILSON` - Wilson Dirac operators
- `QUDA_DIRAC_CLOVER` - Wilson-clover operators (implies `QUDA_DIRAC_WILSON`)
- `QUDA_DIRAC_TWISTED_MASS` - Twisted-mass Dirac operators
- `QUDA_DIRAC_NDEG_TWISTED_MASS` - Non-degenerate twisted-mass operators
- `QUDA_DIRAC_TWISTED_CLOVER` - Twisted-clover Dirac operators (implies `QUDA_DIRAC_CLOVER`)
- `QUDA_DIRAC_CLOVER_HASENBUSCH` ** - Specialized operator for Hasenbusch preconditioning (implies `QUDA_DIRAC_CLOVER` and `QUDA_DIRAC_TWISTED_CLOVER`)
- `QUDA_DIRAC_STAGGERED` - Naive staggered and improved (HISQ) staggered operators
- `QUDA_DIRAC_DOMAIN_WALL` - Shamir domain-wall (4-d and 5-d preconditioned) and Möbius operators
To simplify this process, we have also added the flag `QUDA_DIRAC_DEFAULT_OFF` for use on the command line, which disables all Dirac operators by default. Specific operators can then be selectively re-enabled. For example, for a build that only supports staggered fermions, one can use
cmake -DQUDA_DIRAC_DEFAULT_OFF=ON -DQUDA_DIRAC_STAGGERED=ON
The following are advanced options that can be specified directly to `cmake` with `-D`, or set using `ccmake` under advanced options:
- `QUDA_PRECISION=n` - where `n` is a 4-bit number that specifies which precisions to enable (8 - double, 4 - single, 2 - half, 1 - quarter). The default value is 14, which corresponds to double/single/half enabled and quarter disabled.
- `QUDA_RECONSTRUCT=n` - where `n` is a 3-bit number that specifies which reconstruct types to enable (4 - reconstruct-no, 2 - reconstruct-12/13, 1 - reconstruct-8/9). The default value is 7, which enables all reconstruct types.
- `QUDA_FAST_COMPILE_REDUCE=ON` ** - compiles the reduction kernels (reduce_quda.cu, multi_reduce_quda.cu, etc.) with block-size 32 only, dramatically accelerating their compilation. Additionally, the multi-blas kernels will not employ the warp-shfl optimization. This affects performance, so it should only be used for fast debugging or development builds; hence the default value is `OFF`.
- `QUDA_FAST_COMPILE_DSLASH=ON` ** - disables some dslash specialization optimizations at the cost of performance. The performance penalty is up to 20% (depending on the action), but the compilation overhead is approximately halved.
- `QUDA_MAX_MULTI_BLAS_N=1` - disables some kernel-fusion optimizations for the BLAS routines
** - signifies this option is post QUDA 1.0
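Since `QUDA_PRECISION` is a bitmask, the value to pass is simply the bitwise OR of the desired precision bits. A quick sketch (the variable names are purely illustrative):

```shell
# Bit values as described above
DOUBLE=8; SINGLE=4; HALF=2; QUARTER=1

# The default, double/single/half enabled:
echo $(( DOUBLE | SINGLE | HALF ))   # prints 14

# A double/single-only build:
echo $(( DOUBLE | SINGLE ))          # prints 12
```

For example, `-DQUDA_PRECISION=12` would enable only double and single precision.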
By default, QUDA builds as a shared library and takes advantage of rpath to avoid needing to set `LD_LIBRARY_PATH` in most cases. If, for some reason, you would prefer a static library build, you can set `QUDA_BUILD_SHAREDLIB=OFF`. We do not recommend this, because it creates a large spike in link time and binary sizes.
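Should you need it anyway, a static configure is a one-flag change:

```shell
# Not recommended (see above): build QUDA as a static library
cmake ../quda -DQUDA_BUILD_SHAREDLIB=OFF
```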
You can use Ninja instead of make to improve parallel builds by specifying it as the CMake generator in the initial cmake run:
cmake -GNinja ...
and then build using
ninja
or just use
cmake --build .
A further reduction in overall build time can be achieved by using an alternative linker such as LLVM's lld or mold. To use mold, you can simply run
mold -run ninja
By default, QUDA will automatically download Eigen (version 3.3.9 at the time of writing) as part of the build process, and the CMake configuration scripts bake in a checksum to verify the download. While neither of these is an issue the majority of the time, we do provide a way via `cmake` to specify a local copy of Eigen. This bypasses both the download (for example, on a machine without an external internet connection) and the checksum verification (in the rare case where the downloaded tarball is updated and the checksum changes).
As an example, one can download Eigen from https://gitlab.com/libeigen/eigen/-/archive/3.3.9/eigen-3.3.9.tar.bz2 , untar it, and then specify the installation location via:
cmake -DQUDA_DOWNLOAD_EIGEN=OFF -DEIGEN_INCLUDE_DIR=${HOME}/eigen-3.3.9/ [...]
Update `EIGEN_INCLUDE_DIR` as appropriate for your download location.
While QUDA can be built using clang as the compiler, this support is still considered early: it might not work for all possible options, and performance may not be as expected. The development version of QUDA supports building QUDA with clang as the CUDA compiler. This requires:
- CMake >= 3.18
- Clang >= 10 and a compatible CUDA toolkit (see https://www.llvm.org/docs/CompileCudaWithLLVM.html for details)
To enable the use of clang as the CUDA compiler, execute the initial cmake call with the options
-DCMAKE_CUDA_COMPILER=clang++ -DCMAKE_CXX_COMPILER=clang++
You might need to specify the full path to `clang++` and append a version number. If you need to use a specific CUDA toolkit, or have it installed in an uncommon location, you can specify it with
-DCUDAToolkit_ROOT=/some/path
Note: CUDA Toolkit detection is handled by CMake's FindCUDAToolkit module; its documentation has more details on how the CUDA Toolkit is located.
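Putting these together, an initial configure using clang as the CUDA compiler might look like this (the clang version suffix and toolkit location are illustrative):

```shell
# Versioned clang and an explicit CUDA toolkit location (both illustrative)
cmake ../quda \
  -DCMAKE_CUDA_COMPILER=clang++-12 \
  -DCMAKE_CXX_COMPILER=clang++-12 \
  -DCUDAToolkit_ROOT=/some/path
```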