Documentation: Improved README and Quick start (#105)
* Improved README and quickstart

* Notes for single GPUs

* typo
sofiemartins authored Sep 10, 2024
1 parent b4d43bf commit baf6066
Changed file: Doc/user_guide/getting_started.md (146 additions, 50 deletions)
<a href="https://asciinema.org/a/550942"><img src="https://asciinema.org/a/550942.svg" alt="asciicast" class="inline"></a>
\endhtmlonly
<!-- [![asciicast](https://asciinema.org/a/550942.svg)](https://asciinema.org/a/550942) -->

# Dependencies

* A C99 compiler (GCC, clang, icc). OpenMP can be used if supported by the compiler.
* If MPI is needed, an MPI implementation such as OpenMPI or MPICH. Use a CUDA-aware MPI implementation for multi-GPU support.
* If GPU acceleration is needed, CUDA 11.x and the nvcc compiler.
* Perl 5.x for compilation.
* [ninja build](https://ninja-build.org/) for compilation.

```bash
git clone https://github.com/claudiopica/HiRep
```
Make sure the build command `Make/nj` and `ninja` are in your `PATH`.
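
For example, assuming you cloned HiRep into your home directory, you could add the `Make` folder to your `PATH` as follows (the path is an assumption; adjust it to your setup, and note that `ninja` itself must be installed separately, e.g. via your package manager):

```bash
export PATH="$HOME/HiRep/Make:$PATH"
```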

## Adjust compilation options

Adjust the file `Make/MkFlags` to set the desired options.
The option file can be generated using the `Make/write_mkflags.pl` tool.
Use
```bash
write_mkflags.pl -h
```
for a list of available options. The most important ones include:

### Number of colors (NG)

```bash
NG=3
```
### Gauge group SU(NG) or SO(NG)

```bash
GAUGE_GROUP = GAUGE_SUN
#GAUGE_GROUP = GAUGE_SON
```

### Representation of fermion fields

```bash
REPR = REPR_FUNDAMENTAL
#REPR = REPR_SYMMETRIC
#REPR = REPR_ANTISYMMETRIC
#REPR = REPR_ADJOINT
```

### Lattice boundary conditions

Set the desired boundary conditions by adding one macro line for each direction, as in the example below.

The available options are:
1. BC_\<DIR\>_PERIODIC, for periodic boundary conditions
2. BC_\<DIR\>_ANTIPERIODIC, for antiperiodic boundary conditions
3. BC_\<DIR\>_THETA, for twisted boundary conditions: this associates a twisting angle with the fermion field in the specified direction \<DIR\>. The angle itself is specified in the input file.
4. BC_\<DIR\>_OPEN, for open boundary conditions; these can only be set in the T direction.

Example: antiperiodic boundary conditions in the time direction and periodic boundary conditions in the spatial directions:
```bash
MACRO += BC_T_ANTIPERIODIC
MACRO += BC_X_PERIODIC
MACRO += BC_Y_PERIODIC
MACRO += BC_Z_PERIODIC
```
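
Similarly, a sketch of twisted boundary conditions in the spatial directions, following the naming scheme above (the twist angles themselves are set in the input file):

```bash
MACRO += BC_T_ANTIPERIODIC
MACRO += BC_X_THETA
MACRO += BC_Y_THETA
MACRO += BC_Z_THETA
```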

### Parallelization

You can select a number of features via the `MACRO` variable. To compile with MPI, use

```bash
MACRO += WITH_MPI
```

For GPU acceleration on CUDA GPUs, enable GPU support and the new geometry. If you enable GPU support but forget to set the new geometry, the compilation will fail.

```bash
MACRO += WITH_GPU
MACRO += WITH_NEW_GEOMETRY
```
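
For example, a multi-GPU build for CUDA GPUs combines the MPI and GPU flags above (a sketch; pick the flags that match your target machine):

```bash
MACRO += WITH_MPI
MACRO += WITH_GPU
MACRO += WITH_NEW_GEOMETRY
```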

If you want to compile your code for AMD GPUs, additionally add the flag

```bash
MACRO += HIP
```

### Other standard options

```bash
MACRO += UPDATE_EO
```

enables even-odd preconditioning, so you never want to disable it.

```bash
MACRO += NDEBUG
```

**suppresses** debug output. If you remove this option, `HiRep` will print a large amount of additional output that is normally not needed.

```bash
MACRO += CHECK_SPINOR_MATCHING
```

This performs a check on the geometries of the spinors and is essential for debugging. In general, leaving it as a safety check does not hurt, but if you simulate with very small local lattices, you may want to disable it and check whether there is a performance improvement.

```bash
MACRO += IO_FLUSH
```

Flushes output to file immediately. If the simulation or analysis prints an unusually large amount of data, this may affect performance.
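
Taken together, a typical set of these standard options might look as follows (a sketch using only the flags described above):

```bash
MACRO += UPDATE_EO
MACRO += NDEBUG
MACRO += CHECK_SPINOR_MATCHING
```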

### Compiler options

To compile the code for your laptop, you only need to set the C compiler. For example
```bash
CC = gcc
CFLAGS = -Wall -O3
INCLUDE =
LDFLAGS =
```

If you want support for parallelization, you need to include the MPI compiler wrapper
```bash
CC = gcc
MPICC = mpicc
CFLAGS = -Wall -O3
GPUFLAGS =
INCLUDE =
LDFLAGS =
```

Another example: to use the Intel compiler and Intel's MPI implementation, and no CUDA, one could use:

```bash
CC = icc
MPICC = mpiicc
LDFLAGS = -O3
INCLUDE =
```

With a single NVIDIA GPU and without MPI:
```bash
CC = gcc
NVCC = nvcc
CXX = g++
LDFLAGS = -Wall -O3
GPUFLAGS =
INCLUDE =
```
Note that this compiles a fat binary, but you can also specify a target architecture via `GPUFLAGS`.
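
For example, to build only for NVIDIA A100 GPUs (compute capability 8.0), one could set:

```bash
GPUFLAGS = -arch=sm_80
```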

For a single AMD GPU, `nvcc` needs to be replaced by `hipcc`. On LUMI, the standard C and C++ compilers are `cc` and `CC`.

```bash
CC = cc
NVCC = hipcc
CXX = CC
LDFLAGS = -Wall -O3
GPUFLAGS =
INCLUDE =
```

For multi-GPU simulations on NVIDIA GPUs, you can set your choice of C, C++, MPI, and CUDA compilers and their options using the variables:
```bash
CC = gcc
MPICC = mpicc
NVCC = nvcc
CXX = g++
LDFLAGS = -Wall -O3
GPUFLAGS =
INCLUDE =
```

For AMD multi-GPU jobs on LUMI, it seems favorable to use `hipcc` instead of `CC`.
```bash
ENV = MPICH_CC=hipcc
CC = gcc
MPICC = cc
CFLAGS = -Wall -O3
NVCC = mpicc
GPUFLAGS = -w --offload-arch=gfx90a
INCLUDE =
LDFLAGS = --offload-arch=gfx90a
```

For more information on configuring the code for AMD GPUs, see the user guide on the GitHub pages.

## Compile the code

From the root folder just type:
```bash
nj
```
(this is a tool in the `Make/` folder: make sure it is in your path!)
The above will compile the `libhr.a` library and all the available executables in the HiRep distribution, including executables for dynamical fermions `hmc` and pure gauge `suN` simulations and all the applicable tests.
If you wish to compile only one of the executables, e.g., `suN`, just change to the corresponding directory, e.g., `PureGauge`, and execute the `nj` command from there.
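
For example, to build only the pure gauge executable `suN`:

```bash
cd PureGauge
nj
```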

All build artefacts, except the final executables, are located in the `build` folder at the root directory of the distribution.


# Run

## Adjust input file
As an example, we will use the `hmc` program, which can be found in the `HMC` directory (to create the executable type `nj` in that directory).
The `hmc` program will generate lattice configurations with dynamical fermions using a hybrid Monte Carlo algorithm. The program uses a number of parameters that need to be specified in an input file; see `HMC/input_file` for an example.
Input parameters are divided into different sections, such as global lattice size, number of MPI processes per direction, random number generator, run control variables, and definition of the lattice action to use for the run.
For example, for the basic run control variables, one can look at the section `Run control variables`:

```
gauge start = random
last conf = +1
```

The "+" in front of `last conf` specifies the number of additional trajectories to be generated after the chosen startup configuration. I.e. if the startup configuration is trajectory number 5 and `last conf = 6` then one additional trajectory will be generated, while if `last conf = +6` then six additional trajectories will be generated (i.e. the last configuration generated will be number 11).
The "+" in front of `last conf` specifies the number of additional trajectories to be generated after the chosen startup configuration. I.e., if the startup configuration is trajectory number 5 and `last conf = 6`, then one additional trajectory will be generated, while if `last conf = +6`, then six additional trajectories will be generated (i.e., the last configuration generated will be number 11).

## Execute Binary

When not using MPI, simply run:

```bash
./hmc -i input_file
```

where `hmc` is the binary generated from `hmc.c`. If you are using OpenMP, remember to set `OMP_NUM_THREADS` and other relevant environment variables to the desired value.
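
For example, a run with four OpenMP threads (the thread count here is just an illustration):

```bash
export OMP_NUM_THREADS=4
./hmc -i input_file
```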

For the MPI version, run

```bash
mpirun -np <number of MPI processes> ./hmc -i input_file
```

or follow the instructions for submitting your job script to Slurm. See the examples of submit scripts in the documentation.
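
A minimal Slurm submit script might look like the sketch below; the partition name, resource requests, and job name are placeholders that depend on your cluster:

```bash
#!/bin/bash
#SBATCH --job-name=hmc
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4   # one MPI process per GPU (the GPU code uses 1 GPU per MPI process)
#SBATCH --gres=gpu:4          # placeholder: request 4 GPUs on the node
#SBATCH --time=01:00:00
#SBATCH --partition=gpu       # placeholder partition name

# Launch one MPI rank per task; adjust to your site's recommended launcher.
srun ./hmc -i input_file
```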

The GPU version of the code uses 1 GPU per MPI process.

Only the MPI process rank 0 writes the output file, which is by default in a file called `out_0` in the current directory. The `-o` option allows you to set a different name for the output file.
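
For example, to write the output to a file with a custom name (the file name here is just an illustration):

```bash
./hmc -i input_file -o run1.out
```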

It is sometimes helpful to have output files from all MPI processes for debugging purposes. This can be enabled with the compilation option:
```bash
MACRO += LOG_ALLPIDS
```