GPUs
The tables below list which solvers and preconditioners in HYPRE are GPU-enabled. Note that not all options for each solver or preconditioner have been ported to run on GPUs. "N/A" indicates that a method is not accessible via the corresponding interface, while "No" indicates that it is accessible but not GPU-enabled.
Preconditioner | Struct Interface | SStruct Interface | IJ Interface |
---|---|---|---|
BoomerAMG | N/A | Yes | Yes |
MGR | N/A | N/A | Yes |
AMS | N/A | N/A | Yes |
ADS | N/A | N/A | Yes |
FSAI | N/A | N/A | Yes |
ILU | N/A | N/A | Yes |
Hybrid | N/A | N/A | Yes |
PFMG | Yes | N/A | N/A |
SMG | Yes | N/A | N/A |
Split | N/A | Yes | N/A |
ParaSails | N/A | N/A | No |
Euclid | N/A | N/A | No |
ParILUT | N/A | N/A | No |

Solver | Struct Interface | SStruct Interface | IJ Interface |
---|---|---|---|
PCG | Yes | Yes | Yes |
GMRES | Yes | Yes | Yes |
FlexGMRES | Yes | Yes | Yes |
LGMRES | Yes | Yes | Yes |
BiCGSTAB | Yes | Yes | Yes |
Basic information on how to compile and run on GPUs can be found in the Users Manual. This document is intended to provide additional details.
Hypre provides two user-level memory locations, HYPRE_MEMORY_HOST and HYPRE_MEMORY_DEVICE. HYPRE_MEMORY_HOST is always the CPU memory, while HYPRE_MEMORY_DEVICE can be mapped to different memory spaces depending on hypre's configure options. When built with --with-cuda or --with-device-openmp, HYPRE_MEMORY_DEVICE is the GPU device memory; when built additionally with --enable-unified-memory, it is the GPU unified memory (UM). For a non-GPU build, HYPRE_MEMORY_DEVICE is also mapped to the CPU memory. The default memory location of hypre's matrix and vector objects is HYPRE_MEMORY_DEVICE, which can be changed with HYPRE_SetMemoryLocation(...).

The execution policy determines where computations run, based on the memory locations of the participating objects. The default policy is HYPRE_EXEC_HOST, i.e., execute on the host whenever the objects are accessible from the host. It can be changed with HYPRE_SetExecutionPolicy(...). Note that this policy only affects objects in UM, since UM is accessible from both CPUs and GPUs.
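As a concrete sketch of the build modes described above (the option names are from this page; the ./configure invocation is hypre's standard build script, run from its src directory):

```shell
# HYPRE_MEMORY_DEVICE maps to GPU device memory
./configure --with-cuda

# HYPRE_MEMORY_DEVICE maps to GPU unified memory (UM)
./configure --with-cuda --enable-unified-memory
```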
No special changes to the solvers' interfaces need to be made other than to give GPU memory addresses for all input pointers.
Current AMG setup and solve parameters that have GPU support are listed as follows:
- AMG setup
  - Coarsening algorithms: PMIS (8) and aggressive coarsening
  - Interpolation algorithms: direct (3), BAMG-direct (15), extended+i (6), extended (14), and extended+e (18). Second-stage interpolation with aggressive coarsening: extended (5) and extended+e (7)
  - RAP: two-multiplication R(AP) and one-multiplication RAP
- AMG solve
  - Smoothers: Jacobi (7), two-stage Gauss-Seidel (11, 12), l1-Jacobi (18), and Chebyshev (16). The relaxation order can be lexicographic, or C/F for (7) and (18)
  - Matrix-by-vector: save local transposes of P to multiply explicitly with P^{T}
A code sample that sets up an IJ matrix A and solves Ax=b with AMG-preconditioned CG is shown below.
```c
cudaSetDevice(device_id); /* GPU binding */
...
HYPRE_Init(); /* must be the first HYPRE function call */
...
/* AMG in GPU memory (default) */
HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE);
/* setup AMG on GPUs */
HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE);
/* use hypre's SpGEMM instead of cuSPARSE */
HYPRE_SetSpGemmUseCusparse(FALSE);
/* use GPU RNG */
HYPRE_SetUseGpuRand(TRUE);
if (useHypreGpuMemPool)
{
   /* use hypre's GPU memory pool */
   HYPRE_SetGPUMemoryPoolSize(bin_growth, min_bin, max_bin, max_bytes);
}
else if (useUmpireGpuMemPool)
{
   /* or use Umpire GPU memory pool */
   HYPRE_SetUmpireUMPoolName("HYPRE_UM_POOL_TEST");
   HYPRE_SetUmpireDevicePoolName("HYPRE_DEVICE_POOL_TEST");
}
...
/* setup IJ matrix A */
HYPRE_IJMatrixCreate(comm, first_row, last_row, first_col, last_col, &ij_A);
HYPRE_IJMatrixSetObjectType(ij_A, HYPRE_PARCSR);
/* GPU pointers; efficient in large chunks */
HYPRE_IJMatrixAddToValues(ij_A, num_rows, num_cols, rows, cols, data);
HYPRE_IJMatrixAssemble(ij_A);
HYPRE_IJMatrixGetObject(ij_A, (void **) &parcsr_A);
...
/* setup AMG-preconditioned CG */
HYPRE_ParCSRPCGCreate(comm, &solver);
HYPRE_BoomerAMGCreate(&precon);
HYPRE_BoomerAMGSetRelaxType(precon, rlx_type);         /* 7, 18, 11, 12, (3, 4, 6) */
HYPRE_BoomerAMGSetRelaxOrder(precon, FALSE);           /* must be false */
HYPRE_BoomerAMGSetCoarsenType(precon, coarsen_type);   /* 8 */
HYPRE_BoomerAMGSetInterpType(precon, interp_type);     /* 3, 15, 6, 14, 18 */
HYPRE_BoomerAMGSetAggNumLevels(precon, agg_num_levels);
HYPRE_BoomerAMGSetAggInterpType(precon, agg_interp_type); /* 5 or 7 */
HYPRE_BoomerAMGSetKeepTranspose(precon, TRUE);         /* keep transpose to avoid SpMTV */
HYPRE_BoomerAMGSetRAP2(precon, FALSE);                 /* RAP in two multiplications (default: FALSE) */
HYPRE_ParCSRPCGSetPrecond(solver, HYPRE_BoomerAMGSolve, HYPRE_BoomerAMGSetup, precon);
HYPRE_PCGSetup(solver, parcsr_A, b, x);
...
/* solve */
HYPRE_PCGSolve(solver, parcsr_A, b, x);
...
HYPRE_Finalize(); /* must be the last HYPRE function call */
```
To build hypre with Umpire, add the following configure options. The default is then to use the Umpire pooling allocator for GPU device and unified memory.

```
--with-umpire --with-umpire-include=/path-of-umpire-install/include
--with-umpire-lib-dirs=/path-of-umpire-install/lib
--with-umpire-libs=umpire
```
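For example, a CUDA build with Umpire pooling might combine these with the GPU options from earlier in this page (the install path is a placeholder, as above):

```shell
./configure --with-cuda \
            --with-umpire \
            --with-umpire-include=/path-of-umpire-install/include \
            --with-umpire-lib-dirs=/path-of-umpire-install/lib \
            --with-umpire-libs=umpire
```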