libflame LAPACK interface: multiple calls within a OMP do loop #34

Open
tgastine opened this issue Dec 20, 2019 · 11 comments

@tgastine

tgastine commented Dec 20, 2019

Dear libflame developer,

I aim to compute a series of LU factorisations and solves of dense matrices (basically using the dgetrf and dgetrs routines from LAPACK). A minimal example with a small matrix follows:

program test

   use iso_fortran_env, only: dp => real64
   use iso_c_binding

   implicit none

   real(dp), allocatable :: a(:,:)
   !-- LAPACK pivot indices are default integers (assuming an LP64 build)
   integer, allocatable :: piv(:)

   integer :: info, n, i, j

   n = 256
   allocate( a(n,n), piv(n))

   !-- Fill a tridiagonal matrix; all other entries must be zero
   a(:,:) = 0.0_dp
   do i=1,n
      a(i,i)  = 2.0_dp
      if ( i < n ) a(i,i+1)= -1.0_dp
      if ( i > 1 ) a(i,i-1)= -1.0_dp
   end do


   !-- Note: a, piv and info are shared, so all threads factorise the same data in place
   !$omp parallel do
   do j=1, 1000
      !-- LU factorisations
      call dgetrf(n,n, a(1:n,1:n), n, piv(1:n), info)
   end do
   !$omp end parallel do

   deallocate(a, piv)

end program test

This piece of code runs fine with LAPACK or MKL. It also runs fine with libflame, but only when OMP_NUM_THREADS=1. As soon as OMP_NUM_THREADS > 1, the code crashes with:

libflame: src/base/flamec/check/base/main/FLA_Obj_datatype_check.c (line 18):
libflame: Encountered NULL pointer.
libflame: Aborting.

I tried it on several platforms with sandybridge, haswell or zen2 processors and using the GNU or the Intel compilers but I couldn't make it work. Any idea what is causing this problem?

Best regards.

@dev-zero
Contributor

Admittedly we are using AMD's libflame fork, but I see the same error when dpotrf is called from multiple threads, as CP2K does:

libflame: /scratch/e1000/timuel/spack-stage/spack-stage-amdlibflame-3.0-ffgwl56r4x36ie7wjxbdlvlrce63tmti/spack-src/src/base/flamec/main/FLA_Blocksize.c (line 125):
libflame: Encountered NULL pointer.
libflame: Aborting.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

libflame: /scratch/e1000/timuel/spack-stage/spack-stage-amdlibflame-3.0-ffgwl56r4x36ie7wjxbdlvlrce63tmti/spack-src/src/blas/3/trsm/front/flamec/FLA_Trsm_lun.c (line 61):
libflame: Function or conditional branch/case not yet implemented.
libflame: Aborting.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x2b70c65d359f in ???
#1  0x2b70c65d3520 in ???
#2  0x2b70c65d4b00 in ???
#3  0x2b70bf57f748 in ???
#4  0x2b70bf57fb32 in ???
#5  0x2b70bf57dd6a in ???
#6  0x2b70bf57debf in ???
#7  0x2b70bf797bb2 in ???
#8  0x2b70bedb8da9 in ???
#0  0x2abadca7659f in ???
#1  0x2abadca76520 in ???
#2  0x2abadca77b00 in ???
#3  0x2abad5a22748 in ???
#4  0x2abad5a22b32 in ???
#5  0x2abad5c9208f in ???
#6  0x2abad525dcdd in ???
#9  0x237cc99 in __mathlib_MOD_invmat_symm
        at /scratch/e1000/timuel/spack-stage/spack-stage-cp2k-master-z2723yunlobhfh36pl7ub56uki77rcuj/spack-src/src/common/mathlib.F:602
#7  0x237c865 in __mathlib_MOD_invmat_symm
        at /scratch/e1000/timuel/spack-stage/spack-stage-cp2k-master-z2723yunlobhfh36pl7ub56uki77rcuj/spack-src/src/common/mathlib.F:609
#10  0x13328bc in inverse_lri_overlap
        at /scratch/e1000/timuel/spack-stage/spack-stage-cp2k-master-z2723yunlobhfh36pl7ub56uki77rcuj/spack-src/src/lri_environment_methods.F:1001
#11  0x13328bc in __lri_environment_methods_MOD_calculate_lri_integrals._omp_fn.0
        at /scratch/e1000/timuel/spack-stage/spack-stage-cp2k-master-z2723yunlobhfh36pl7ub56uki77rcuj/spack-src/src/lri_environment_methods.F:276
#8  0x13328bc in inverse_lri_overlap
        at /scratch/e1000/timuel/spack-stage/spack-stage-cp2k-master-z2723yunlobhfh36pl7ub56uki77rcuj/spack-src/src/lri_environment_methods.F:1001
#9  0x13328bc in __lri_environment_methods_MOD_calculate_lri_integrals._omp_fn.0
        at /scratch/e1000/timuel/spack-stage/spack-stage-cp2k-master-z2723yunlobhfh36pl7ub56uki77rcuj/spack-src/src/lri_environment_methods.F:276
#12  0x2b70c5f0f6d1 in GOMP_parallel

And here is mathlib.F:602

https://github.com/cp2k/cp2k/blob/3c81dfb14877e2dca5ed75034aede06b886d942b/src/common/mathlib.F#L601-L606

@dev-zero
Contributor

Another error from the same function:

libflame: /scratch/e1000/timuel/spack-stage/spack-stage-amdlibflame-3.0-ffgwl56r4x36ie7wjxbdlvlrce63tmti/spack-src/src/lapack/dec/chol/front/flamec/FLA_Chol_u.c (line 69):
libflame: Function or conditional branch/case not yet implemented.
libflame: Aborting.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

libflame: /scratch/e1000/timuel/spack-stage/spack-stage-amdlibflame-3.0-ffgwl56r4x36ie7wjxbdlvlrce63tmti/spack-src/src/lapack/dec/chol/front/flamec/FLA_Chol_u.c (line 69):
libflame: Function or conditional branch/case not yet implemented.
libflame: Aborting.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x2b9f7574559f in ???
#1  0x2b9f75745520 in ???
#2  0x2b9f75746b00 in ???
#3  0x2b9f6e6f1748 in ???
#4  0x2b9f6e6f1b32 in ???
#5  0x2b9f6e9048be in ???
#6  0x2b9f6df2ada9 in ???
#0  0x2b3d951ac59f in ???
#1  0x2b3d951ac520 in ???
#2  0x2b3d951adb00 in ???
#3  0x2b3d8e158748 in ???
#4  0x2b3d8e158b32 in ???
#5  0x2b3d8e36b8be in ???
#6  0x2b3d8d991da9 in ???
#7  0x237cc99 in __mathlib_MOD_invmat_symm
        at /scratch/e1000/timuel/spack-stage/spack-stage-cp2k-master-z2723yunlobhfh36pl7ub56uki77rcuj/spack-src/src/common/mathlib.F:602
#7  0x237cc99 in __mathlib_MOD_invmat_symm
        at /scratch/e1000/timuel/spack-stage/spack-stage-cp2k-master-z2723yunlobhfh36pl7ub56uki77rcuj/spack-src/src/common/mathlib.F:602
#8  0x13328bc in inverse_lri_overlap
        at /scratch/e1000/timuel/spack-stage/spack-stage-cp2k-master-z2723yunlobhfh36pl7ub56uki77rcuj/spack-src/src/lri_environment_methods.F:1001
#9  0x13328bc in __lri_environment_methods_MOD_calculate_lri_integrals._omp_fn.0
        at /scratch/e1000/timuel/spack-stage/spack-stage-cp2k-master-z2723yunlobhfh36pl7ub56uki77rcuj/spack-src/src/lri_environment_methods.F:276
#8  0x13328bc in inverse_lri_overlap
        at /scratch/e1000/timuel/spack-stage/spack-stage-cp2k-master-z2723yunlobhfh36pl7ub56uki77rcuj/spack-src/src/lri_environment_methods.F:1001
#9  0x13328bc in __lri_environment_methods_MOD_calculate_lri_integrals._omp_fn.0
        at /scratch/e1000/timuel/spack-stage/spack-stage-cp2k-master-z2723yunlobhfh36pl7ub56uki77rcuj/spack-src/src/lri_environment_methods.F:276
#10  0x2b9f75089015 in gomp_thread_start
        at ../../../cray-gcc-10.2.0-202007302223.f679c32a909ac/libgomp/team.c:123
#11  0x2b9f73b894f8 in ???
#12  0x2b9f75807fbe in ???
#13  0xffffffffffffffff in ???
#10  0x2b3d94ae86d1 in GOMP_parallel
        at ../../../cray-gcc-10.2.0-202007302223.f679c32a909ac/libgomp/parallel.c:171

@srvasanth

Hi, I am Vasanth, part of the libFLAME developer team at AMD.
Can you please confirm the release version of libFLAME you are using and the configure options you used to build it?

@dev-zero
Contributor

@srvasanth thanks for the quick follow-up! I've been using Spack to build libflame; this is the spec:

a73g6lc amdlibflame@3.0~debug+lapack2flame+shared+static patches=b3066e8ea70f9a59d1ce00330d72764482dd0faa57d185a45f73ce0effa2bc14 threads=openmp

which according to the Spack package translates to:

./configure --enable-external-lapack-interfaces --enable-lapack2flame --enable-static-build --enable-dynamic-build --enable-supermatrix --enable-max-arg-list-hack

The patch you're seeing there is from this PR: #52 (see also spack/spack#24358).

The full Spack spec to install CP2K with the AMD toolchain was:

$ spack install cp2k@master%gcc@10.2.0~libint+libvori+spglib smm=libxsmm ^amdfftw+openmp ^amdscalapack ^amdblis threads=openmp ^amdlibflame threads=openmp

Tested on the HPE XE System (Dual-AMD CPU 7742) Alps (Eiger) of CSCS with 2 MPI ranks/tasks and just 2 OMP threads.

You can drop the +libvori and +spglib if you want to test it (those features are not relevant to the regtests). One of the CP2K test inputs which failed is: https://github.com/cp2k/cp2k/blob/master/tests/QS/regtest-lrigpw-2/2H2O_dpa_1.inp, the full list is here:
error_summary.txt

I am currently waiting for the results of another run with a debug-enabled amdlibflame which should give a proper traceback.

@srvasanth

srvasanth commented Jun 17, 2021

@dev-zero the --enable-supermatrix option in the configure command enables multi-threading inside libFLAME and, more importantly, disables libFLAME's thread-safety features. This is most likely the cause of the issue.
Basically, if the application creates threads and calls libFLAME from those threads in parallel, libFLAME should be configured not to create threads of its own and to provide thread-safe functionality to the application.
I will get back to you soon on how to avoid this.

Thanks
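
Going by the explanation above, one untested way to act on it is to rebuild with the same configure flags minus --enable-supermatrix. The sketch below only assembles and prints that command. The flag list is copied from the Spack-generated configure line quoted earlier in this thread, and it is an assumption (not confirmed by the maintainers here) that dropping this one flag is sufficient.

```shell
#!/bin/sh
# Configure flags exactly as quoted from the Spack build earlier in this thread.
FLAGS="--enable-external-lapack-interfaces --enable-lapack2flame --enable-static-build --enable-dynamic-build --enable-supermatrix --enable-max-arg-list-hack"

# Rebuild the list without the SuperMatrix flag; every other flag is kept as-is.
# Assumption: removing this single flag is enough to restore thread safety.
SAFE_FLAGS=""
for flag in $FLAGS; do
    case "$flag" in
        --enable-supermatrix) ;;              # skip: turns on libFLAME-internal threading
        *) SAFE_FLAGS="$SAFE_FLAGS $flag" ;;
    esac
done
SAFE_FLAGS="${SAFE_FLAGS# }"                  # trim the leading space

# Print the command for review instead of running it here.
echo "./configure $SAFE_FLAGS"
```

Running the printed command inside the libflame source tree and relinking the application would then show whether the multi-threaded calls still abort.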

@dev-zero
Contributor

@srvasanth ok, this would be bad since we have packages in our dependency chain which seemingly require that to be enabled (but maybe this is a misunderstanding of what threads=openmp does in the context of Spack: for OpenBLAS, for example, one has to enable it to get thread-safety). Interestingly, most of our tests succeed (and we most definitely also call other routines from OMP-parallel regions), indicating that most of the routines are indeed thread-safe even with --enable-supermatrix.

@dev-zero
Contributor

This is the log of the CP2K regtests run with a debug-enabled amdlibflame: error_summary.txt. It contains the symbols inside amdlibflame for a better traceback.

@srvasanth

Hi @dev-zero, can you please try without the 'threads=openmp' setting for your sample code? As you mentioned, confusion over what that flag does may be causing the issue. Thanks

@dev-zero
Contributor

@srvasanth sorry for the late reply. I can confirm: with threads=none (i.e. no OpenMP threading enabled in amdlibflame) all our tests succeed. Will this be fixed at some point, in the sense that those routines will also work with threading enabled? The point is that our code uses BLAS/LAPACK in various code paths, some with threading and some without, so we would likely profit from threading in amdlibflame.
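
For reference, the threads=none setup confirmed above can be written as a spec along the following lines. This is a sketch: it reuses the install command quoted earlier in this thread and changes only the amdlibflame variant. RUN_SPACK is a hypothetical switch introduced here so that the command is only printed by default.

```shell
#!/bin/sh
# Spec from the install command earlier in this thread, with the amdlibflame
# variant switched from threads=openmp to threads=none. amdblis keeps threading.
SPEC='cp2k@master%gcc@10.2.0~libint+libvori+spglib smm=libxsmm ^amdfftw+openmp ^amdscalapack ^amdblis threads=openmp ^amdlibflame threads=none'

# Print the command so it can be reviewed first.
echo "spack install $SPEC"

# RUN_SPACK is a hypothetical opt-in: install only when it is set to 1
# and spack is actually available on PATH.
if [ "${RUN_SPACK:-0}" = 1 ] && command -v spack >/dev/null 2>&1; then
    # Unquoted on purpose so the spec is split into separate arguments.
    spack install $SPEC
fi
```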

@srvasanth

srvasanth commented Dec 17, 2021

Hi @dev-zero, sorry for the late reply. The threading feature in amdlibflame is currently experimental and can be used only through the native libflame interfaces, so it will not accelerate applications that go through the LAPACK Fortran interfaces. The current recommendation is to build amdlibflame with threading disabled. The corresponding Spack command is:
spack install cp2k@master%gcc@10.2.0~libint+libvori+spglib smm=libxsmm ^amdfftw+openmp ^amdscalapack ^amdblis threads=openmp ^amdlibflame
amdblis does benefit from threading, so threading can stay enabled there. Hope this helps.

@dev-zero
Contributor

@srvasanth thanks a lot for the clarification. I will keep threading in amdlibflame disabled for now. Interestingly, even though the threading functionality is supposedly available only via the native libflame interface, building amdlibflame with threads and then running via the LAPACK interface breaks my calculations (I haven't tried with the latest version of amdlibflame, though).
