-
Notifications
You must be signed in to change notification settings - Fork 17
PaRSEC on Intel Xeon Phi Native Mode
Running PaRSEC on Intel Xeon Phi is almost the same as running on other platforms. But there are some differences in the following sections:
To compile PaRSEC/DPLASMA in Intel Xeon Phi,you will need the following software of Intel Xeon Phi version:
- a BLAS library excitable on Intel Xeon Phi, usually is MKL. When using MKL, please use multi-thread version since there are 4 hyper threads per physical core.
- PLASMA. To get a Intel Xeon Phi version, please use x86_64-k1om-linux-gcc or icc with flags -mmic as C compiler, and x86_64-k1om-linux-gfortran or ifort with flags -mmic as Fortran compiler (x86_64-k1om-linux-gcc and x86_64-k1om-linux-gfortran are usually in of your system). **Regular version of PLASMA use sequential MKL, but the Xeon Phi version requires multi-thread MKL.** Since PLASMA requires LAPACK and LAPACKE, make sure you LAPACK and LAPACKE are also compiled with the above compilers.
- PaRSEC is able to use cross platform compilation, so compiling is always done in CPU, then executable program is copied to Intel Xeon Phi and run**
or
Once configuration is done, type in your PaRSEC root folder. Please double check PARSEC_GPU_WITH_CUDA is set to OFF, -openmp is in CFLAGS, CXXFLAGS and FFLAGS. When you have set all the options you want in ccmake, type 'c' to configure again, and 'g' to generate the files. If you entered wrong values in some fields, ccmake will complain at 'c' time. If no changes are made, type 'q' to quit without configure again.
If the configuration was good, compilation should be as simple and fancy as 'make'.
All testing programs are compiled in . Copy your executable testing file to Intel Xeon Phi. Then run it with the following command: , means using 60 cores. Usually, each PaRSEC thread manages 1 physical core. Since there are 4 hyper thread per core, please use
Intel Xeon Phi contains 61 physical cores (1 is reserved for OS, so 60 cores are available for users), so it requires problems to have enough parallelism to get really good performance. Therefore, we introduce virtual core. Each virtual core contains multiple physical cores, and we let each PaRSEC thread manages each virtual core. Inside virtual core is taken care of by multi-thread BLAS. So you can run testing like this:
k is the number of physical cores per virtual core. You can tune k to get a good performance.