Software Dependencies
To compile DPLASMA on a new platform, you will need some of the software
listed below. Items 1 to 4 (inclusive) are mandatory. Everything else is
optional: these packages provide useful features that are not critical to the
normal usage of this software package.
1. CMake version 2.8.0 or above. CMake is available as the Debian package cmake,
or as source from the CMake download page (http://www.cmake.org/).
2. A BLAS library optimized for your platform: MKL, ACML, Goto, Atlas, VecLib
(Mac OS X), or in the worst case the default BLAS (http://www.netlib.org/blas/).
3. hwloc for processor and memory locality features
(http://www.open-mpi.org/projects/hwloc/)
4. Any MPI library: Open MPI, MPICH2, MVAPICH, or any vendor-blessed
implementation.
5. PLASMA version 2.5 up to 2.8 (http://icl.cs.utk.edu/plasma/). Newer versions
are not currently supported.
6. To use PINS (instrumentation based on PAPI), PAPI is required
(http://icl.cs.utk.edu/papi/)
7. For the profiling tools, you need several libraries:
- GTG for the trace generation (https://gforge.inria.fr/projects/gtg/)
- Vite, a visualization environment (only required for visualization)
(https://gforge.inria.fr/projects/vite/)
- GD, usually available on most Linux distributions via the GraphViz
installation (http://www.graphviz.org/)
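The minimum CMake version listed above can be checked before configuring. The
following is a minimal sketch and not part of DPLASMA: the helper name
version_ge is hypothetical, and it assumes plain dotted numeric version strings
(no suffixes such as -rc1).

```shell
#!/bin/sh
# version_ge A B: succeed (exit status 0) when dotted numeric version A >= B.
# Hypothetical helper for illustration; not shipped with DPLASMA.
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading component of each version; a missing one counts as 0.
        x=${a%%.*}; y=${b%%.*}
        [ -n "$x" ] || x=0
        [ -n "$y" ] || y=0
        if [ "$x" -gt "$y" ]; then return 0; fi
        if [ "$x" -lt "$y" ]; then return 1; fi
        # Drop the component just compared ("2.8.0" -> "8.0", "0" -> "").
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Possible use (the first line of `cmake --version` reads "cmake version X.Y.Z"):
#   have=$(cmake --version | sed -n '1s/^cmake version //p')
#   version_ge "$have" 2.8.0 || echo "CMake $have is too old" >&2
```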
Configuring DPLASMA for a new platform
CMake has a goal comparable to that of configure, but it is subtly different.
For one thing, CMake displays the commands with colors, but this is not
necessarily its most prominent feature.
CMake keeps everything it has found so far in a cache file named
CMakeCache.txt. Until you have successfully configured DPLASMA, remove the
CMakeCache.txt file each time you run cmake.
First, there are example invocations of cmake in the contrib/platforms directory
(config.dancer is for a typical Linux system, config.jaguar is for, you guessed
it, XT5, ...). We advise you to start from one of these files and tweak it for
your system according to the following guidelines. These files are shell scripts
that can be executed directly from the desired build directory (VPATH or not).
In order to enable the Linear Algebra package, the PLASMA library is required. If
you have a fairly recent version, correctly configured, this should be a breeze,
as it provides a pkg-config file that our configuration scripts
understand. Thus, if your version of PLASMA is recent enough, providing
-DPLASMA_DIR=<path to the PLASMA lib> should be enough for a
straightforward configuration process.
If not, the steps are slightly more complicated: in addition to correctly
configuring your PLASMA installation, you will need to provide our configuration
scripts with a few hints to help the detection process. Assume that on your
architecture the BLAS library is MKL in /opt/mkl/lib/em64t, and that you need to
link with mkl_gf_lp64, mkl_sequential and mkl_core to have all the BLAS working
(NB: with DPLASMA, always use the sequential version of the BLAS. Using the
threaded version of the BLAS will decrease performance, even when setting
OMP_NUM_THREADS=1). Assume also that the PLASMA library was installed in
/opt/plasma. You'll want to run the following script in the DPLASMA directory.
rm -f CMakeCache.txt
cmake . -DBLAS_LIBRARIES="-L/opt/mkl/lib/em64t -lmkl_gf_lp64 -lmkl_sequential -lmkl_core" \
-DPLASMA_DIR=/opt/plasma -DDPLASMA_MPI=ON
Hopefully, once the expected arguments are provided the output will look similar to:
-- The C compiler identification is GNU 4.6.0
-- The CXX compiler identification is GNU 4.6.0
-- The Fortran compiler identification is GNU
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /usr/local/bin/gcc
-- Check for working C compiler: /usr/local/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/local/bin/g++
-- Check for working CXX compiler: /usr/local/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Check for working Fortran compiler: /usr/local/bin/gfortran
-- Check for working Fortran compiler: /usr/local/bin/gfortran -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - failed
-- Checking whether /usr/local/bin/gfortran supports Fortran 90
-- Checking whether /usr/local/bin/gfortran supports Fortran 90 -- yes
-- Found BISON: /opt/local/bin/bison (found version "2.6.5")
-- Found FLEX: /opt/local/bin/flex (found version "2.5.37")
-- Building for target i386
-- Found target for X86
-- Performing Test C_M32or64
-- Performing Test C_M32or64 - Failed
-- Performing Test PARSEC_HAVE_STD_C99
-- Performing Test PARSEC_HAVE_STD_C99 - Failed
-- Performing Test PARSEC_HAVE_WALL
-- Performing Test PARSEC_HAVE_WALL - Failed
-- Performing Test PARSEC_HAVE_WEXTRA
-- Performing Test PARSEC_HAVE_WEXTRA - Failed
-- Performing Test PARSEC_HAVE_WD
-- Performing Test PARSEC_HAVE_WD - Failed
-- Performing Test PARSEC_HAVE_G3
-- Performing Test PARSEC_HAVE_G3 - Failed
-- Performing Test PARSEC_ATOMIC_USE_GCC_32_BUILTINS
-- Performing Test PARSEC_ATOMIC_USE_GCC_32_BUILTINS - Failed
-- Performing Test PARSEC_ATOMIC_USE_XLC_32_BUILTINS
-- Performing Test PARSEC_ATOMIC_USE_XLC_32_BUILTINS - Failed
-- Performing Test PARSEC_ATOMIC_USE_MIPOSPRO_32_BUILTINS
-- Performing Test PARSEC_ATOMIC_USE_MIPOSPRO_32_BUILTINS - Failed
-- Performing Test PARSEC_ATOMIC_USE_SUN_32
-- Performing Test PARSEC_ATOMIC_USE_SUN_32 - Failed
-- Looking for OSAtomicCompareAndSwap32
-- Looking for OSAtomicCompareAndSwap32 - not found
-- Looking for OSAtomicCompareAndSwap64
-- Looking for OSAtomicCompareAndSwap64 - not found
-- Looking for include file pthread.h
-- Looking for include file pthread.h - not found.
-- Could NOT find Threads (missing: Threads_FOUND)
-- Looking for sched_setaffinity
-- Looking for sched_setaffinity - not found
-- Performing Test PARSEC_HAVE_TIMESPEC_TV_NSEC
-- Performing Test PARSEC_HAVE_TIMESPEC_TV_NSEC - Failed
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - not found
-- Looking for include file stdarg.h
-- Looking for include file stdarg.h - not found.
-- Looking for asprintf
-- Looking for asprintf - not found
-- Looking for vasprintf
-- Looking for vasprintf - not found
-- Looking for include file getopt.h
-- Looking for include file getopt.h - not found.
-- Looking for include file unistd.h
-- Looking for include file unistd.h - not found.
-- Looking for getopt_long
-- Looking for getopt_long - not found
-- Looking for include file errno.h
-- Looking for include file errno.h - not found.
-- Looking for include file stddef.h
-- Looking for include file stddef.h - not found.
-- Looking for getrusage
-- Looking for getrusage - not found
-- Looking for include file limits.h
-- Looking for include file limits.h - not found.
-- Looking for include file string.h
-- Looking for include file string.h - not found.
-- Looking for include file complex.h
-- Looking for include file complex.h - not found.
-- Found HWLOC: /opt/lib/libhwloc.dylib
-- Performing Test PARSEC_HAVE_HWLOC_PARENT_MEMBER
-- Performing Test PARSEC_HAVE_HWLOC_PARENT_MEMBER - Failed
-- Performing Test PARSEC_HAVE_HWLOC_CACHE_ATTR
-- Performing Test PARSEC_HAVE_HWLOC_CACHE_ATTR - Failed
-- Performing Test PARSEC_HAVE_HWLOC_OBJ_PU
-- Performing Test PARSEC_HAVE_HWLOC_OBJ_PU - Failed
-- Looking for hwloc_bitmap_free in /opt/lib/libhwloc.dylib
-- Looking for hwloc_bitmap_free in /opt/lib/libhwloc.dylib - not found
-- Found MPI_C: /opt/trunk/debug/lib/libmpi.dylib;/usr/lib/libm.dylib
-- Found MPI_CXX: /opt/trunk/debug/lib/libmpi.dylib;/usr/lib/libm.dylib
-- Found MPI_Fortran: /opt/trunk/debug/lib/libmpi_usempi.a;
-- Looking for MPI_Type_create_resized
-- Looking for MPI_Type_create_resized - not found
-- Looking for include file Ayudame.h
-- Looking for include file Ayudame.h - not found.
-- Add '-undefined dynamic_lookup' to the linking flags
-- Could NOT find GTG (missing: GTG_LIBRARY)
-- Found Graphviz: /usr/local/lib/libcdt.dylib
-- Looking for gdImagePng in /opt/local/lib/libgd.dylib
-- Looking for gdImagePng in /opt/local/lib/libgd.dylib - not found
-- Looking for gdImageJpeg in /opt/local/lib/libgd.dylib
-- Looking for gdImageJpeg in /opt/local/lib/libgd.dylib - not found
-- Looking for gdImageGif in /opt/local/lib/libgd.dylib
-- Looking for gdImageGif in /opt/local/lib/libgd.dylib - not found
-- Found OMEGA: /tools/Omega/binary/include/omega
-- Found PythonInterp: /opt/local/bin/python (found version "2.7.3")
-- Generate precision dependencies in dplasma/parsec.public/data_dist/matrix
-- Generate precision dependencies in dplasma/parsec.public/data_dist/matrix - Done
-- Found PkgConfig: /opt/local/bin/pkg-config (found version "0.27.1")
-- checking for one of the modules 'plasma'
-- Performing Test PLASMA_C_COMPILE_SUCCESS
-- Performing Test PLASMA_C_COMPILE_SUCCESS - Failed
-- Performing Test PLASMA_F_COMPILE_SUCCESS
-- Performing Test PLASMA_F_COMPILE_SUCCESS - Failed
-- A library with PLASMA API not found. Please specify library location.
PLASMA_CFLAGS = [-I/opt/PLASMA/plasma-svn/include]
PLASMA_LDFLAGS = [-framework veclib -L/opt/PLASMA/plasma-svn/lib -lplasma -lcoreblas -lquark -llapacke -ltmg -lpthread -lm -lplasma]
PLASMA_INCLUDE_DIRS = [/opt/PLASMA/plasma-svn/include]
PLASMA_LIBRARY_DIRS = [/opt/PLASMA/plasma-svn/lib]
-- Generate precision dependencies in dplasma/parsec.public/dplasma/include
-- Generate precision dependencies in dplasma/parsec.public/dplasma/include - Done
-- Generate precision dependencies in dplasma/parsec.public/dplasma/cores
-- Generate precision dependencies in dplasma/parsec.public/dplasma/cores - Done
-- Generate precision dependencies in dplasma/parsec.public/dplasma/cores
-- Generate precision dependencies in dplasma/parsec.public/dplasma/cores - Done
-- Generate precision dependencies in dplasma/parsec.public/dplasma/lib
-- Generate precision dependencies in dplasma/parsec.public/dplasma/lib - Done
-- Generate precision dependencies in dplasma/parsec.public/dplasma/lib
-- Generate precision dependencies in dplasma/parsec.public/dplasma/lib - Done
-- Generate precision dependencies in dplasma/parsec.public/dplasma/testing
-- Generate precision dependencies in dplasma/parsec.public/dplasma/testing - Done
-- Configuring done
-- Generating done
-- Build files have been written to: dplasma/parsec.public/build
If this is done, congratulations: DPLASMA is configured and you are ready to
build and test the system.
In the unlikely case that something goes wrong, read the error message
carefully. We spent a significant amount of time trying to output something
meaningful for you and for us (in case you need help to debug or understand
it). If the output is not helpful enough to fix the problem, contact us via the
DPLASMA user mailing list and provide the CMake command and flags, the output,
and the files CMakeFiles/CMakeError.log and CMakeFiles/CMakeOutput.log.
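The two log files mentioned above can be bundled into one archive before being
sent to the mailing list. This is only a sketch: the helper name
collect_cmake_logs is hypothetical, and it skips a log file that does not exist
(CMake may not write CMakeError.log when no check has failed).

```shell
#!/bin/sh
# collect_cmake_logs BUILD_DIR ARCHIVE: pack the CMake log files found in
# BUILD_DIR/CMakeFiles into a gzip-compressed tar archive.
# Hypothetical helper for illustration; not shipped with DPLASMA.
collect_cmake_logs() {
    build_dir=$1 archive=$2
    # Reuse the function's positional parameters as the list of files to pack.
    set --
    for f in CMakeFiles/CMakeError.log CMakeFiles/CMakeOutput.log; do
        if [ -f "$build_dir/$f" ]; then
            set -- "$@" "$f"
        else
            echo "note: $build_dir/$f not found, skipping" >&2
        fi
    done
    # Nothing to send if neither log exists.
    [ "$#" -gt 0 ] || return 1
    # -C stores the files under their relative names (CMakeFiles/...).
    tar -czf "$archive" -C "$build_dir" "$@"
}
```

From the build directory, a call such as
collect_cmake_logs . /tmp/dplasma-cmake-logs.tar.gz would produce the archive;
the output path is an arbitrary example.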
Troubleshooting
Issues we have encountered with BLAS libraries are out of the scope of this
README. Please refer to your own experience to obtain a working BLAS library
and header files. Those are expected to be in the BLAS_LIBRARIES/lib
and BLAS_LIBRARIES/include directories (create a phony directory with symbolic
links to include/ and lib/ if needed).
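The phony directory mentioned above could be created along these lines. This is
a sketch under assumptions: the helper name make_blas_prefix is hypothetical,
and the real include and lib directories are whatever your BLAS installation
provides.

```shell
#!/bin/sh
# make_blas_prefix PREFIX INCLUDE_DIR LIB_DIR: create PREFIX with two symbolic
# links, PREFIX/include and PREFIX/lib, pointing at the real directories.
# Hypothetical helper for illustration; not shipped with DPLASMA.
make_blas_prefix() {
    prefix=$1 inc=$2 lib=$3
    if [ ! -d "$inc" ] || [ ! -d "$lib" ]; then
        echo "include or lib directory not found" >&2
        return 1
    fi
    mkdir -p "$prefix" || return 1
    # -n replaces an existing link instead of descending into its target.
    ln -sfn "$inc" "$prefix/include" && ln -sfn "$lib" "$prefix/lib"
}
```

The resulting directory then has the include/ and lib/ layout described above.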
When using the plasma-installer, some users have reported that, after make and
make install, it was necessary to copy all .h files found in src/ to include/.
We use quite a few optional packages; don't panic if they are not found
during the configuration. However, some of them (such as HWLOC) are critical
for performance.
Check that you have a working MPI somewhere accessible (mpicc and mpirun should
be in your PATH).
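A quick way to verify that the MPI wrappers are reachable is sketched below.
The helper name missing_tools is hypothetical; command -v is the standard shell
lookup.

```shell
#!/bin/sh
# missing_tools NAME...: print each command name that cannot be found in PATH.
# Hypothetical helper for illustration; not shipped with DPLASMA.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report the MPI commands named above that are not reachable.
missing=$(missing_tools mpicc mpirun)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "not in PATH:" $missing >&2
fi
```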
If you observe strange behavior, check that the configuration reports one of the
following (if not, the atomic operations will not work, which is detrimental to
the correct operation of DPLASMA):
Found target X86_64
Found target gcc
Found target MACOSX
Found target x86_32
You can tune the compiler using variables (see also the ccmake section):
CC to choose your C compiler
FC to choose your Fortran compiler
MPI_COMPILER to choose your mpicc compiler
MPIFC to choose your mpifortran compiler
CFLAGS to change your C compilation flags
LDFLAGS to change your C linking flags
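As an illustration, the variables above can be set in the environment of a
single cmake invocation. The sketch below only prints such a command instead of
running it; the helper name configure_line and the compiler names gcc and
gfortran are example assumptions.

```shell
#!/bin/sh
# configure_line CC FC [CFLAGS]: print (do not run) a configure command that
# passes the compiler choices through the environment of one cmake invocation.
# Hypothetical helper for illustration; not shipped with DPLASMA.
configure_line() {
    printf 'rm -f CMakeCache.txt && CC=%s FC=%s CFLAGS="%s" cmake .\n' \
        "$1" "$2" "${3:-}"
}

# Example values; replace them with the compilers available on your system.
configure_line gcc gfortran "-O2"
```

The cache file is removed first because, as noted earlier, CMake keeps what it
found in CMakeCache.txt, including the compilers it detected.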
Tuning the configuration: ccmake
When the configuration is successful, you can tune it using ccmake:
ccmake .
(notice the double c in ccmake). This is an interactive tool that lets you
choose the compilation parameters. Navigate with the arrow keys to the parameter
you want to change and press Enter to edit it. Recommended parameters are:
PARSEC_DEBUG OFF (and all other PARSEC_DEBUG options)
PARSEC_DIST_COLLECTIVES ON
PARSEC_DIST_WITH_MPI ON
PARSEC_GPU_WITH_CUDA ON
PARSEC_OMEGA_DIR OFF
PARSEC_PROF_* OFF (all PARSEC_PROF_ flags off)
DPLASMA_CALL_TRACE OFF
DPLASMA_GPU_WITH_MAGMA OFF
Using the 'expert' mode (key 't' to toggle to expert mode), you can change other
useful options, like
CMAKE_C_FLAGS_RELEASE
CMAKE_EXE_LINKER_FLAGS_RELEASE
CMAKE_Fortran_FLAGS_RELEASE
CMAKE_VERBOSE_MAKEFILE
Other options let you change the path to some compilers, for example. The
CMAKE_VERBOSE_MAKEFILE option, when turned ON, displays the commands run when
compiling, which can help debug configuration mistakes. When you have set
all the options you want in ccmake, type 'c' to configure again, and 'g' to
generate the files. If you entered wrong values in some fields, ccmake will
complain at 'c' time.
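The same cache options can also be given to cmake on the command line with -D,
in the way the configure example earlier in this file passes -DPLASMA_DIR. The
sketch below prints such a command for a few of the recommended settings; the
helper name cache_flags is hypothetical.

```shell
#!/bin/sh
# cache_flags NAME=VALUE...: turn each setting into a cmake -D argument and
# print them on one line. Hypothetical helper; not shipped with DPLASMA.
cache_flags() {
    flags=
    for kv in "$@"; do
        flags="$flags -D$kv"
    done
    # Strip the leading space left by the first concatenation.
    printf '%s\n' "${flags# }"
}

# Settings taken from the recommended list above; the command is only printed.
echo "cmake . $(cache_flags PARSEC_DEBUG=OFF PARSEC_DIST_WITH_MPI=ON \
    PARSEC_DIST_COLLECTIVES=ON DPLASMA_CALL_TRACE=OFF)"
```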
Building DPLASMA
If the configuration succeeded, compilation should be as simple and fancy as
'make'. To debug issues, turn the CMAKE_VERBOSE_MAKEFILE option ON using
ccmake, check your compilation lines, and adapt your configuration options
accordingly.
Running DPLASMA
The dplasma library is compiled into dplasma/library. All testing programs are
compiled in dplasma/testing. Examples are:
dplasma/testing/testing_?getrf -> LU Factorization (single or double precision)
dplasma/testing/testing_?geqrf -> QR Factorization (single or double precision)
dplasma/testing/testing_?potrf -> Cholesky Factorization (single or double precision)
All the binaries should accept as input:
-c <n> the number of threads used for kernel execution on each node. This should
be set to the number of cores. Remember that one additional thread is
spawned to handle the communications in the MPI version, but in a normal run
this thread shares the most available core with another thread.
-N SIZE, a mandatory argument that defines the size of the matrix
-g <number of GPUs> number of GPUs to use, if the operation is GPU-enabled
-t <blocksize> columns in a tile
-T <blocksize> rows in a tile (WARNING: currently every algorithm included in
DPLASMA requires square tiles)
-p <number of rows> to require a 2D block cyclic distribution with p rows
-q <number of columns> to require a 2D block cyclic distribution with q columns
A typical dplasma run using MPI looks like:
mpirun -np 8 ./testing_spotrf -c 8 -g 0 -p 4 -q 2 -t 120 -T 120 -N 1000
This runs a Cholesky factorization on 8 nodes with 8 computing threads per
node, the nodes being arranged in a 4x2 grid, with a distributed generation of
a 1000x1000 single-precision matrix, using tiles of size 120x120.
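The relation between the options in this example can be sketched as a small
script. It is an illustration only: the helper names are hypothetical, and it
assumes that the p x q grid must use exactly the number of MPI processes and
that tiles are square, as in the example above.

```shell
#!/bin/sh
# dplasma_run_line TEST NP CORES P Q TILE N: print an mpirun command line for a
# DPLASMA tester, after checking that the P x Q grid matches NP processes.
# Hypothetical helper for illustration; not shipped with DPLASMA.
dplasma_run_line() {
    test_bin=$1 np=$2 cores=$3 p=$4 q=$5 tile=$6 n=$7
    if [ $((p * q)) -ne "$np" ]; then
        echo "grid ${p}x${q} does not match $np processes" >&2
        return 1
    fi
    # Square tiles: the same block size is passed to -t and -T.
    printf 'mpirun -np %s ./%s -c %s -g 0 -p %s -q %s -t %s -T %s -N %s\n' \
        "$np" "$test_bin" "$cores" "$p" "$q" "$tile" "$tile" "$n"
}

# tiles_per_dim N TILE: number of tiles along one matrix dimension, rounded up.
tiles_per_dim() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

dplasma_run_line testing_spotrf 8 8 4 2 120 1000
tiles_per_dim 1000 120
```

For the example above, a 1000x1000 matrix with 120x120 tiles needs 9 tiles per
dimension, the ninth covering the remaining 40 rows or columns.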
Each test can dump the list of options with -h. Some tests have specific options
(like -I to tune the inner block size in QR and LU, and -M in LU or QR to have
non-square matrices).
Happy hacking,
The PaRSEC team.