All notable changes to this project will be documented in this file. The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

TODO (see the GitHub log):
* redesigned flexpath
* new version of adios
0. use an environment variable instead of a preprocessor macro (see the sketch after this list)
1. NNTI in flexpath is not usable
2. dspaces hangs
3. ENABLE_TESTING is enabled in the flexpath configuration
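A minimal sketch of the environment-variable approach, assuming a hypothetical TRANSPORT_METHOD variable and default value; the point is that the transport is chosen at run time instead of behind a compile-time preprocessor switch:

    #include <stdlib.h>

    /* Sketch: read the transport choice from the environment at run time
     * instead of selecting it with a compile-time #ifdef. */
    static const char *transport_method(void)
    {
        const char *m = getenv("TRANSPORT_METHOD");      /* hypothetical variable */
        return (m != NULL && m[0] != '\0') ? m : "MPIIO"; /* fallback default */
    }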
[tested]:
- tested on Bridges the code that was modified on Stampede
[added]:
- fix: reorganized lbm-mpiio; every process needs to update has_more
- fix: lammps-mpiio had inconsistent XML files
- DEBUG_adaptor enables the debug info for both the adios and dspaces adaptors
[fixed]:
- now use uint64_t as the type for the global dimensions (see the sketch after this list)
- mpiio works with 2176 processes; 43xx should be fine too
- flexpath might work; needs testing later
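A minimal sketch of the 64-bit dimensions on the writer side, assuming the ADIOS 1.x write API and illustrative variable names (ldim, gdim, offset) that the XML would declare as 8-byte unsigned integers:

    #include <stdint.h>
    #include <adios.h>

    /* Sketch: compute and write the global dimension in 64 bits so that
     * products such as 2074 * 1000000 do not overflow a 32-bit int.
     * fh is the handle returned by adios_open(); the variable names are
     * placeholders for whatever the XML declares. */
    void write_dims(int64_t fh, int rank, int nprocs)
    {
        uint64_t ldim   = 1000000ULL;               /* per-process slice     */
        uint64_t gdim   = (uint64_t)nprocs * ldim;  /* 64-bit multiplication */
        uint64_t offset = (uint64_t)rank   * ldim;

        adios_write(fh, "ldim",   &ldim);
        adios_write(fh, "gdim",   &gdim);
        adios_write(fh, "offset", &offset);
    }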
[CHANGED]:
- use a bp directory for the adios output
- modularize the version control in mpiio
[TODO]:
- reorganize
- rerun flexpath?
* flexpath dimensions:
  see flexpath_skel/global_range_select; for 3D data the dimensions appear
  to be reversed (see the sketch below)
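A minimal sketch of the kind of bounding-box read global_range_select performs, assuming the ADIOS read API and a placeholder variable name "density"; if the writer declares the global array with the slowest-varying dimension first, the reader's start/count must follow that order, which looks reversed relative to the application's (x, y, z):

    #include <stdint.h>
    #include <adios_read.h>

    /* Sketch: read a 3D block; start/count are ordered the way the writer
     * declared the global dimensions (slowest-varying first), which is why
     * the application's (x, y, z) appears reversed here. */
    void read_block(ADIOS_FILE *fp, double *buf,
                    uint64_t x0, uint64_t y0, uint64_t z0,
                    uint64_t nx, uint64_t ny, uint64_t nz)
    {
        uint64_t start[3] = { z0, y0, x0 };
        uint64_t count[3] = { nz, ny, nx };

        ADIOS_SELECTION *sel = adios_selection_boundingbox(3, start, count);
        adios_schedule_read(fp, sel, "density", 0, 1, buf);
        adios_perform_reads(fp, 1);          /* 1 = blocking */
        adios_selection_delete(sel);
    }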
* flexpath slow:
  allreduce and allgather
* flexpath crash: 2074 * 1000000 > INT_MAX
* flexpath: why is it slow?
* decaf: why does it crash at large scale?
* regenerate trace in lbm-decaf in adios_perf
* better trace record organization
* for a static build with VT, link MPI after the VT libraries, as
  mpicc -show -trace does
* after disabling all MPI functions, I now have 1088 traces for lammps
* the sim_only trace now looks good
* lbm_decaf trace generated; the end-to-end time is 4.3 s for 10 steps
* itac:
  -tcollect does not work; try manual instrumentation instead
  only the VT include directory and libVT.so are needed
* computation is not shown in the trace (see the sketch below)
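A minimal sketch of the manual instrumentation, assuming the Intel Trace Collector VT API (VT.h plus libVT); "compute" is a placeholder region name:

    #include <VT.h>

    /* Sketch: wrap the computation in a user-defined region so that it
     * shows up in the ITAC trace alongside the MPI events. */
    static int compute_handle;
    static int compute_defined = 0;

    void traced_compute_step(void)
    {
        if (!compute_defined) {
            VT_funcdef("compute", VT_NOCLASS, &compute_handle);  /* define once */
            compute_defined = 1;
        }
        VT_begin(compute_handle);
        /* ... the computation that does not show up otherwise ... */
        VT_end(compute_handle);
    }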
* newly added:
zipper code is added to source tree, see README for how to build
* problems:
  self-built TAU with a function-name filter: the .trc file is smaller but the
  .edf file still contains Boost entries
  traces on Stampede ..
  -- build_tau cannot use CMake 3.7 (using a select file to filter out .so
     files makes 3.7 work); it gets stuck at "C compiler features", so use 2.8
     instead
  -- with 2.8 decaf is not found: the version check should use DECAF_FOUND
     (not Decaf_FOUND)
  -- it now builds
  decaf generates too many traces
  - Boost? see /scratch/04446/tg837458/decaf_skel/587326/trace/app0, less
    events.61.edf
  - all Boost-related entries have TAU_DEFAULT; how to disable it?
* lammps scaling experiments: the time is almost the same as with the lammps library
* tau broken
* decaf timer:
  it's okay; the end-to-end time is measured on the consumer side
* question:
  why can't lbm run at large scale while lammps can?
  I tried the example; if one dimension is larger than 100000, a crash
  happens
* lammps runs slowly, 6~7 s per step in the 68v34 case; the extract step is fast
* tried a static library; just as slow
* tried the lammps binary: 2:43
* run 100 vs run 1 repeated 100 times:
  running one step at a time is slower by default; set pre/post for the run
  command, which is quite useful for coupled processes (see the lammps run doc
  and the sketch below)
  even with this, library lammps takes 3:45 for 100 steps
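A minimal sketch of that pattern through the library interface, assuming an already opened LAMMPS handle lmp and an illustrative step count:

    #include "library.h"    /* LAMMPS C library interface */

    /* Sketch: drive LAMMPS one coupling step at a time.  "pre no post no"
     * skips the setup and statistics that "run" normally redoes, which is
     * what keeps many short runs close to a single long run. */
    void couple_steps(void *lmp, int nsteps)
    {
        lammps_command(lmp, (char *)"run 0 pre yes post no");  /* do the setup once */
        for (int i = 0; i < nsteps; i++) {
            lammps_command(lmp, (char *)"run 1 pre no post no");
            /* ... exchange data with the coupled analysis here ... */
        }
    }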
* decaf + lammps vs lammps only:
  225 s in both cases; is the analysis too simple and the simulation too long?
* lammps needs to be built as a shared library, otherwise:
  /opt/apps/gcc/5.4.0/bin/ld:
  /work/04446/tg837458/stampede2/software/lammps-16Feb16/src/liblammps.a(library.o):
  relocation R_X86_64_32 against `.rodata.str1.4' can not be used when
  making a shared object; recompile with -fPIC
* progress:
  lammps now runs correctly with an input file passed from the Slurm script
* problem fix:
  - longer analysis time on Stampede: run_analysis was compiled as a
    dynamic library to accommodate decaf
  - now everything uses static libraries (lbm and analysis)
* parameter:
  - how many link processes should be used; see decaf/README for a small
    experiment, 64 procs per node
* problem:
- 2176v1088 crashed
* new results:
  decaf_scaling: statistics for 68v34 through 1088v544 (52.9~79.9 s)
* question: the analysis time is longer than Yuankun's (40 s vs 20 s)
* the end-to-end time is 60 s+ when using more link processes
* decaf is linked with lbm; the end-to-end time is 78 s for 64v32, now using
  nmoments=4
* decaf is connected with lbm and 100 steps run in 87 s (without optimization
  flags)
* decaf added to the source tree, cmake files are now in cmake/
* buffer operations are now in separate objects, still 38 s per step
  the buffer is now separated from lbm
  use valgrind to debug:
    valgrind --leak-check=yes ./lbm 1 10 &> run.log
    grep Invalid run.log   (the Invalid entries point to the segmentation fault)
  doing:
  - modularize
* lammps and lbm should have been better integrated:
  - the decaf examples/C/lammps example:
    * it uses the lammps library interface
    * the input file doesn't call "run" and "dump"
    * this example gathers all atoms; it could do better, see
      lammps_gather_atoms for more details
      (https://github.com/lammps/lammps/blob/master/src/lammps.h) and the
      sketch below
  - the lbm code could also be restructured with a better architecture.
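A minimal sketch of the gather through the library interface, assuming the C API from library.h and a placeholder input script name; lammps_get_natoms returns an int in older LAMMPS versions (a double in newer ones):

    #include <mpi.h>
    #include <stdlib.h>
    #include "library.h"    /* LAMMPS C library interface */

    /* Sketch: gather all atom coordinates into one buffer instead of
     * relying on "dump"; x is laid out as natoms * 3 doubles. */
    void gather_positions(MPI_Comm comm)
    {
        void *lmp = NULL;
        lammps_open(0, NULL, comm, &lmp);
        lammps_file(lmp, (char *)"in.coupled");     /* placeholder input script */

        int natoms = lammps_get_natoms(lmp);
        double *x = malloc(3 * (size_t)natoms * sizeof(double));

        /* "x" = positions, type 1 = double, 3 values per atom */
        lammps_gather_atoms(lmp, (char *)"x", 1, 3, x);

        /* ... hand x to the analysis / decaf side ... */

        free(x);
        lammps_close(lmp);
    }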
flexpath fixed: remove the data description, otherwise it seg faults (this
didn't happen in 1.12)
see flex_debug2 for more details
todo:
1. use 3D dimensions and slicing
2. try the skeleton code and check the timing: 400 s!
new commit:
* add the avx and wall flags, consistent with Yuankun's experiments
* clean up the analysis code
* add flexpath debug
need to use ddt to look into :
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/329053
use flexpath_debug
using flexpath 1.13 to fix the memory usage,
but when integrating 1.13 with my app the consumer had issues:
  cm_blocking read
  INT_CMstart_read_thread
  use_read_thread
366616: rerun flexpath 34v17
365308: 8vs4 development, correct
add trace support: set HAS_TRACE=1, then submit jobs (343400 is an example)
This version generates the results for the performance comparison (8vs4):
  adios_dspaces:   137.4 s
  adios_dimes:     111.2 s
  adios_mpiio:      96.38 s
  adios_flexpath:   43.95 s
  native_dspaces:   80.52 s
  native_dimes:     51.27 s
  zipper:           34.8 s
  lbm only:         33.5 s
flexpath has much better performance when there are fewer processes on each
node: using 3 nodes, the times for different process counts are
  8vs4:   43.95 s
  34vs17: 220.76 s
  68vs34: 511.4 s
stripe size = -1: no obvious effect
flexpath is slow at first but has good scalability
stampede2 is not stable
flexpath/mpiio 512vs256
dspaces test added to the project; the dspaces server hits an rdma_bind_addr
error with 4 server nodes
- now use mpiexec.hydra with new process layouts
- ibrun info:
  export TACC_IBRUN_DEBUG=1
  less /home1/04446/tg837458/Downloads/LaucherTest/results/308504/app2.log
  less ~/.slurm/job.308504.hostlist.Vri26bxz
* Stampede maintenance; will come back to Bridges
* dsexample cannot run multiple applications with the Intel mpirun; 1814068:
  run: Job step creation temporarily disabled, retrying
* try launchertest:
  mpirun produces output for only one app
- the dspaces example now runs on stampede2; results still to be verified
- dspaces/flexpath/adios installed. Note: without avx flags!
  small experiments can run;
  larger experiments need a specified layout