(4 in pipeline) Faster RDTSC-based timers and new timer/counter APIs #1018

Draft: wants to merge 123 commits into base: valassi_3_grid
123 commits
7d01325
[prof] in gg_tt.mad counters.cc, start refactoring of counters - add …
valassi Aug 10, 2024
d43c2f0
[prof] in gg_tt.mad counters.cc driver.f auto_dsig1.f, complete refac…
valassi Aug 10, 2024
5ccf589
[prof] in gg_tt.mad genps.f, add profiling counters to x_to_f_args
valassi Aug 10, 2024
de7d63e
[prof] in gg_tt.mad NNPDFDriver.f add a counter for nnpdf (NB must ma…
valassi Aug 10, 2024
0ef123d
[prof] in gg_tt.mad counters.cc, reimplement counters without maps ag…
valassi Aug 10, 2024
22ce65a
[prof] in gg_tt.mad dsample.f, add time profilers also in sample_put_…
valassi Aug 10, 2024
ce655d0
[prof] in gg_tt.mad counters.cc, rename map_ as array_
valassi Aug 11, 2024
ee6f9f5
[prof] in gg_tt.mad counters.cc add a flag showing if a counter has b…
valassi Aug 11, 2024
feb7a68
[prof] in gg_tt.mad counters.cc, revert the addition of a flag showin…
valassi Aug 11, 2024
0681a76
[prof] in gg_tt.mad counters add an env variable CUDACPP_RUNTIME_DISA…
valassi Aug 11, 2024
de3eac4
[prof] in gg_tt.mad counters, revert the addition of an env variable …
valassi Aug 11, 2024
07e2a93
[prof] in gg_tt.mad counters.cc, improve the error message for counters
valassi Aug 11, 2024
5d4b128
[prof] in gg_tt.mad counters.cc, rename Fortran Overhead as Fortran O…
valassi Aug 11, 2024
6f85197
[prof] in gg_tt.mad, change counter numbers for all counters
valassi Aug 11, 2024
3b798e9
[prof] in gg_tt.mad, add a timer counter for the whole sample_full (e…
valassi Aug 11, 2024
5e4a93f
[prof] in gg_tt.mad, profile the fortran initial i/o: it is now clear…
valassi Aug 11, 2024
14b70ba
[prof] in gg_tt.mad dsample.f, move sample_put_point counters from in…
valassi Aug 11, 2024
d4bb207
[prof] in gg_tt.mad, profile prepare_grouping_choice and select_group…
valassi Aug 11, 2024
d035967
[prof] in gg_tt.mad, profile UPDATE_SCALE_COUPLING_VEC (as "test13" f…
valassi Aug 11, 2024
08b25a6
[prof] in gg_tt.mad, profile UNWGT (as "test16" for the moment, wip)
valassi Aug 11, 2024
cba16ec
[prof] in gg_tt.mad, move x_to_f profiling from genps.f to dsample.f
valassi Aug 12, 2024
968dd22
[prof] in gg_tt.mad, move all COUNTERS_REGISTER_COUNTER calls to driv…
valassi Aug 12, 2024
f94794d
[prof] in gg_tt.mad, move PDF counters from NNPDFDriver.f to auto_dsi…
valassi Aug 12, 2024
ef8cff8
[prof] in gg_tt.mad, profile REWGT (as "test14" for the moment, wip)
valassi Aug 12, 2024
e073613
[prof] in gg_tt.mad, add a second "program initial_i/o" counter"
valassi Aug 12, 2024
fbd5322
[prof] in gg_tt.mad driver.f, clean up comments in counters_register …
valassi Aug 12, 2024
04e39de
[prof] in gg_tt.mad driver.f, rename timers for unwgt, rewgt, scale, …
valassi Aug 12, 2024
62d7c4e
[prof] in gg_tt.mad dsample.f, remove the timer for grouping function…
valassi Aug 12, 2024
d3165cb
[prof] in gg_tt.mad auto_dsig1.f, add profiling for matrix1 also in d…
valassi Aug 12, 2024
d474e21
[prof] in gg_tt.mad, revert the profiling for matrix1 in dsig1
valassi Aug 12, 2024
59dbf04
[prof] in gg_tt.mad, profile ranmar (in ranmar.f: but this causes dou…
valassi Aug 12, 2024
117bd1e
[prof] in gg_tt.mad, revert the profiling of ranmar
valassi Aug 12, 2024
c356280
[prof] in gg_tt.mad driver.f, profile bridge creation/deletion (as te…
valassi Aug 12, 2024
6f86051
[prof] in gg_tt.mad, cleanly define Cudacpp initialise (bridge creati…
valassi Aug 12, 2024
255c343
[prof] in gg_tt.mad, start cleaning up timers: remove the two PROGRAM…
valassi Aug 12, 2024
568e024
[prof] in gg_tt.mad, complete cleanup of timers, with better names an…
valassi Aug 12, 2024
e1e212e
[prof] in gg_tt.mad counters.cc, add "OVERALL MEs" and "OVERALL NON-M…
valassi Aug 12, 2024
c80aa78
[prof] in gg_tt.mad counters add again an env variable CUDACPP_RUNTIM…
valassi Aug 12, 2024
5b24462
[prof] in gg_tt.mad counters.cc, consider printing throughputs only f…
valassi Aug 12, 2024
c9a72f3
[prof] in gg_tt.mad counters.cc, revert the last change
valassi Aug 12, 2024
c330fb1
[prof] in gg_tt.mad counters.cc, fix clang format
valassi Aug 12, 2024
555d91f
[prof] regenerate CODEGEN patch from gg_tt.mad including additional c…
valassi Aug 12, 2024
56404b3
[prof] regenerate all processes
valassi Aug 12, 2024
5a2f534
[prof] rerun 102 tput tests on itscrd90 - all ok
valassi Aug 13, 2024
82f87c2
[prof] rerun 30 tmad tests on itscrd90 WITH NEW COUNTERS - all as exp…
valassi Aug 13, 2024
93cf80e
[prof] in gg_tt.mad, profile gen_mom (13) and sample_get_discrete_x (…
valassi Aug 14, 2024
f77cd1f
[prof] in gg_tt.mad, profile also subsections of genmom... is there a…
valassi Aug 14, 2024
20178c7
[prof] in gg_tt.mad, revert the last two commits (remove test profili…
valassi Aug 19, 2024
17aeb61
[prof] go back to previous tput and tmad logs for easier merging of c…
valassi Aug 19, 2024
2af35cb
[cmsdyps/prof] in gg_tt.mad, backport changes from pp_dy3j.mad (P0_gu…
valassi Aug 17, 2024
0f65d33
[cmsdyps/prof] rerun one tput test for ggtt with the new timers, chec…
valassi Aug 17, 2024
83202ca
[cmsdyps/prof] in gg_tt.mad timermap.h, move to using rdtsc timers by…
valassi Aug 17, 2024
c077f83
[cmsdyps/prof] in tput/throughputX.sh, add a printout about chrono vs…
valassi Aug 17, 2024
88f6916
[cmsdyps/prof] rerun one tput test for ggtt with chrono timers, no ch…
valassi Aug 17, 2024
d10e7f4
[cmsdyps/prof] rerun one tput test for ggtt with rdtsc timers, essent…
valassi Aug 17, 2024
90c863b
[cmsdyps/prof] in gg_tt.mad, backport latest changes in timers and co…
valassi Aug 19, 2024
609b4e4
[cmsdyps/prof] rerun one tput test for ggtt with new chrono timers - …
valassi Aug 19, 2024
d06e6a4
[cmsdyps/prof] rerun one tput test for ggtt with new rdtsc timers - t…
valassi Aug 19, 2024
48c8c79
[cmsdyps/prof] in gg_tt.mad timermap.h and check_sa,cc, fix the calib…
valassi Aug 19, 2024
9bf5e6e
[cmsdyps/prof] rerun one tput test for ggtt with new chrono timers - …
valassi Aug 19, 2024
a1c9b7a
[cmsdyps/prof] rerun one tput test for ggtt with new rdtsc timers - n…
valassi Aug 19, 2024
5fe76e0
[prof] in CODEGEN, backport the latest changes to timermap.h, check_s…
valassi Aug 19, 2024
3435f56
[prof] in CODEGEN, fix clang format for timermap.h, check_sa.cpp, tim…
valassi Aug 19, 2024
6f7076a
[prof] regenerate CODEGEN patch from gg_tt.mad including htuple comme…
valassi Aug 19, 2024
0db0718
[prof] in gg_tt.mad, fix clang format for timermap.h, check_sa.cpp, t…
valassi Aug 19, 2024
e2b46f2
[prof] regenerate gg_tt.mad, all ok
valassi Aug 19, 2024
5d75bb4
[prof] regenerate all processes
valassi Aug 19, 2024
6eb36a6
[prof] rerun a simple tmad test for ggtt... times look ok but through…
valassi Aug 19, 2024
4e7e07c
[prof] in gg_tt.mad and CODEGEN, fix a silly bug in throughputs (was …
valassi Aug 19, 2024
2e43faf
[prof] revert tmad run of ggtt with throughput bug
valassi Aug 19, 2024
42cad8d
[prof] rerun again a simple tmad test for ggtt... now times and throu…
valassi Aug 19, 2024
607abfc
[prof] regenerate gg_tt.mad, all ok
valassi Aug 19, 2024
9a03440
[prof] manually fix counters.cc in all generated processes
valassi Aug 19, 2024
f0a7a3a
[prof] rerun 102 tput tests (with new rdtcs timers) on itscrd90 - all ok
valassi Aug 20, 2024
db32587
[prof] ** COMPLETE PROF ** rerun 30 tmad tests on itscrd90 (with new …
valassi Aug 20, 2024
95329f3
[prof] move to upstream/master codegen logs to ease merging
valassi Aug 21, 2024
9b394e6
Merge remote-tracking branch 'upstream/master' (with hel #960, mac #9…
valassi Aug 21, 2024
56d73ff
[prof] regenerate all processes after merging upstream/master
valassi Aug 21, 2024
9ac0039
[prof] in gg_tt.mad and CODEGEN timers/counters, disable Rdtsc counte…
valassi Aug 21, 2024
5c8d579
[prof] regenerate all processes after disabling Rdtsc counters on pla…
valassi Aug 21, 2024
c60de03
[prof] in CODEGEN/generateAndCompare.sh, add gux_taptamggux (similar …
valassi Aug 23, 2024
3a94376
[prof] add gux_taptamggux.mad to CODEGEN/allGenerateAndCompare.sh
valassi Aug 23, 2024
af682f3
[prof] add gux_taptamggux.mad to the repo, for timer tests
valassi Aug 23, 2024
5f22187
[prof] in gux_taptamggux.mad, switch on SampleGetX profiling as a test
valassi Aug 23, 2024
5c0a2ed
[prof] in gux_taptamggux.mad timer.h, add the option to remove overhe…
valassi Aug 23, 2024
ad9b747
[prof] in gux_taptamggux.mad timer.h, add instead a getTotalOverheadS…
valassi Aug 23, 2024
464b9d7
[prof] in gux_taptamggux.mad counters.cc, add the option to remove ti…
valassi Aug 23, 2024
e33250a
[prof] in gux_taptamggux.mad counters.cc, improve handling of TEST CO…
valassi Aug 23, 2024
eba8039
[prof] in gux_taptamggux.mad counters.cc, add a mechanism for declari…
valassi Aug 23, 2024
51bbbaa
[prof] in gux_taptamggux.mad counters.cc, add a printout of the estim…
valassi Aug 23, 2024
fe44fa9
[prof] in gux_taptamggux.mad, declare SampleGetX as included in Phase…
valassi Aug 23, 2024
5d3da5a
[prof] in gux_taptamggux.mad timer.h, remove all handling of overhead…
valassi Aug 23, 2024
3577a55
[prof] in gux_taptamggux.mad counters.h, move here the handling of co…
valassi Aug 23, 2024
6dcab81
[prof] in gux_taptamggux.mad counters.h, improve the handling of coun…
valassi Aug 23, 2024
ef82161
[prof] move to CODEGEN logs from the latest upstream/master for easie…
valassi Sep 2, 2024
eb7e826
Merge remote-tracking branch 'upstream/master' (including new CI and …
valassi Sep 2, 2024
2525410
[prof] move to tput/tmad logs from the latest upstream/master for eas…
valassi Sep 16, 2024
34041b7
[prof] move to auto_dsig1.f from the latest upstream/master in all ge…
valassi Sep 16, 2024
4df3dfa
Merge remote-tracking branch 'upstream/master' (including june24, goo…
valassi Sep 16, 2024
4d91140
[prof] in gg_tt.mad auto_dsig1.f, add back all counters as in the pro…
valassi Sep 16, 2024
4526aac
[prof] regenerate CODEGEN patch from gg_tt.mad after merging an old u…
valassi Oct 4, 2024
5eacb46
[prof] regenerate all processes after merging an old 'upstream/master…
valassi Oct 4, 2024
95a9070
[prof] move to the latest upstream/master CODEGEN logs for easier mer…
valassi Oct 4, 2024
7c6ba3d
[prof] move to dsample.f from the latest upstream/master in all gener…
valassi Oct 4, 2024
416a52b
Merge remote-tracking branch 'upstream/master' (including v1.0.0 and …
valassi Oct 4, 2024
3ec0964
[prof] regenerate CODEGEN patch from gg_tt.mad after merging upstream…
valassi Oct 4, 2024
817dd25
[prof] regenerate all processes after merging upstream/master(v1.0.0 …
valassi Oct 4, 2024
abf7214
[prof] move to CODEGEN logs from branch grid for easier merging
valassi Oct 6, 2024
6a402e0
Merge branch 'grid' (runcard/bldall/tlau) into prof
valassi Oct 6, 2024
fd3e85e
[prof] regenerate CODEGEN patch from gg_tt.mad after merging grid (ru…
valassi Oct 6, 2024
da9eed2
[prof0] (from prof to prof0) in gg_tt.mad, "temporarely" undo additio…
valassi Oct 6, 2024
1e3199a
[prof0] (from prof to prof0) in CODEGEN output.py and output.py, back…
valassi Oct 6, 2024
3052f12
[prof0] (from prof to prof0) regenerate CODEGEN patch from gg_tt.mad …
valassi Oct 6, 2024
03720a3
[prof0] (from prof to prof0) regenerate all processes after "temporar…
valassi Oct 6, 2024
24f9115
[prof0] rerun 96 tput builds and tests with NEW TIMERS on LUMI/HIP (s…
valassi Oct 7, 2024
af67440
[prof0] rerun 30 tmad tests on LUMI/HIP WITH NEW TIMERS - all as expe…
valassi Oct 7, 2024
3ae513f
[prof0] revert from LUMI to previous itscrd90 tput/tmad logs
valassi Oct 7, 2024
6ebaa5a
[prof0] rerun 96 tput builds and tests on gold91 with NEW TIMERS - al…
valassi Oct 7, 2024
3197d60
[prof0] rerun 30 tmad tests on gold91 with NEW TIMERS - all as expected
valassi Oct 7, 2024
0679b6f
[prof0] revert from gold91 to previous itscrd90 tput/tmad logs
valassi Oct 7, 2024
10e065a
[prof0] rerun 102 tput builds and tests on itscrd90 with NEW TIMERS -…
valassi Oct 7, 2024
cd7bf8c
[prof0] rerun 30 tmad tests on itscrd90 with NEW TIMERS - all as expe…
valassi Oct 7, 2024
4aae106
[prof0] ** COMPLETE PROF0 ** in CHANGELOG.md, document the changes in…
valassi Oct 7, 2024
25 changes: 17 additions & 8 deletions epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/CHANGELOG.md
@@ -1,3 +1,10 @@
<--
Copyright (C) 2020-2024 CERN and UCLouvain.
Licensed under the GNU Lesser General Public License (version 3 or later).
Created by: A. Valassi (Oct 2023) for the MG5aMC CUDACPP plugin.
Further modified by: A. Valassi (2024) for the MG5aMC CUDACPP plugin.
-->

# Changelog

All notable changes to this project will be documented in this file.
@@ -11,10 +18,11 @@ The format is loosely based on [Keep a Changelog](https://keepachangelog.com).
### Added

- Infrastructure issues
- - AV ([#945]) Add cudacpp_bldall runcard to produce multi-backend gridpacks.
- - AV ([#700]) Add cudacpp_helinl and cudacpp_hrdcod runcards to support HELINL=1 and HRDCOD=1 builds.
- - AV ([#957]) In internal "tlau" tests, instrument python code in gridpacks to provide timing profiles for event generation.
- - AV Enhance internal "tlau" tests and add a few test logs and various scripts to analyse them.
+ - AV ([#945]) Added cudacpp_bldall runcard to produce multi-backend gridpacks.
+ - AV ([#700]) Added cudacpp_helinl and cudacpp_hrdcod runcards to support HELINL=1 and HRDCOD=1 builds.
+ - AV ([#957]) In internal "tlau" tests, instrumented python code in gridpacks to provide timing profiles for event generation.
+ - AV Enhanced internal "tlau" tests and added a few test logs and various scripts to analyse them.
+ - AV ([#972]) Added new RDTSC-based timers (faster than the legacy chrono-based timers) and adapted all internal APIs.

### Changed

@@ -29,10 +37,10 @@ The format is loosely based on [Keep a Changelog](https://keepachangelog.com).
- AV ([#1011]) Added workaround for Floating Point Exceptions in vxxxxx in the HIP backend.

- Infrastructure issues
- - AV ([#1013]) Fix release scripts to create 'v1.00.01' tags from a '(1,0,1)' python tuple.
- - AV ([#1015]) Remove add_input_for_banner from output.py (plugin_run_card is not needed in cudacpp).
- - AV ([#995]) In cudacpp_config.mk move default FPTYPE from 'd' to 'm' (already the default cudacpp_backend in run_card.dat).
+ - AV ([#1013]) Fixed release scripts to create 'v1.00.01' tags from a '(1,0,1)' python tuple.
+ - AV ([#1015]) Removed add_input_for_banner from output.py (plugin_run_card is not needed in cudacpp).
+ - AV ([#995]) In cudacpp_config.mk moved default FPTYPE from 'd' to 'm' (already the default cudacpp_backend in run_card.dat).

--------------------------------------------------------------------------------

## [1.00.00] - 2024-10-03
@@ -74,6 +82,7 @@ The format is loosely based on [Keep a Changelog](https://keepachangelog.com).
[#945]: https://github.com/madgraph5/madgraph4gpu/issues/945
[#957]: https://github.com/madgraph5/madgraph4gpu/issues/957
[#959]: https://github.com/madgraph5/madgraph4gpu/issues/959
+ [#972]: https://github.com/madgraph5/madgraph4gpu/issues/972
[#993]: https://github.com/madgraph5/madgraph4gpu/issues/993
[#995]: https://github.com/madgraph5/madgraph4gpu/issues/995
[#1011]: https://github.com/madgraph5/madgraph4gpu/issues/1011
@@ -1,8 +1,8 @@
diff --git b/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f a/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f
- index ecd11b239..ec5722702 100644
+ index ecd11b239..67d1c2218 100644
--- b/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f
+++ a/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f
- @@ -76,13 +76,77 @@ c common/to_colstats/ncols,ncolflow,ncolalt,ic
+ @@ -76,13 +76,80 @@ c common/to_colstats/ncols,ncolflow,ncolalt,ic

include 'coupl.inc' ! needs VECSIZE_MEMMAX (defined in vector.inc)
INTEGER VECSIZE_USED
@@ -27,6 +27,9 @@ index ecd11b239..ec5722702 100644
+ CALL OMPNUMTHREADS_NOT_SET_MEANS_ONE_THREAD()
+#endif
+ CALL COUNTERS_INITIALISE()
+ + CALL COUNTERS_REGISTER_COUNTER( 1, 'Fortran MEs'//char(0) )
+ + CALL COUNTERS_REGISTER_COUNTER( 2, 'CudaCpp MEs'//char(0) )
+ + CALL COUNTERS_REGISTER_COUNTER( 3, 'CudaCpp HEL'//char(0) )
+
+#ifdef MG5AMC_MEEXPORTER_CUDACPP
+ fbridge_mode = 1 ! CppOnly=1, default for CUDACPP
@@ -81,7 +84,7 @@ index ecd11b239..ec5722702 100644
c
c Read process number
c
- @@ -216,8 +280,33 @@ c call sample_result(xsec,xerr)
+ @@ -216,8 +283,33 @@ c call sample_result(xsec,xerr)
c write(*,*) 'Final xsec: ',xsec

rewind(lun)
@@ -116,7 +119,7 @@ index ecd11b239..ec5722702 100644
end

c $B$ get_user_params $B$ ! tag for MadWeight
- @@ -400,7 +489,7 @@ c
+ @@ -400,7 +492,7 @@ c
fopened=.false.
tempname=filename
fine=index(tempname,' ')
@@ -433,10 +433,10 @@ main( int argc, char** argv )
DeviceBufferSelectedColor devSelCol( nevt );
#endif

- std::unique_ptr<double[]> genrtimes( new double[niter] );
- std::unique_ptr<double[]> rambtimes( new double[niter] );
- std::unique_ptr<double[]> wavetimes( new double[niter] );
- std::unique_ptr<double[]> wv3atimes( new double[niter] );
+ std::unique_ptr<uint64_t[]> genrcounts( new uint64_t[niter] );
+ std::unique_ptr<uint64_t[]> rambcounts( new uint64_t[niter] );
+ std::unique_ptr<uint64_t[]> wavecounts( new uint64_t[niter] );
+ std::unique_ptr<uint64_t[]> wv3acounts( new uint64_t[niter] );

// --- 0c. Create curand, hiprand or common generator
const std::string cgenKey = "0c GenCreat";
@@ -540,7 +540,7 @@ main( int argc, char** argv )
// === STEP 1 OF 3

// *** START THE OLD-STYLE TIMER FOR RANDOM GEN ***
- double genrtime = 0;
+ uint64_t genrcount = 0;

// --- 1a. Seed rnd generator (to get same results on host and device in curand/hiprand)
// [NB This should not be necessary using the host API: "Generation functions
@@ -551,7 +551,7 @@ main( int argc, char** argv )
const std::string sgenKey = "1a GenSeed ";
timermap.start( sgenKey );
prnk->seedGenerator( seed + iiter );
- genrtime += timermap.stop();
+ genrcount += timermap.stop();

// --- 1b. Generate all relevant numbers to build nevt events (i.e. nevt phase space points) on the host
const std::string rngnKey = "1b GenRnGen";
@@ -566,19 +566,19 @@ main( int argc, char** argv )
{
// --- 1c. Copy rndmom from host to device
const std::string htodKey = "1c CpHTDrnd";
- genrtime += timermap.start( htodKey );
+ genrcount += timermap.start( htodKey );
copyDeviceFromHost( devRndmom, hstRndmom );
}
#endif

// *** STOP THE OLD-STYLE TIMER FOR RANDOM GEN ***
- genrtime += timermap.stop();
+ genrcount += timermap.stop();

// === STEP 2 OF 3
// Fill in particle momenta for each of nevt events on the device

// *** START THE OLD-STYLE TIMER FOR RAMBO ***
- double rambtime = 0;
+ uint64_t rambcount = 0;

// --- 2a. Fill in momenta of initial state particles on the device
const std::string riniKey = "2a RamboIni";
@@ -589,7 +589,7 @@ main( int argc, char** argv )
// --- 2b. Fill in momenta of final state particles using the RAMBO algorithm on the device
// (i.e. map random numbers to final-state particle momenta for each of nevt events)
const std::string rfinKey = "2b RamboFin";
- rambtime += timermap.start( rfinKey );
+ rambcount += timermap.start( rfinKey );
prsk->getMomentaFinal();
//std::cout << "Got final momenta" << std::endl;

@@ -598,30 +598,30 @@ main( int argc, char** argv )
{
// --- 2c. CopyDToH Weights
const std::string cwgtKey = "2c CpDTHwgt";
- rambtime += timermap.start( cwgtKey );
+ rambcount += timermap.start( cwgtKey );
copyHostFromDevice( hstWeights, devWeights );

// --- 2d. CopyDToH Momenta
const std::string cmomKey = "2d CpDTHmom";
- rambtime += timermap.start( cmomKey );
+ rambcount += timermap.start( cmomKey );
copyHostFromDevice( hstMomenta, devMomenta );
}
else // only if ( ! bridge ) ???
{
// --- 2c. CopyHToD Weights
const std::string cwgtKey = "2c CpHTDwgt";
- rambtime += timermap.start( cwgtKey );
+ rambcount += timermap.start( cwgtKey );
copyDeviceFromHost( devWeights, hstWeights );

// --- 2d. CopyHToD Momenta
const std::string cmomKey = "2d CpHTDmom";
- rambtime += timermap.start( cmomKey );
+ rambcount += timermap.start( cmomKey );
copyDeviceFromHost( devMomenta, hstMomenta );
}
#endif

// *** STOP THE OLD-STYLE TIMER FOR RAMBO ***
- rambtime += timermap.stop();
+ rambcount += timermap.stop();

// === STEP 3 OF 3
// Evaluate matrix elements for all nevt events
@@ -641,7 +641,7 @@ main( int argc, char** argv )
#ifdef MGONGPUCPP_GPUIMPL
// --- 2d. CopyHToD Momenta
const std::string gKey = "0.. CpHTDg";
- rambtime += timermap.start( gKey ); // FIXME! NOT A RAMBO TIMER!
+ rambcount += timermap.start( gKey ); // FIXME! NOT A RAMBO TIMER!
copyDeviceFromHost( devGs, hstGs );
#endif

@@ -654,8 +654,8 @@ main( int argc, char** argv )
}

// *** START THE OLD-STYLE TIMERS FOR MATRIX ELEMENTS (WAVEFUNCTIONS) ***
- double wavetime = 0; // calc plus copy
- double wv3atime = 0; // calc only
+ uint64_t wavecount = 0; // calc plus copy
+ uint64_t wv3acount = 0; // calc only

// --- 3a. SigmaKin
const std::string skinKey = "3a SigmaKin";
@@ -664,8 +664,8 @@ main( int argc, char** argv )
pmek->computeMatrixElements( useChannelIds );

// *** STOP THE NEW OLD-STYLE TIMER FOR MATRIX ELEMENTS (WAVEFUNCTIONS) ***
- wv3atime += timermap.stop(); // calc only
- wavetime += wv3atime; // calc plus copy
+ wv3acount += timermap.stop(); // calc only
+ wavecount += wv3acount; // calc plus copy

#ifdef MGONGPUCPP_GPUIMPL
if( !bridge )
@@ -675,7 +675,7 @@ main( int argc, char** argv )
timermap.start( cmesKey );
copyHostFromDevice( hstMatrixElements, devMatrixElements );
// *** STOP THE OLD OLD-STYLE TIMER FOR MATRIX ELEMENTS (WAVEFUNCTIONS) ***
- wavetime += timermap.stop(); // calc plus copy
+ wavecount += timermap.stop(); // calc plus copy
}
#endif

@@ -688,16 +688,16 @@ main( int argc, char** argv )
// --- 4a Dump within the loop
const std::string loopKey = "4a DumpLoop";
timermap.start( loopKey );
- genrtimes[iiter] = genrtime;
- rambtimes[iiter] = rambtime;
- wavetimes[iiter] = wavetime;
- wv3atimes[iiter] = wv3atime;
+ genrcounts[iiter] = genrcount;
+ rambcounts[iiter] = rambcount;
+ wavecounts[iiter] = wavecount;
+ wv3acounts[iiter] = wv3acount;

if( verbose )
{
std::cout << std::string( SEP79, '*' ) << std::endl
<< "Iteration #" << iiter + 1 << " of " << niter << std::endl;
- if( perf ) std::cout << "Wave function time: " << wavetime << std::endl;
+ if( perf ) std::cout << "Wave function time: " << wavecount * timermap.secondsPerCount() << std::endl;
}

for( unsigned int ievt = 0; ievt < nevt; ++ievt ) // Loop over all events in this iteration
@@ -736,6 +736,20 @@ main( int argc, char** argv )
// *** END MAIN LOOP ON #ITERATIONS ***
// **************************************

+ // Calibrate seconds per count
+ float secPerCount = timermap.secondsPerCount();
+ std::unique_ptr<double[]> genrtimes( new double[niter] );
+ std::unique_ptr<double[]> rambtimes( new double[niter] );
+ std::unique_ptr<double[]> wavetimes( new double[niter] );
+ std::unique_ptr<double[]> wv3atimes( new double[niter] );
+ for( unsigned int iiter = 0; iiter < niter; ++iiter )
+ {
+ genrtimes[iiter] = genrcounts[iiter] * secPerCount;
+ rambtimes[iiter] = rambcounts[iiter] * secPerCount;
+ wavetimes[iiter] = wavecounts[iiter] * secPerCount;
+ wv3atimes[iiter] = wv3acounts[iiter] * secPerCount;
+ }
+
// === STEP 8 ANALYSIS
// --- 8a Analysis: compute stats after the loop
const std::string statKey = "8a CompStat";