Benchmarks
bettiolm edited this page Jun 9, 2021 · 8 revisions
Times and bandwidth for the different receive types which are available in the unstructured case:
- UNORDERED: non-contiguous receive halos + buffered receive;
- ORDERED: contiguous receive halos + buffered receive;
- IN-PLACE RECEIVE: contiguous receive halos + in-place receive.
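The difference between the three receive types can be illustrated with a minimal sketch. It is not the benchmarked implementation: the function names, the flat single-level field layout and the use of `std::int64_t` values are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// UNORDERED (hypothetical helper): halo entries are scattered across the
// field, so the receive buffer is unpacked element by element via an index list.
inline void unpack_unordered(std::vector<std::int64_t>& field,
                             const std::vector<std::size_t>& halo_indices,
                             const std::vector<std::int64_t>& recv_buffer) {
    assert(halo_indices.size() == recv_buffer.size());
    for (std::size_t i = 0; i < halo_indices.size(); ++i) {
        assert(halo_indices[i] < field.size());
        field[halo_indices[i]] = recv_buffer[i];
    }
}

// ORDERED (hypothetical helper): the halo is one contiguous range starting at
// halo_begin, so the receive buffer is unpacked with a single block copy.
inline void unpack_ordered(std::vector<std::int64_t>& field,
                           std::size_t halo_begin,
                           const std::vector<std::int64_t>& recv_buffer) {
    assert(halo_begin + recv_buffer.size() <= field.size());
    std::memcpy(field.data() + halo_begin, recv_buffer.data(),
                recv_buffer.size() * sizeof(std::int64_t));
}

// IN-PLACE RECEIVE (hypothetical helper): no intermediate buffer. The address
// of the contiguous halo range is what would be passed to the transport layer
// as the receive target, so no unpacking step is needed afterwards.
inline std::int64_t* in_place_target(std::vector<std::int64_t>& field,
                                     std::size_t halo_begin,
                                     std::size_t halo_size) {
    assert(halo_begin + halo_size <= field.size());
    return field.data() + halo_begin;
}
```

In this sketch the unordered case pays for an indexed scatter, the ordered case for one contiguous copy, and the in-place case avoids the copy altogether, which is the cost difference the receive types are meant to expose.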
- Triangular mesh (average node connectivity = 6), 8*10^6 vertices;
- 100 vertical layers;
- 1 field exchanged, of type `int_64`;
- 50 warm-up exchanges, 50 measured exchanges, results are averaged over measured repetitions, threads and MPI ranks; final results are then averaged over 3 independent simulations;
- system: `daint-gpu`;
- configuration: CPU-only, 1 rank per node, 12 threads per rank, 1 local domain per thread;
- transport layer: MPI;
- thread type: `std::thread`.
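The measurement scheme described above (untimed warm-up exchanges, then timed exchanges, then averaging) could look roughly like the following sketch. The names `mean_exchange_time` and `average`, and the choice of `std::chrono::steady_clock`, are assumptions and are not taken from the benchmark code.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: performs n_warmup untimed calls followed by n_measured
// timed calls and returns the mean time per measured call, in seconds.
inline double mean_exchange_time(const std::function<void()>& exchange,
                                 int n_warmup, int n_measured) {
    assert(n_warmup >= 0 && n_measured > 0);
    for (int i = 0; i < n_warmup; ++i) exchange();  // warm-up, not timed
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n_measured; ++i) exchange();  // measured repetitions
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / n_measured;
}

// Hypothetical helper: arithmetic mean, reused at every averaging level
// (over threads, over MPI ranks, over independent simulations).
inline double average(const std::vector<double>& values) {
    assert(!values.empty());
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}
```

Under these assumptions, each thread would call `mean_exchange_time(exchange, 50, 50)` on its own local domain, and the per-thread results would then be reduced with `average` across threads, ranks and simulations in turn.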


- Triangular mesh (average node connectivity = 6), 5*10^7 vertices;
- 100 vertical layers;
- 1 field exchanged, of type `int_64`;
- 50 warm-up exchanges, 50 measured exchanges, results are averaged over measured repetitions and MPI ranks; final results are then averaged over 30 independent simulations;
- system: `daint-gpu`;
- configuration: 1 rank per node, 1 thread per rank, 1 local domain per rank;
- transport layer: MPI.
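This page does not state how the bandwidth figures are derived from the measured times. One plausible definition, shown here only as a sketch, divides the bytes moved in one exchange by the mean exchange time. The helper names are hypothetical, and the number of halo vertices per rank is not given in the setup above, so it is left as an input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes moved in one exchange, assuming every halo
// vertex carries one value per vertical layer and per field.
inline std::size_t exchange_bytes(std::size_t n_halo_vertices,
                                  std::size_t n_levels,
                                  std::size_t n_fields,
                                  std::size_t bytes_per_value) {
    return n_halo_vertices * n_levels * n_fields * bytes_per_value;
}

// Hypothetical helper: bandwidth in bytes per second for one exchange.
inline double bandwidth_bytes_per_second(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / seconds;
}
```

With 100 vertical layers and 1 field of 8-byte integers, each halo vertex contributes 800 bytes, so `exchange_bytes(1000, 100, 1, sizeof(std::int64_t))` evaluates to 800000 for an illustrative halo of 1000 vertices.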

- Weak scaling starting from 2 nodes (taken as the baseline);
- Atlas Octahedral Gaussian Grid with two different baseline sizes: either 80 or 160 parallels between the Pole and the Equator;
- 100 vertical layers;
- two different exchange setups: either 1 `int` field with halo depth 1, or 8 `int` fields with halo depth 2;
- 1 warm-up exchange, 50 measured exchanges, results are averaged over measured repetitions and MPI ranks; final results are then averaged over 30 independent simulations;
- system: `daint-gpu`;
- configuration: 1 rank per node, 1 thread per rank, 1 local domain per rank;
- transport layer: MPI.
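In a weak-scaling run the work per node stays constant, so the ideal exchange time is the same at every node count. A common way to summarize such results is the efficiency relative to the baseline, here the 2-node run. The sketch below assumes that definition. The `ScalingPoint` type and the function name are hypothetical, and the timings in the usage note are made-up placeholders, not measured results.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record: node count and mean exchange time for one run.
struct ScalingPoint {
    int nodes;
    double seconds;
};

// Hypothetical helper: weak-scaling efficiency of each point relative to the
// first entry, which is taken as the baseline (efficiency 1.0).
inline std::vector<double> weak_scaling_efficiency(
    const std::vector<ScalingPoint>& points) {
    assert(!points.empty() && points.front().seconds > 0.0);
    std::vector<double> efficiency;
    efficiency.reserve(points.size());
    for (const auto& p : points) {
        assert(p.seconds > 0.0);
        efficiency.push_back(points.front().seconds / p.seconds);
    }
    return efficiency;
}
```

For placeholder timings of 1.0 s on 2 nodes and 2.0 s on 8 nodes, the function returns an efficiency of 0.5 for the 8-node point, meaning the exchange takes twice as long as in the baseline.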
