Benchmarks

Unstructured benchmarks

Exchange times and bandwidth for the different receive types available in the unstructured case:

Triangular mesh (average node connectivity = 6), 8*10^6 vertices;
100 vertical layers;
1 field exchanged, of type int_64;
50 warm-up exchanges followed by 50 measured exchanges, with results averaged over measured repetitions, threads and MPI ranks, and then over 3 independent simulations;
system: daint-gpu;
configuration: CPU-only, 1 rank per node, 12 threads per rank, 1 local domain per thread;
transport layer: MPI;
thread type: std::thread.
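
The measurement protocol listed above (untimed warm-up exchanges, then timed exchanges averaged over repetitions, threads, ranks and independent simulations) can be sketched as follows. This is a minimal Python illustration and not the benchmark code itself: `time_exchanges`, `average_over_runs` and the `exchange` callable are hypothetical names, and the order of the nested means is an assumption (it does not affect the result when every rank and thread contributes the same number of samples).

```python
import time
from statistics import mean


def time_exchanges(exchange, n_warmup=50, n_measured=50, clock=time.perf_counter):
    """Return the mean duration of n_measured calls to exchange().

    `exchange` is a hypothetical zero-argument callable standing in for
    one halo exchange; `clock` is injectable so the harness can be tested.
    """
    for _ in range(n_warmup):
        exchange()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(n_measured):
        start = clock()
        exchange()
        samples.append(clock() - start)
    return mean(samples)


def average_over_runs(per_run_times):
    """Average timings indexed as per_run_times[run][rank][thread].

    Assumed order: mean over threads, then ranks, then independent runs.
    """
    return mean(
        mean(mean(threads) for threads in ranks)
        for ranks in per_run_times
    )
```

In a real run each thread of each rank would call something like `time_exchanges` on its own local domain, and the per-thread means would be gathered before calling `average_over_runs`.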

Triangular mesh (average node connectivity = 6), 5*10^7 vertices;
100 vertical layers;
1 field exchanged, of type int_64;
50 warm-up exchanges followed by 50 measured exchanges, with results averaged over measured repetitions and MPI ranks, and then over 30 independent simulations;
system: daint-gpu;
configuration: 1 rank per node, 1 thread per rank, 1 local domain per rank;
transport layer: MPI.
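
A bandwidth figure for setups like the ones above can be derived from the averaged exchange time and the amount of halo data moved. The sketch below is only an assumption about how such a number might be computed; the source does not state its exact definition. `exchange_bandwidth` and its parameters are hypothetical, and 8 bytes per value is assumed for a 64-bit integer field.

```python
def exchange_bandwidth(n_halo_vertices, n_levels, n_fields, seconds, bytes_per_value=8):
    """Return bytes per second moved by one exchange on one rank.

    Assumes every halo vertex carries n_levels values for each of
    n_fields fields, and that `seconds` is the averaged exchange time.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    payload_bytes = n_halo_vertices * n_levels * n_fields * bytes_per_value
    return payload_bytes / seconds
```

For example, 1000 halo vertices with 100 levels and 1 field amount to 800 000 bytes per exchange, so an averaged time of 1 ms corresponds to roughly 0.8 GB/s under these assumptions.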

Weak scaling starting from 2 nodes (taken as the baseline);
Atlas Octahedral Gaussian Grid with two different baseline sizes: either 80 or 160 parallels between the Pole and the Equator;
100 vertical layers;
two exchange setups: 1 int field with halo depth 1, or 8 int fields with halo depth 2;
1 warm-up exchange followed by 50 measured exchanges, with results averaged over measured repetitions and MPI ranks, and then over 30 independent simulations;
system: daint-gpu;
configuration: 1 rank per node, 1 thread per rank, 1 local domain per rank;
transport layer: MPI.
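
For a weak-scaling study like the one above, results are commonly summarized as an efficiency relative to the baseline. The sketch below assumes the usual definition, efficiency = t_baseline / t_N, with the 2-node run as baseline; `weak_scaling_efficiency` is a hypothetical helper, and the timings in the usage note are made up purely for illustration.

```python
def weak_scaling_efficiency(time_by_nodes, baseline_nodes=2):
    """Map node count -> weak-scaling efficiency (1.0 means ideal).

    `time_by_nodes` maps a node count to the averaged exchange time
    measured with that many nodes; the work per node is held constant.
    """
    if baseline_nodes not in time_by_nodes:
        raise KeyError(f"no timing for baseline of {baseline_nodes} nodes")
    t_base = time_by_nodes[baseline_nodes]
    return {nodes: t_base / t for nodes, t in sorted(time_by_nodes.items())}
```

With illustrative timings `{2: 1.0, 4: 1.25, 8: 2.0}` (arbitrary units), the helper yields an efficiency of 1.0 at 2 nodes, 0.8 at 4 nodes and 0.5 at 8 nodes.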