Benchmarks

bettiolm edited this page Jun 9, 2021 · 8 revisions

Unstructured benchmarks

Receive type comparison

Times and bandwidths for the different receive types available in the unstructured case:

  • UNORDERED: non-contiguous receive halos + buffered receive;
  • ORDERED: contiguous receive halos + buffered receive;
  • IN-PLACE RECEIVE: contiguous receive halos + in-place receive.

Benchmark details - CPU

  • Triangular mesh (average node connectivity = 6), 8*10^6 vertices;
  • 100 vertical layers;
  • 1 field exchanged, of type int64_t;
  • 50 warm-up exchanges, 50 measured exchanges, results are averaged over measured repetitions, threads and MPI ranks; final results are then averaged over 3 independent simulations;
  • system: daint-gpu;
  • configuration: CPU-only, 1 rank per node, 12 threads per rank, 1 local domain per thread;
  • transport layer: MPI;
  • thread type: std::thread.

Benchmark details - GPU

  • Triangular mesh (average node connectivity = 6), 5*10^7 vertices;
  • 100 vertical layers;
  • 1 field exchanged, of type int64_t;
  • 50 warm-up exchanges, 50 measured exchanges, results are averaged over measured repetitions and MPI ranks; final results are then averaged over 30 independent simulations;
  • system: daint-gpu;
  • configuration: 1 rank per node, 1 thread per rank, 1 local domain per rank;
  • transport layer: MPI.

Weak scaling using Atlas meshes and fields

Benchmark details - GPU

  • Weak scaling starting from 2 nodes (taken as the baseline);
  • Atlas Octahedral Gaussian Grid with two different baseline sizes: either 80 or 160 parallels between the Pole and the Equator;
  • 100 vertical layers;
  • two different exchange setups: either 1 int field with halo depth 1, or 8 int fields with halo depth 2;
  • 1 warm-up exchange, 50 measured exchanges, results are averaged over measured repetitions and MPI ranks; final results are then averaged over 30 independent simulations;
  • system: daint-gpu;
  • configuration: 1 rank per node, 1 thread per rank, 1 local domain per rank;
  • transport layer: MPI.