From 3dbc85e777f35c493f7b405dd3789ad110fcc120 Mon Sep 17 00:00:00 2001
From: Spencer Bryngelson
Date: Sat, 9 Mar 2024 17:31:22 -0500
Subject: [PATCH] update perf in documentation (#367)

---
 docs/documentation/expectedPerformance.md | 36 ++++++++---------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/docs/documentation/expectedPerformance.md b/docs/documentation/expectedPerformance.md
index 5e5550da0..3c3501b72 100644
--- a/docs/documentation/expectedPerformance.md
+++ b/docs/documentation/expectedPerformance.md
@@ -5,31 +5,21 @@ This page shows a summary of these results.
 
 ## Expected time-steps/hour
 
-The following table outlines expected performance in terms of the number of time steps per hour, rounded to the nearest hundred (higher is better).
-A 3D inviscid, 6-equation problem is solved for various problem sizes (grid cells) and hardware.
-A 3rd order (3-stage) Runge-Kutta time-stepper is used.
-CPU results utilize an entire processor die.
+The following table outlines observed performance as nanoseconds per grid point (ns/GP) per right-hand side evaluation (lower is better).
+We solve an example 3D, inviscid, 5-equation model problem with two advected species (a total of 8 PDEs).
+The numerics are WENO5 and the HLLC approximate Riemann solver.
+We report results for various numbers of grid points per CPU die (or GPU device) and hardware.
 
-| Hardware | # Cores | Steps/Hr (1M pts) | Steps/Hr (4M pts) | Steps/Hr (8M pts) | Compiler | Computer |
+| Hardware | | 1M GPs | 4M GPs | 8M GPs | Compiler | Computer |
 | ---: | :----: | :----: | :---: | :---: | :----: | :--- |
-| NVIDIA V100 | 1 (device) | 88.5k | 18.7k | N/A | NVHPC 22.11 | PACE Phoenix |
-| NVIDIA V100 | 1 (device) | 78.8k | 18.8k | N/A | NVHPC 22.11 | OLCF Summit |
-| NVIDIA A100 | 1 (device) | 114.4k | 34.6k | 16.5k | NVHPC 23.5 | Wingtip |
-| AMD MI250X | 1 (GCD) | 77.5k | 22.3k | 11.2k | CCE 16.0.1 | OLCF Frontier |
-| Intel Xeon Gold 6226 | 12 (cores) | 2.5k | 0.7k | 0.4k | GNU 10.3.0 | PACE Phoenix |
-| Apple Silicon M2 | 6 (cores) | 2.8k | 0.6k | 0.2k | GNU 13.2.0 | N/A |
-
-We also show the expected performance of MFC for the same problem as above, except for the 5-equation model used, in the table below.
-It is presented in the same manner as the one above.
-
-| Hardware | # Cores | Steps/Hr (1M pts) | Steps/Hr (4M pts) | Steps/Hr (8M pts) | Compiler | Computer |
-| ---: | :----: | :----: | :---: | :---: | :----: | :--- |
-| NVIDIA V100 | 1 (device) | 113.4k | 26.2k | 13.0k | NVHPC 22.11 | PACE Phoenix |
-| NVIDIA V100 | 1 (device) | 107.7k | 26.3k | 13.1k | NVHPC 22.11 | OLCF Summit |
-| NVIDIA A100 | 1 (device) | 153.5k | 48.0k | 22.5k | NVHPC 23.5 | Wingtip |
-| AMD MI250X | 1 (GCD) | 104.2k | 31.0k | 14.8k | CCE 16.0.1 | OLCF Frontier |
-| Intel Xeon Gold 6226 | 12 (cores) | 5.4k | 1.6k | 0.8k | GNU 10.3.0 | PACE Phoenix |
-| Apple Silicon M2 | 6 (cores) | 3.7k | 11.0k | 0.3k | GNU 13.2.0 | N/A |
+| NVIDIA V100 | 1 device | 96 | 104 | 104 | NVHPC 22.11 | PACE Phoenix |
+| NVIDIA V100 | 1 device | 101 | 104 | 104 | NVHPC 22.11 | OLCF Summit |
+| NVIDIA A100 | 1 device | 71 | 56 | 59 | NVHPC 23.5 | Wingtip |
+| AMD MI250X | 1 GCD | 108 | 90 | 96 | CCE 16.0.1 | OLCF Frontier |
+| Intel Xeon Gold 6226 | 12 cores | 1963 | 1688 | 1686 | GNU 10.3.0 | PACE Phoenix |
+| Apple M2 | 6 cores | 2919 | 245 | 4500 | GNU 13.2.0 | N/A |
+
+__All results are in nanoseconds (ns) per grid point (gp) per right-hand side (rhs) evaluation. Lower is better.__
 
 ## Weak scaling