**File:** `docs/documentation/expectedPerformance.md`
# Performance
MFC has been benchmarked on several CPUs and GPU devices.
This page is a summary of these results.
## Figure of merit: Grind time performance
The following table outlines observed performance as nanoseconds per grid point (ns/gp) per equation (eq) per right-hand-side (rhs) evaluation (lower is better), also known as the grind time.
We solve an example 3D, inviscid, 5-equation model problem with two advected species (8 PDEs) and 8M grid points (158-cubed uniform grid).
The numerics are WENO5 finite volume reconstruction and HLLC approximate Riemann solver.
This case is located in `examples/3D_performance_test`.
You can run it via `./mfc.sh run -n <num_processors> -j $(nproc) ./examples/3D_performance_test/case.py -t pre_process simulation --case-optimization`, which will build an optimized version of the code for this case and then execute it.
If the above does not work on your machine, see the rest of this documentation for other ways to use the `./mfc.sh run` command.
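As a concrete arithmetic check of the grind-time definition above, a measured wall time converts to ns/gp/eq/rhs as sketched below. All numbers here are invented for illustration, not measured results; the sub-step and step counts are assumptions:

```shell
# Hypothetical worked example: convert a wall time into grind time.
# Assumes 8M grid points, 8 equations, 3 rhs evaluations per time step
# (e.g., a 3-stage Runge-Kutta scheme), and 100 time steps.
awk 'BEGIN {
  points = 8e6; eqs = 8; rhs_evals = 3 * 100
  wall_s = 19.2                                   # invented wall time [s]
  grind_ns = wall_s * 1e9 / (points * eqs * rhs_evals)
  printf "%.1f ns/gp/eq/rhs\n", grind_ns          # prints 1.0 ns/gp/eq/rhs
}'
```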
Results are for MFC v4.9.3 (July 2024 release), though numbers have not changed meaningfully since then.
Similar performance is also seen for other problem configurations, such as the Euler equations (4 PDEs).
All results are for the compiler that gave the best performance.
Note:
* CPU results may be obtained on CPUs with more cores than reported in the table; we check performance at different core counts on each device and report the best performance achieved using a single socket (or die).
These are reported as (X/Y cores), where X is the number of cores used and Y is the total number on the die.
* GPU results are for a single GPU device. For single-precision (SP) GPUs, we performed computation in double precision via conversion in the compiler/software; these numbers are _not_ for single-precision computation. AMD MI250X and MI300A GPUs have multiple graphics compute dies (GCDs) per device; we report results for one GCD, though one can estimate the full-device runtime by dividing the grind time by the number of GCDs on the device (the MI250X has 2 GCDs). We gratefully acknowledge LLNL, HPE/Cray, and AMD for permission to release the MI300A performance numbers.
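The two reporting conventions above can be made concrete with a small sketch. All numbers below are invented for illustration, not entries from the table:

```shell
# Invented measurements from a hypothetical core-count sweep on a 64-core die:
# pick the best result and print it in the "(X/Y cores)" convention.
printf '%s\n' '16 1.95' '32 1.41' '48 1.38' '64 1.52' |
  sort -k2 -g | head -n 1 |
  awk '{ printf "best: %.2f ns (%d/64 cores)\n", $2, $1 }'

# Estimating a full MI250X (2 GCDs) from a hypothetical single-GCD grind time:
awk 'BEGIN { printf "full device: %.2f ns\n", 1.56 / 2 }'
```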
| Hardware || Grind Time [ns] | Compiler | Computer |
- [Strawberry Perl](https://strawberryperl.com/) (Install and add `C:\strawberry\perl\bin\perl.exe` or your installation path to your [PATH](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/))
Please note that Visual Studio must be installed first, and the oneAPI Toolkits need to be configured with the installed Visual Studio, even if you plan to use a different IDE.
Then, to initialize your development environment, run the following command (or your installation path) in the command prompt:
```shell
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
`open ~/.bash_profile`
An editor should open.
Please paste the following lines into it before saving the file.
Modify the first assignment if you wish to use a different version of GNU's GCC.
These lines ensure that LLVM's Clang and Apple's modified version of GCC are not used to compile MFC.
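The actual lines to paste are elided from this excerpt. A heavily hedged sketch of the kind of assignments described (variable names and the version number are assumptions, not the documented lines):

```shell
# Hypothetical sketch only: point the build at GNU compilers so that
# LLVM's Clang and Apple's modified GCC are not picked up.
# The first assignment is the one to modify for a different GCC version.
export CC=gcc-13
export CXX=g++-13
export FC=gfortran-13
```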
Further reading on `open-mpi` incompatibility with `clang`-based `gcc` on macOS: [here](https://stackoverflow.com/questions/27930481/how-to-build-openmpi-with-homebrew-and-gcc-4-9).
We do *not* support `clang` due to conflicts with the Silo dependency.
They will download the dependencies MFC requires to build itself.
Docker is a lightweight, cross-platform, and performant alternative to Virtual Machines (VMs).
We build a Docker Image that contains the packages required to build and run MFC on your local machine.
recommended settings applied, run

```shell
.\mfc.bat docker # If on Windows
```
We automatically mount and configure the proper permissions for you to access your local copy of MFC, available at `~/MFC`.
You will be logged in as the `me` user with root permissions.
:warning: The state of your container is entirely transient, except for the MFC mount.
Thus, any modification outside of `~/MFC` should be considered permanently lost upon session exit.
</details>
MFC can be built with support for various (compile-time) features:
_⚠️ The `--gpu` option requires that your compiler supports OpenACC for Fortran for your target GPU architecture._
When these options are given to `mfc.sh`, they will be remembered when you issue future commands.
You can enable and disable features at any time by passing any of the arguments above.
For example, if you previously built MFC with MPI support and no longer wish to run using MPI, you can pass `--no-mpi` once, making the change permanent.
MFC comprises three codes, each being a separate _target_.
By default, all targets (`pre_process`, `simulation`, and `post_process`) are selected.
To only select a subset, use the `-t` (i.e., `--targets`) argument.
For a detailed list of options, arguments, and features, please refer to `./mfc.sh build --help`.
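For instance, using the `-t` flag described above, one might build only two of the three targets. This is a usage sketch following the flags named in this section, not output from a real run:

```shell
# Build only pre_process and simulation, skipping post_process.
./mfc.sh build -t pre_process simulation
```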
**File:** `docs/documentation/running.md`
# Running
MFC can be run using `mfc.sh`'s `run` command.
It supports interactive and batch execution.
Batch mode is designed for multi-node distributed systems (supercomputers) equipped with a scheduler such as PBS, SLURM, or LSF.
A full (and up-to-date) list of available arguments can be acquired with `./mfc.sh run -h`.
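Batch mode ultimately generates and submits a scheduler script on your behalf. As a flavor of what is involved, here is a generic SLURM sketch; the directive values, job name, and executable path are invented examples, not MFC's actual template:

```shell
#!/bin/bash
# Hypothetical SLURM batch script, similar in spirit to what a batch
# template generates. All directives and paths are invented examples.
#SBATCH --job-name=mfc_case
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00

# Launch the simulation target across all allocated ranks.
srun ./build/simulation
```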
MFC supports running simulations locally (Linux, MacOS, and Windows) as well as on several supercomputer clusters, both interactively and through batch submission.
>
> Adding a new template file or modifying an existing one will most likely be required if:
> - You are on a cluster that does not have a template yet.
> - Your cluster is configured with SLURM, but interactive job launches fail when
> using `srun`. You might need to invoke `mpirun` instead.
> - Something in the existing default or computer template file is incompatible with
> your system or does not provide a feature you need.
MFC provides two different arguments to facilitate profiling with NVIDIA Nsight.
- Nsight Systems (Nsys): `./mfc.sh run ... -t simulation --nsys [nsys flags]` allows one to visualize MFC's system-wide performance with [NVIDIA Nsight Systems](https://developer.nvidia.com/nsight-systems).
NSys is best for understanding the order and execution times of major subroutines (WENO, Riemann, etc.) in MFC.
When used, `--nsys` will run the simulation and generate `.nsys-rep` files in the case directory for all targets.
These files can then be imported into Nsight Systems' GUI, which can be downloaded [here](https://developer.nvidia.com/nsight-systems/get-started#latest-Platforms). To keep the report files small, it is best to run case files with a few timesteps. Learn more about NVIDIA Nsight Systems [here](https://docs.nvidia.com/nsight-systems/UserGuide/index.html).
- Nsight Compute (NCU): `./mfc.sh run ... -t simulation --ncu [ncu flags]` allows one to conduct kernel-level profiling with [NVIDIA Nsight Compute](https://developer.nvidia.com/nsight-compute).
NCU provides profiling information for every subroutine called and is more detailed than NSys.
When used, `--ncu` will output profiling information for all subroutines, including elapsed clock cycles, memory used, and more, after the simulation is run.
When used, `--roc` will run the simulation and generate files in the case directory for all targets.
`results.json` can then be imported in [Perfetto's UI](https://ui.perfetto.dev/).
Learn more about AMD Rocprof [here](https://rocm.docs.amd.com/projects/rocprofiler/en/docs-5.5.1/rocprof.html).
It is best to run case files with few timesteps to keep the report file sizes manageable.
- Omniperf (OMNI): `./mfc.sh run ... -t simulation --omni [omniperf flags]` allows one to conduct kernel-level profiling with [AMD Omniperf](https://rocm.github.io/omniperf/introduction.html#what-is-omniperf).
When used, `--omni` will output profiling information for all subroutines, including rooflines, cache usage, register usage, and more, after the simulation is run.
Adding this argument will moderately slow down the simulation and run the MFC executable several times.
For this reason, it should only be used with case files with few timesteps.
### Restarting Cases
If you want to restart a simulation,
in which $t_i$ is the starting time, $t_f$ is the final time, and $SF$ is the saving frequency time.
- Run `pre_process` and `simulation` on the case.
`./mfc.sh run case.py -t pre_process simulation`
- As the simulation runs, Lustre files will be created for each saved timestep in `./restart_data`.
- When the simulation stops, choose any Lustre file as the restarting point (lustre_$t_s$.dat).
- Create a new duplicate input file (e.g., `restart_case.py`), which should have: