LTO and PGO Possibilities
These compiler optimizations are distinct and can be applied in combination on many platforms. Build flags are prototyped on GitHub.
- Link-Time-Optimization (LTO) - Optimizations performed during the linking phase of the compilation process.

  Normally the linker simply assembles object files into an executable or library. With LTO, the linker looks for optimizations in how code from one object file calls code in another object file. This requires additional flags to be passed at compile time and causes a significant slowdown during the link phase of compilation.
- Profile-Guided Optimization (PGO) - Optimizations applied in response to real-world application data.

  The optimizer typically has to make guesses about which code paths are most likely to be executed. With Profile-Guided Optimization, the compiler is fed data about typical real-world application usage and uses that to tune the optimized code.

NOTE: Profile-Guided Optimization essentially requires compiling the application twice: once to generate profiler output, and once to feed the profiler output back into the optimizer. The normal workflow is:

- Compile with the profile-generating options enabled
- Execute a test suite of typical code
- Recompile with the profile-using options enabled
Using our current automated build process with CMake, release builds can run through the entire Profile-Guided Optimization process automatically. The only difficulty is deciding which builds should be PGO-enabled, given the significant increase in compile time.
- Link-Time-Optimization - 10-20% improvement (the longer the simulation, the greater the improvement)
- Profile-Guided Optimization - ~10%
- Combined - ~21% (~2 hours to build to get the entire benefit)
The compiler provided by Apple (based on clang 3.4) has neither the options needed to support Profile-Guided Optimization nor the tools required to process the profile data. Therefore, some of the tests were conducted using the official build of clang 3.5 for MacOS.
- Link-Time-Optimization - ~6% improvement (Stock 3.4)

  Note: the `-flto` compiler flag which is used is basically undocumented, but it seems to work.
- Profile-Guided Optimization - Not yet tested
- Combined - ~15% speed up. (Official 3.5)
- Link-Time-Optimization - ~3% improvement (~1.5 hrs to build)
- Profile-Guided Optimization - ~18% improvement (~2.5 hrs round trip build time)
- Combined - Indistinguishable from Profile-Guided Optimization; PGO implies LTO
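The percentages in these lists appear to be reductions in total test time. As a rough check, the sketch below recomputes one of them from the `ctest` totals recorded further down this page. The helper names `total_time` and `improvement` are made up for illustration, `awk` is assumed to be available, and the `EnergyPlusTeam-build-nolto` directory is assumed to be the unoptimized baseline; the logs do not say which optimization the other directory was built with.

```sh
# Extract the "Total Test time (real)" value, in seconds, from ctest output on stdin.
total_time() {
    awk '/Total Test time/ { print $(NF - 1) }'
}

# Percentage reduction in run time.
# Usage: improvement BASELINE_SECONDS OPTIMIZED_SECONDS
improvement() {
    awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.1f\n", (base - opt) / base * 100 }'
}

# Totals of the longer runs from the two C:/Programming/... build directories.
baseline=$(echo "Total Test time (real) = 242.53 sec" | total_time)
optimized=$(echo "Total Test time (real) = 197.61 sec" | total_time)
improvement "$baseline" "$optimized"   # prints 18.5
```

In practice the two `echo` lines would be replaced by saved log files, for example `total_time < baseline.log`, where the file name is again only an example.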
One aspect of PGO is that it moves infrequently used code out of the core application, so that frequently used code resides closer together. It's possible that we might choose profiling data that would de-optimize corner cases, and code that was not exercised during the profiling phase. This may be acceptable behavior, but it might also cause unacceptable slowdowns for untested cases.
However, I have not seen an actual problem with this.
To get the full advantage of LTO on Unix, we needed to modify the dependent library builds (DELight, expat, sqlite, etc) to be statically linked. This actually results in a smaller distribution and more portability of the EnergyPlus binary itself.
MSVC's linker is sensitive to the size of the object it is creating. With LTO enabled, the EnergyPlusLib static library was growing too large for the linker to handle. Because of this, we had to arbitrarily split the EnergyPlusLib library into two halves. Ultimately this will probably result in faster compile times for the average user, since linking should use less memory during normal builds. However, the structure is slightly odd in the CMakeLists.txt file, and it is non-obvious which library should contain a newly added file (it doesn't matter; either will work equally well).
To get the full performance advantage of LTO on Linux, the EnergyPlus executable must be statically linked to EnergyPlusLib. The result is that `libEnergyPlusAPI.so` on Linux is a second-class citizen: it can only be optimized with PGO and does not get the advantages of LTO.
To get the best results on MacOS, we must use a third-party compiler rather than the one Apple ships. This introduces only compile-time dependencies, not runtime ones. Also, on MacOS / Clang builds the PGO process currently requires a manual step to merge the profile data before the final build is completed.
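The manual merge step could be wrapped in a small helper so that it fails loudly when no raw profiles were produced. This is only a sketch: `merge_profiles` and the `PROFDATA` variable are hypothetical names, the `default.profraw` file name and the default output name `profdata` are taken from the MacOS steps on this page, and `llvm-profdata` from the clang 3.5 package is assumed to be on `PATH` unless `PROFDATA` points at it.

```sh
# Merge every default.profraw under the current build tree into one file.
# Usage: merge_profiles [OUTPUT_NAME]
merge_profiles() {
    tool=${PROFDATA:-llvm-profdata}   # hypothetical override for the tool location
    out=${1:-profdata}                # output name used by the steps on this page

    # Stop early when the instrumented test run left no raw profiles behind.
    if [ -z "$(find . -name 'default.profraw' -print | head -n 1)" ]; then
        echo "no default.profraw files found; run the instrumented tests first" >&2
        return 1
    fi

    # Same merge invocation as the manual step, with find passing the file list.
    find . -name 'default.profraw' -exec "$tool" merge -output "$out" {} +
}
```

Run it from the top of the build directory after the tests have finished, for example `PROFDATA=~/clang+llvm-3.5.0-macosx-apple-darwin/bin/llvm-profdata merge_profiles`.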
Baseline build:

- Set build configuration to `Release`
- Choose `TEST_ANNUAL_SIMULATIONS` (or not)
- Compile normally

LTO build:

- Set build configuration to `Release`
- Choose `TEST_ANNUAL_SIMULATIONS` (or not)
- Choose `ENABLE_LTO`
- Compile normally

*When combining PGO with LTO, do not use `ENABLE_LTO` on the first (profile-generating) build. It is a waste of compile time, because the profiler doesn't need it.*

PGO build:

- Set build configuration to `Release`
- Choose `TEST_ANNUAL_SIMULATIONS` (or not)
- Choose `PROFILE_GENERATE`
- Compile normally
**MacOS Only** `DYLD_LIBRARY_PATH=~/clang+llvm-3.5.0-macosx-apple-darwin/lib make -jN`
- Execute some or all of the tests:
```sh
ctest -C Release -R "integration.*Zone.*" # For example
```
**MacOS Only** `find . -name "default.profraw" | xargs ~/clang+llvm-3.5.0-macosx-apple-darwin/bin/llvm-profdata merge -output profdata`
- Using cmake again, de-select `PROFILE_GENERATE` and select `PROFILE_USE`
- Compile normally
**MacOS Only** `DYLD_LIBRARY_PATH=~/clang+llvm-3.5.0-macosx-apple-darwin/lib make -jN`

PGO + LTO build:

- Set build configuration to `Release`
- Choose `TEST_ANNUAL_SIMULATIONS` (or not)
- Choose `PROFILE_GENERATE`
- Compile normally
**MacOS Only** `DYLD_LIBRARY_PATH=~/clang+llvm-3.5.0-macosx-apple-darwin/lib make -jN`
- Execute some or all of the tests:
```sh
ctest -C Release -R "integration.*Zone.*" # For example
```
**MacOS Only** `find . -name "default.profraw" | xargs ~/clang+llvm-3.5.0-macosx-apple-darwin/bin/llvm-profdata merge -output profdata`
- Using cmake again, de-select `PROFILE_GENERATE` and select `PROFILE_USE`
- Select `ENABLE_LTO`
- Compile normally
**MacOS Only** `DYLD_LIBRARY_PATH=~/clang+llvm-3.5.0-macosx-apple-darwin/lib make -jN`
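
The two-pass sequence above can be scripted roughly as follows. This is a minimal sketch, not the project's actual release script: the `run` helper, the `pgo_lto_build` function, and the `DRY_RUN`, `SRC_DIR` and `TEST_REGEX` variables are hypothetical, and passing the options as `-D...=ON/OFF` cache entries assumes they are ordinary CMake boolean options. On MacOS / Clang the `DYLD_LIBRARY_PATH` prefix and the `llvm-profdata merge` step would still be needed around the two builds.

```sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run from an (initially empty) build directory.
pgo_lto_build() {
    src_dir=${SRC_DIR:-..}                          # hypothetical source location
    test_regex=${TEST_REGEX:-integration.*Zone.*}   # example filter from the steps above

    # Pass 1: instrumented build, without LTO (the profiler doesn't need it).
    run cmake -DCMAKE_BUILD_TYPE=Release -DPROFILE_GENERATE=ON \
        -DPROFILE_USE=OFF -DENABLE_LTO=OFF "$src_dir" || return
    run cmake --build . --config Release || return

    # Exercise the instrumented binaries to collect profile data.
    run ctest -C Release -R "$test_regex" || return

    # Pass 2: rebuild using the collected profile, now with LTO enabled.
    run cmake -DPROFILE_GENERATE=OFF -DPROFILE_USE=ON -DENABLE_LTO=ON "$src_dir" || return
    run cmake --build . --config Release
}
```

Calling `DRY_RUN=1 pgo_lto_build` prints the five commands without running them, which is a cheap way to review the sequence before committing to a multi-hour build.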

To build with LTO on clang 3.5 on MacOS, you need to point the linker to the proper `libLTO.dylib`:

`DYLD_LIBRARY_PATH=/Users/jason/clang+llvm-3.5.0-macosx-apple-darwin/lib make -j3`

jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 87.15 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 53.14 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 37.94 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 178.27 sec
jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 6.04 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 4.31 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 3.21 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 13.60 sec
jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 94.71 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 56.80 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 41.44 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 192.98 sec
jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 6.37 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 4.42 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 3.27 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 14.10 sec
jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 109.47 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 64.35 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 47.01 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 220.87 sec
jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 7.22 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 4.91 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 3.64 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 15.80 sec
jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 112.72 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 66.52 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 48.61 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 227.89 sec
jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 7.61 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 4.99 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 3.79 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 16.43 sec
Jasons-Mac:EnergyPlus-build jason$ ctest -R "integration.RefBldg.*Office"
Test project /Users/jason/EnergyPlus-build
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 152.24 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 94.92 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 68.31 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 315.58 sec
Jasons-Mac:EnergyPlus-build jason$ ctest -R "integration.RefBldg.*Office"
Test project /Users/jason/EnergyPlus-build
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 15.92 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 11.78 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 8.32 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 36.19 sec
Jasons-Mac:EnergyPlus-build jason$ ctest -R "integration.RefBldg.*Office"
Test project /Users/jason/EnergyPlus-build
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 160.22 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 101.19 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 70.35 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 331.86 sec
Jasons-Mac:EnergyPlus-build jason$ ctest -R "integration.RefBldg.*Office"
Test project /Users/jason/EnergyPlus-build
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 11.69 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 8.37 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 6.45 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 26.65 sec
#### Baseline (Clang 3.4)
```sh
Jasons-Mac:EnergyPlus-build-nolto jason$ ctest -R "integration.RefBldg.*Office"
Test project /Users/jason/EnergyPlus-build-nolto
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 180.88 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 110.30 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 80.58 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 371.89 sec
Jasons-Mac:EnergyPlus-build-nolto jason$ ctest -R "integration.RefBldg.*Office"
Test project /Users/jason/EnergyPlus-build-nolto
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 12.51 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 9.14 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 7.74 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 29.50 sec
```
$ ctest -R "integration.RefBldg.*Office" -C Release
Test project C:/Programming/EnergyPlusTeam-build
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 96.31 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 58.45 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 42.23 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 197.61 sec
$ ctest -R "integration.RefBldg.*Office" -C Release
Test project C:/Programming/EnergyPlusTeam-build
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 7.55 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 5.72 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 4.42 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 18.28 sec
$ ctest -R "integration.RefBldg.*Office" -C Release
Test project C:/Programming/EnergyPlusTeam-build
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 115.89 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 69.41 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 52.86 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 238.73 sec
$ ctest -R "integration.RefBldg.*Office" -C Release
Test project C:/Programming/EnergyPlusTeam-build
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 8.58 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 6.20 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 4.98 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 20.38 sec
$ ctest -R "integration.RefBldg.*Office" -C Release
Test project C:/Programming/EnergyPlusTeam-build-nolto
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 118.45 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 70.14 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 53.34 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 242.53 sec
$ ctest -R "integration.RefBldg.*Office" -C Release
Test project C:/Programming/EnergyPlusTeam-build-nolto
Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago .... Passed 8.78 sec
Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ... Passed 6.36 sec
Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago .... Passed 5.03 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 20.75 sec