
LTO and PGO Possibilities

lefticus edited this page Nov 10, 2014 · 4 revisions

Enhanced Compiler Optimization Options

These compiler optimizations are distinct and can be combined on many platforms.

  • Link-Time-Optimization (LTO) - Optimizations performed during the linking phase of the compilation process.

    Normally the linker simply combines object files into an executable or library. With LTO, the compiler keeps enough information in each object file for the linker to optimize across object-file boundaries, for example by inlining a function defined in one object file into a caller in another. This requires additional flags at both compile and link time and significantly slows down the link phase.

  • Profile Guided Optimization (PGO) - Optimizations applied based on data recorded from real-world application runs.

    The optimizer typically has to guess which code paths are most likely to be executed. With Profile Guided Optimization, the compiler is given data recorded from typical real-world runs of the application and uses it to tune the optimized code.

    NOTE: Profile Guided Optimization essentially requires compiling the application twice: once to generate profile data, and once to feed that data back into the optimizer. The normal workflow is:

    1. Compile with the profile-generation options enabled
    2. Run a test suite that exercises typical usage
    3. Recompile with the profile-use options enabled

Using our current automated CMake build process, release builds can go through the entire Profile Guided Optimization process automatically. The only difficulty is deciding which builds should have PGO enabled, given the significant increase in compile time.

Summary

GCC on Linux

  • Link-Time-Optimization - 10-20% improvement (the longer the simulation, the larger the improvement)
  • Profile Guided Optimization - ~10% improvement
  • Combined - ~21% improvement (~2 hours of build time to get the full benefit)
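These figures are relative reductions in total test time compared with the matching baseline run. A small helper (hypothetical, not part of the build) to recompute them from the Raw Data totals, shown here for the GCC Full Annual runs:

```sh
# improvement BASELINE_SECONDS OPTIMIZED_SECONDS
# Prints the percentage reduction in wall-clock time, to one decimal place.
improvement() {
    awk -v b="$1" -v o="$2" 'BEGIN { printf "%.1f%%\n", (b - o) / b * 100 }'
}

# GCC on Linux, "Total Test time" of the Full Annual runs (seconds).
improvement 227.89 192.98   # LTO only:  15.3%
improvement 227.89 220.87   # PGO only:  3.1%
improvement 227.89 178.27   # LTO + PGO: 21.8%
```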

Clang on MacOS

The compiler provided by Apple (based on clang 3.4) lacks the options needed for Profile Guided Optimization, as well as the tools required to process the profile data. Therefore, some of the tests were conducted using the official build of clang 3.5 for MacOS.

  • Link-Time-Optimization - ~6% improvement (Stock 3.4)

    Note: the -flto compiler flag used here is essentially undocumented, but it appears to work.

  • Profile Guided Optimization - Not yet tested

  • Combined - ~15% speedup (Official 3.5)

MSVC on Windows

  • Link-Time-Optimization - ~3% improvement (~1.5 hrs to build)
  • Profile Guided Optimization - ~18% improvement (~2.5 hrs round-trip build time)
  • Combined - Indistinguishable from Profile Guided Optimization alone, since with MSVC PGO implies LTO

Caveats and Potential Problems

PGO Optimizes for Frequently Called Code

One aspect of PGO is that it moves infrequently executed code out of the hot paths so that frequently executed code resides closer together. If we choose unrepresentative profiling data, corner cases and code not exercised during the profiling phase could be de-optimized. This may be acceptable, but it might also cause unacceptable slowdowns for untested cases.

However, I have not seen an actual problem with this.

Build Layout Changes

To get the full advantage of LTO on Unix, we had to change the dependent library builds (DELight, expat, sqlite, etc.) to be statically linked. This actually results in a smaller distribution and a more portable EnergyPlus binary.

MSVC's linker is sensitive to the size of the object it is creating. With LTO enabled, the EnergyPlusLib static library grew too large for the linker to handle, so we had to arbitrarily split EnergyPlusLib into two halves. This will probably result in faster build times for the average user, since linking should use less memory during normal builds. However, the resulting structure in CMakeLists.txt is slightly odd, and it is not obvious which library a newly added file belongs in (it doesn't matter; either works equally well).

Linux

To get the full performance advantage of LTO on Linux, the EnergyPlus executable must be statically linked to EnergyPlusLib. As a result, libEnergyPlusAPI.so on Linux is a second-class citizen: it benefits from PGO but not from LTO.

CMake Options and Build Workflow

Baseline

  1. Set build configuration to Release
  2. Choose TEST_ANNUAL_SIMULATIONS (or not)
  3. Compile normally

Link Time Optimizations

  1. Set build configuration to Release
  2. Choose TEST_ANNUAL_SIMULATIONS (or not)
  3. Choose ENABLE_LTO
  4. Compile normally
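For command-line builds, the LTO steps above roughly correspond to the following sketch. It assumes the options are ordinary CMake cache options with the names shown on this page, and that the build directory sits next to a source checkout at the hypothetical path `../EnergyPlus`; use `-DTEST_ANNUAL_SIMULATIONS=OFF` for the shorter test runs, or drop `-DENABLE_LTO=ON` for the Baseline build.

```sh
# Configure a Release build with LTO (option names as listed above).
cmake -DCMAKE_BUILD_TYPE=Release -DTEST_ANNUAL_SIMULATIONS=ON -DENABLE_LTO=ON ../EnergyPlus
# Build; -j4 is an arbitrary job count.
make -j4
```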

Profile Guided Optimizations

*Do not use ENABLE_LTO on the first build: it wastes compile time, and the profile-generating build does not need it.*

  1. Set build configuration to Release
  2. Choose TEST_ANNUAL_SIMULATIONS (or not)
  3. Choose PROFILE_GENERATE
  4. Compile normally
     **MacOS Only** `DYLD_LIBRARY_PATH=~/clang+llvm-3.5.0-macosx-apple-darwin/lib make -jN`
  5. Execute some or all of the tests:
     ```sh
     ctest -C Release -R "integration.*Zone.*" # For example
     ```
     **MacOS Only** `find . -name "default.profraw" | xargs ~/clang+llvm-3.5.0-macosx-apple-darwin/bin/llvm-profdata merge -output profdata`
  6. Using cmake again, de-select PROFILE_GENERATE and select PROFILE_USE
  7. Compile normally
     **MacOS Only** `DYLD_LIBRARY_PATH=~/clang+llvm-3.5.0-macosx-apple-darwin/lib make -jN`

Both PGO and LTO

  1. Set build configuration to Release
  2. Choose TEST_ANNUAL_SIMULATIONS (or not)
  3. Choose PROFILE_GENERATE
  4. Compile normally
     **MacOS Only** `DYLD_LIBRARY_PATH=~/clang+llvm-3.5.0-macosx-apple-darwin/lib make -jN`
  5. Execute some or all of the tests:
     ```sh
     ctest -C Release -R "integration.*Zone.*" # For example
     ```
     **MacOS Only** `find . -name "default.profraw" | xargs ~/clang+llvm-3.5.0-macosx-apple-darwin/bin/llvm-profdata merge -output profdata`
  6. Using cmake again, de-select PROFILE_GENERATE and select PROFILE_USE
  7. Select ENABLE_LTO
  8. Compile normally
     **MacOS Only** `DYLD_LIBRARY_PATH=~/clang+llvm-3.5.0-macosx-apple-darwin/lib make -jN`
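The PGO workflows above could be scripted roughly as follows. This is a sketch under several assumptions: the options are CMake cache options with exactly the names listed on this page, `../EnergyPlus` is a hypothetical source path (override it with `SRC`), the `ctest` filter is the example from step 5, and the MacOS-only `DYLD_LIBRARY_PATH` and `llvm-profdata merge` steps are left out. `run` and `pgo_workflow` are made-up helper names, and by default the commands are only printed.

```sh
# Hypothetical helper: echo each command, and only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# The parentheses run the body in a subshell, so `set -e` does not leak out.
pgo_workflow() (
    set -e
    src=${SRC:-../EnergyPlus}   # hypothetical source-tree location
    jobs=${JOBS:-4}
    if [ "${WITH_LTO:-0}" = 1 ]; then lto=ON; else lto=OFF; fi

    # Pass 1: instrumented build; LTO stays off because profiling does not need it.
    run cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_LTO=OFF \
        -DPROFILE_GENERATE=ON -DPROFILE_USE=OFF "$src"
    run make -j"$jobs"

    # Exercise representative tests to record profile data.
    run ctest -C Release -R "integration.*Zone.*"

    # Pass 2: rebuild with the recorded profile (CMake keeps the cached
    # CMAKE_BUILD_TYPE), optionally adding LTO.
    run cmake -DPROFILE_GENERATE=OFF -DPROFILE_USE=ON -DENABLE_LTO="$lto" "$src"
    run make -j"$jobs"
)
```

`WITH_LTO=1 pgo_workflow` prints the combined PGO + LTO sequence; `DRY_RUN=0 WITH_LTO=1 pgo_workflow` would actually run it from an empty build directory.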

Raw Data

To build with LTO using clang 3.5 on MacOS, point the linker at the libLTO.dylib shipped with clang 3.5 by setting DYLD_LIBRARY_PATH:

DYLD_LIBRARY_PATH=/Users/jason/clang+llvm-3.5.0-macosx-apple-darwin/lib make -j3

GCC On Linux

LTO + PGO Full Annual

jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed   87.15 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed   53.14 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed   37.94 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) = 178.27 sec

LTO + PGO

jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed    6.04 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed    4.31 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed    3.21 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) =  13.60 sec

LTO Only, Full Annual

jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed   94.71 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed   56.80 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed   41.44 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) = 192.98 sec

LTO Only

jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed    6.37 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed    4.42 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed    3.27 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) =  14.10 sec

PGO Only, Full Annual

jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed  109.47 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed   64.35 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed   47.01 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) = 220.87 sec

PGO Only

jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed    7.22 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed    4.91 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed    3.64 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) =  15.80 sec

Baseline Full Annual

jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed  112.72 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed   66.52 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed   48.61 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) = 227.89 sec

Baseline

jason@jason-VirtualBox:~/EnergyPlusTeam-build-2$ ctest -R "integration.RefBldg.*Office"
Test project /home/jason/EnergyPlusTeam-build-2
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed    7.61 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed    4.99 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed    3.79 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) =  16.43 sec

Clang on MacOS

LTO + PGO Full Annual (Clang 3.5)

Jasons-Mac:EnergyPlus-build jason$ ctest -R "integration.RefBldg.*Office"
Test project /Users/jason/EnergyPlus-build
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed  152.24 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed   94.92 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed   68.31 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) = 315.58 sec

LTO + PGO (Clang 3.5)

Jasons-Mac:EnergyPlus-build jason$ ctest -R "integration.RefBldg.*Office"
Test project /Users/jason/EnergyPlus-build
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed   15.92 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed   11.78 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed    8.32 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) =  36.19 sec

PGO Full Annual (Clang 3.5)

Jasons-Mac:EnergyPlus-build jason$ ctest -R "integration.RefBldg.*Office"
Test project /Users/jason/EnergyPlus-build
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed  160.22 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed  101.19 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed   70.35 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) = 331.86 sec

PGO (Clang 3.5)

Jasons-Mac:EnergyPlus-build jason$ ctest -R "integration.RefBldg.*Office"
Test project /Users/jason/EnergyPlus-build
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed   11.69 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed    8.37 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed    6.45 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) =  26.65 sec

Baseline, Full Annual (Clang 3.4)

Jasons-Mac:EnergyPlus-build-nolto jason$ ctest -R "integration.RefBldg.*Office"
Test project /Users/jason/EnergyPlus-build-nolto
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed  180.88 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed  110.30 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed   80.58 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) = 371.89 sec

Baseline (Clang 3.4)

Jasons-Mac:EnergyPlus-build-nolto jason$ ctest -R "integration.RefBldg.*Office"
Test project /Users/jason/EnergyPlus-build-nolto
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed   12.51 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed    9.14 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed    7.74 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) =  29.50 sec

MSVC On Windows

LTO + PGO, Full Annual

$ ctest -R "integration.RefBldg.*Office" -C Release
Test project C:/Programming/EnergyPlusTeam-build
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed   96.31 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed   58.45 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed   42.23 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) = 197.61 sec

LTO + PGO

$ ctest -R "integration.RefBldg.*Office" -C Release
Test project C:/Programming/EnergyPlusTeam-build
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed    7.55 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed    5.72 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed    4.42 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) =  18.28 sec

Baseline, Full Annual

$ ctest -R "integration.RefBldg.*Office" -C Release
Test project C:/Programming/EnergyPlusTeam-build-nolto
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed  118.45 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed   70.14 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed   53.34 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) = 242.53 sec

Baseline

$ ctest -R "integration.RefBldg.*Office" -C Release
Test project C:/Programming/EnergyPlusTeam-build-nolto
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed    8.78 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed    6.36 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed    5.03 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) =  20.75 sec

LTO, Full Annual

$ ctest -R "integration.RefBldg.*Office" -C Release
Test project C:/Programming/EnergyPlusTeam-build
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed  115.89 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed   69.41 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed   52.86 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) = 238.73 sec

LTO

$ ctest -R "integration.RefBldg.*Office" -C Release
Test project C:/Programming/EnergyPlusTeam-build
    Start 899: integration.RefBldgLargeOfficeNew2004_Chicago
1/3 Test #899: integration.RefBldgLargeOfficeNew2004_Chicago ....   Passed    8.58 sec
    Start 900: integration.RefBldgMediumOfficeNew2004_Chicago
2/3 Test #900: integration.RefBldgMediumOfficeNew2004_Chicago ...   Passed    6.20 sec
    Start 907: integration.RefBldgSmallOfficeNew2004_Chicago
3/3 Test #907: integration.RefBldgSmallOfficeNew2004_Chicago ....   Passed    4.98 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) =  20.38 sec