current perf
This page tells the story of the evolving performance of EnergyPlus.
Here we'll describe the internal performance characteristics of simulations at various commit points in the develop branch.
We can consider various models, especially those that highlight a feature of EnergyPlus that we're working to make faster.
The details included are profiler output and raw execution time.
This section describes the performance of the model prj10. This model takes considerable time to simulate and has many surfaces (60k). It is especially interesting with regard to interior radiant exchange and solar shading.
This version is where the following optimizations are applied to the initial C++ release:
- a rewrite of WriteSurfaceShadowing, using techniques of loop combining and eliminating redundant checks
- a rewrite of CalcScriptF, replacing the CalcMatrixInverse method for solving the linear system with an external library call (which uses an LU decomposition solving method)
- parallelization of CalcScriptF (including calls to CalcScriptF)
- some data structure optimization (for cache performance improvement) -- extracting certain data items from existing (large) structures to form smaller structures, and promoting others to their own arrays (i.e. AOS, or array of structures, to SOA, or structure of arrays)
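The AOS-to-SOA change in the last bullet can be sketched as follows. This is only an illustration under stated assumptions, not EnergyPlus code: the types `SurfaceAoS` and `SurfacesSoA`, their fields, and the functions `toSoA`, `weightedAreaAoS`, and `weightedAreaSoA` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "large" per-surface record (array-of-structures layout).
// The field names are illustrative, not EnergyPlus data members.
struct SurfaceAoS {
    double area;
    double emissivity;
    double vertices[12];  // bulky data that the hot loop never reads
    char name[64];
};

// Structure-of-arrays layout: the two hot fields are promoted to their own
// contiguous arrays, so the hot loop streams through densely packed doubles.
struct SurfacesSoA {
    std::vector<double> area;
    std::vector<double> emissivity;
};

// One-time conversion from the AoS layout to the SoA layout.
SurfacesSoA toSoA(const std::vector<SurfaceAoS>& surfaces) {
    SurfacesSoA soa;
    soa.area.reserve(surfaces.size());
    soa.emissivity.reserve(surfaces.size());
    for (const SurfaceAoS& s : surfaces) {
        soa.area.push_back(s.area);
        soa.emissivity.push_back(s.emissivity);
    }
    return soa;
}

// Hot loop over AoS: every iteration touches a whole record in memory,
// although only two of its fields are used.
double weightedAreaAoS(const std::vector<SurfaceAoS>& surfaces) {
    double sum = 0.0;
    for (const SurfaceAoS& s : surfaces) sum += s.area * s.emissivity;
    return sum;
}

// Same computation over SoA: only the two needed arrays are read.
double weightedAreaSoA(const SurfacesSoA& soa) {
    assert(soa.area.size() == soa.emissivity.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < soa.area.size(); ++i)
        sum += soa.area[i] * soa.emissivity[i];
    return sum;
}
```

Both loops compute the same value. The difference is how much unrelated data is pulled through the cache along the way, which matters more as the surface count grows.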
Here's the gprof data:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ks/call Ks/call name
38.67 2290.72 2290.72 6847570739 0.00 0.00 ObjexxFCL::Fstring::has_prefix(ObjexxFCL::Fstring const&, bool) const
13.14 3069.16 778.44 2414833 0.00 0.00 EnergyPlus::HVACDXHeatPumpSystem::GetDXHeatPumpSystemInput()
9.95 3658.58 589.42 13105005332 0.00 0.00 EnergyPlus::SolarShading::ComputeIntSolarAbsorpFactors()
4.49 3924.61 266.03 2974154 0.00 0.00 EnergyPlus::SolarShading::DeterminePolygonOverlap(int, int, int)
4.11 4168.05 243.44 EnergyPlus::SolarShading::CTRANS(int, int, int&, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>)
3.53 4377.29 209.24 7918841234 0.00 0.00 ObjexxFCL::Fstring::lstrip_whitespace()
3.24 4569.48 192.19 703666 0.00 0.00 EnergyPlus::HVACStandAloneERV::GetStandAloneERV()
2.82 4736.80 167.32 7839344897 0.00 0.00 ObjexxFCL::Fstring::Fstring(unsigned long, ObjexxFCL::Fstring const&)
2.23 4868.90 132.10 3534730154 0.00 0.00 ObjexxFCL::operator<=(ObjexxFCL::Fstring const&, std::string const&)
1.33 4947.61 78.71 12656954825 0.00 0.00 ObjexxFCL::MArrayR<ObjexxFCL::FArray1<EnergyPlus::AirflowNetworkBalanceManager::AirflowNetworkReportVars>, double, 1>::~MArrayR()
1.32 5025.74 78.13 EnergyPlus::SolarShading::HTRANS(int, int, int)
0.98 5084.03 58.29 EnergyPlus::InputProcessor::GetNumSectionsinInput()
0.97 5141.56 57.53 3972206288 0.00 0.00 ObjexxFCL::FArray1D<EnergyPlus::IceThermalStorage::DetailedIceStorageData>::clear()
0.88 5193.97 52.41 14 0.00 0.01 EnergyPlus::InputProcessor::GetListofSectionsinInput(ObjexxFCL::FArray1S<ObjexxFCL::Fstring>, int&)
0.84 5243.52 49.55 ObjexxFCL::MArrayR<ObjexxFCL::FArray1<EnergyPlus::DataAirflowNetwork::AirflowNetworkLinkSimuData>, double, 1>::~MArrayR()
0.78 5289.47 45.95 3660157438 0.00 0.00 EnergyPlus::SolarShading::CLIP(int, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>)
0.71 5331.28 41.81 509168817 0.00 0.00 ObjexxFCL::operator<(ObjexxFCL::Fstring const&, std::string const&)
0.67 5370.98 39.70 18894861285 0.00 0.00 ObjexxFCL::FArray1D<int>::operator=(ObjexxFCL::FArray1D<int> const&)
0.65 5409.22 38.24 365951889 0.00 0.00 ObjexxFCL::FArray1D<EnergyPlus::SolarReflectionManager::SolReflRecSurfData>::clear()
0.60 5444.58 35.37 ObjexxFCL::MArrayR<ObjexxFCL::FArray1<EnergyPlus::DataAirflowNetwork::AirflowNetworkLinkReportData>, double, 1>::~MArrayR()
0.57 5478.33 33.75 3146933334 0.00 0.00 EnergyPlus::SolarShading::CHKGSS(int, int, double, bool&)
0.57 5511.92 33.59 3660161018 0.00 0.00 EnergyPlus::SolarShading::CalcInteriorSolarDistribution()
0.52 5542.61 30.69 1 0.03 2.27 EnergyPlus::SolarShadi
As this profile shows, the WriteSurfaceShadowing and CalcInteriorRadExchange components have dropped off the chart, and string operations and SolarShading now dominate the execution time.
This gave a 1.57x improvement in elapsed time over the initial C++ code.
- Execution time: 2:55:36 real, 12586.21 user, 11.37 sys
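The CalcScriptF change listed for this version replaces an explicit matrix inverse with an LU-based solve from an external library. The idea can be sketched with a small hand-written solver. This is an assumption-laden stand-in, not the library call EnergyPlus uses: `luSolve` and `Matrix` are hypothetical names, and a production code would call a tuned library routine instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solve A x = b with LU factorization and partial pivoting.
// A and b are taken by value so the caller's data is left untouched.
std::vector<double> luSolve(Matrix a, std::vector<double> b) {
    const std::size_t n = a.size();
    assert(b.size() == n);

    // Factorization: afterwards the strict lower triangle of `a` holds the
    // multipliers of L (unit diagonal implied) and the upper triangle holds U.
    for (std::size_t k = 0; k < n; ++k) {
        // Partial pivoting: pick the row with the largest entry in column k.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(a[i][k]) > std::fabs(a[p][k])) p = i;
        assert(a[p][k] != 0.0);  // a zero pivot means the matrix is singular
        std::swap(a[k], a[p]);
        std::swap(b[k], b[p]);   // apply the same row permutation to b

        for (std::size_t i = k + 1; i < n; ++i) {
            const double l = a[i][k] / a[k][k];
            a[i][k] = l;
            for (std::size_t j = k + 1; j < n; ++j) a[i][j] -= l * a[k][j];
        }
    }

    // Forward substitution: L y = P b (y overwrites b).
    for (std::size_t i = 1; i < n; ++i)
        for (std::size_t j = 0; j < i; ++j) b[i] -= a[i][j] * b[j];

    // Back substitution: U x = y (x overwrites b).
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t j = i + 1; j < n; ++j) b[i] -= a[i][j] * b[j];
        b[i] /= a[i][i];
    }
    return b;
}
```

Solving the system directly avoids forming the inverse and then multiplying by it, which is generally both slower and less accurate than a factor-and-solve approach.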
This milestone is where std::string was introduced to replace the fixed length string objects.
Top of the gprof output file:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ks/call Ks/call name
25.36 1538.32 1538.32 1 1.54 1.54 EnergyPlus::OutputReportTabular::WriteSurfaceShadowing()
13.58 2361.93 823.61 320 0.00 0.00 EnergyPlus::HeatBalanceIntRadExchange::CalcMatrixInverse(ObjexxFCL::FArray2S<double>, ObjexxFCL::FArray2S<double>)
12.47 3118.25 756.32 4859 0.00 0.00 EnergyPlus::HeatBalanceIntRadExchange::CalcInteriorRadExchange(ObjexxFCL::FArray1S<double>, int, ObjexxFCL::FArray1S<double>, ObjexxFCL::Optional<int const, void>, ObjexxFCL::Optional<std::string, void>)
8.41 3628.06 509.81 112717320 0.00 0.00 ObjexxFCL::FArray1S<double>& ObjexxFCL::FArray1S<double>::operator-=<ObjexxFCL::FArray1D>(ObjexxFCL::FArray1D<double> const&)
7.67 4093.20 465.14 400 0.00 0.00 EnergyPlus::HeatBalanceIntRadExchange::CalcScriptF(int, ObjexxFCL::FArray1A<double>, ObjexxFCL::FArray2A<double>, ObjexxFCL::FArray1A<double>, ObjexxFCL::FArray2A<double>)
6.19 4468.64 375.44 3660157438 0.00 0.00 EnergyPlus::SolarShading::CLIPPOLY(int, int, int, int, int&)
4.24 4725.95 257.31 1861134 0.00 0.00 EnergyPlus::SolarShading::SHDGSS(int, int, int, int, int, int)
3.50 4938.40 212.45 3149207309 0.00 0.00 EnergyPlus::SolarShading::HTRANS1(int, int)
3.00 5120.31 181.91 3148800670 0.00 0.00 EnergyPlus::SolarShading::CTRANS(int, int, int&, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>)
3.00 5302.18 181.87 6806997353 0.00 0.00 EnergyPlus::SolarShading::HTRANS0(int, int)
1.95 5420.25 118.07 1158 0.00 0.00 EnergyPlus::CalcHeatBalanceInsideSurf(ObjexxFCL::Optional<int const, void>)
1.24 5495.30 75.05 299621174 0.00 0.00 EnergyPlus::HeatBalanceMovableInsulation::EvalInsideMovableInsulation(int, double&, double&)
1.17 5566.02 70.72 3146933334 0.00 0.00 EnergyPlus::SolarShading::CLIP(int, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>)
0.83 5616.27 50.25 978279264 0.00 0.00 EnergyPlus::InputProcessor::SameString(std::string const&, std::string const&)
0.64 5654.88 38.61 18887203420 0.00 0.00 ObjexxFCL::FArray1S<double>::FArray1S<ObjexxFCL::FArray1D>(ObjexxFCL::FArray1D<double> const&)
0.63 5692.99 38.11 3660161018 0.00 0.00 EnergyPlus::SolarShading::DeterminePolygonOverlap(int, int, int)
0.56 5727.20 34.21 365708252 0.00 0.00 EnergyPlus::SolarShading::CHKGSS(int, int, double, bool&)
0.51 5758.21 31.01 45850 0.00 0.00 EnergyPlus::InputProcessor::GetObjectItem(std::string const&, int, ObjexxFCL::FArray1S<std::string>, int&, ObjexxFCL::FArray1S<double>, int&, int&, ObjexxFCL::Optional<ObjexxFCL::FArray1<bool>, void>, ObjexxFCL::Optional<ObjexxFCL::FArray1<bool>, void>, ObjexxFCL::Optional<ObjexxFCL::FArray1<std::string>, void>, ObjexxFCL::Optional<ObjexxFCL::FArray1<std::string>, void>)
0.46 5786.10 27.89 1 0.03 0.06 EnergyPlus::SolarShading::DetermineShadowingCombinations()
0.44 5812.51 26.41 1158 0.00 0.00 EnergyPlus::HeatBalanceSurfaceManager::UpdateThermalHistories()
0.30 5830.94 18.43 1 0.02 0.06 EnergyPlus::SurfaceGeometry::GetSurfaceData(bool&)
0.25 5845.95 15.01 80 0.00 0.00 EnergyPlus::HeatBalanceIntRadExchange::FixViewFactors(int, ObjexxFCL::FArray1A<double>, ObjexxFCL::FArray2A<double>, int, double&, double&, double&, int&, double&)
Just scanning the gprof output shows that the len_trim problem has gone away. The run is now much closer to the baseline, both in its profile and in execution time.
The improvement over the initial C++ version is 2.13x!
- Execution time: 2:09:39 real, 7751.81 user, 6.80 sys
This is the first code from the C++ translation.
Top of the gprof output file:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ks/call Ks/call name
27.56 2549.74 2549.74 6841883584 0.00 0.00 ObjexxFCL::Fstring::len_trim() const
12.39 3695.44 1145.70 1 1.15 1.18 EnergyPlus::OutputReportTabular::WriteSurfaceShadowing()
8.90 4518.48 823.04 320 0.00 0.00 EnergyPlus::HeatBalanceIntRadExchange::CalcMatrixInverse(ObjexxFCL::FArray2S<double>, ObjexxFCL::FArray2S<double>)
8.45 5300.38 781.90 4859 0.00 0.00 EnergyPlus::HeatBalanceIntRadExchange::CalcInteriorRadExchange(ObjexxFCL::FArray1S<double>, int, ObjexxFCL::FArray1S<double>, ObjexxFCL::Optional<int const, void>, ObjexxFCL::Optional<ObjexxFCL::Fstring, void>)
5.53 5811.70 511.32 112717320 0.00 0.00 ObjexxFCL::FArray1S<double>& ObjexxFCL::FArray1S<double>::operator-=<ObjexxFCL::FArray1D>(ObjexxFCL::FArray1D<double> const&)
5.02 6275.70 464.00 400 0.00 0.00 EnergyPlus::HeatBalanceIntRadExchange::CalcScriptF(int, ObjexxFCL::FArray1A<double>, ObjexxFCL::FArray2A<double>, ObjexxFCL::FArray1A<double>, ObjexxFCL::FArray2A<double>)
4.07 6652.27 376.57 3660157438 0.00 0.00 EnergyPlus::SolarShading::CLIPPOLY(int, int, int, int, int&)
3.07 6935.94 283.67 1861134 0.00 0.00 EnergyPlus::SolarShading::SHDGSS(int, int, int, int, int, int)
2.41 7158.72 222.78 3149207309 0.00 0.00 EnergyPlus::SolarShading::HTRANS1(int, int)
2.19 7361.41 202.69 7918649184 0.00 0.00 ObjexxFCL::operator==(ObjexxFCL::Fstring const&, ObjexxFCL::Fstring const&)
2.16 7561.37 199.96 1158 0.00 0.00 EnergyPlus::CalcHeatBalanceInsideSurf(ObjexxFCL::Optional<int const, void>)
2.01 7747.50 186.13 3148800670 0.00 0.00 EnergyPlus::SolarShading::CTRANS(int, int, int&, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>, ObjexxFCL::FArray1S<double>)
1.92 7925.36 177.86 6806997353 0.00 0.00 EnergyPlus::SolarShading::HTRANS0(int, int)
1.89 8100.30 174.94 3954143273 0.00 0.00 EnergyPlus::InputProcessor::MakeUPPERCase(ObjexxFCL::Fstring const&)
1.80 8266.52 166.22 3528901221 0.00 0.00 ObjexxFCL::Fstring::operator()(unsigned long, unsigned long) const
1.61 8415.59 149.07 12650164270 0.00 0.00 ObjexxFCL::Fstring::~Fstring()
1.49 8553.05 137.46 7838745677 0.00 0.00 ObjexxFCL::Fstring::reassign(ObjexxFCL::Fstring const&)
0.65 8613.07 60.02 3146933334 0.00 0.00 EnergyPlus::SolarShading
The profile is very similar to the baseline (Fortran 8.0), but with a few extra items inserted. The major issue is the first item in the flat profile, ObjexxFCL::Fstring::len_trim(). This demonstrates the problem with fixed-length strings. They also had a performance impact in the Fortran version, because most operations on fixed-length strings require searching for their length (sans padding spaces).
You can also see that the execution time is over double that of the baseline.
- Execution time: 4:35:45 real, 16486.17 user, 10.09 sys
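The cost of fixed-length strings described above can be sketched in a few lines. This is an illustration of the general technique, not the ObjexxFCL implementation: `lenTrim`, `fixedEqual`, and `padTo` are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length of a blank-padded, fixed-width field, ignoring trailing spaces.
// A len_trim-style routine has to scan backwards from the end of the
// buffer every time it is called, so the cost grows with the padding.
std::size_t lenTrim(const char* buf, std::size_t width) {
    std::size_t n = width;
    while (n > 0 && buf[n - 1] == ' ') --n;
    return n;
}

// Comparing two fixed-width fields first needs both trimmed lengths,
// which means two backward scans before any characters are compared.
bool fixedEqual(const char* a, std::size_t widthA,
                const char* b, std::size_t widthB) {
    const std::size_t lenA = lenTrim(a, widthA);
    const std::size_t lenB = lenTrim(b, widthB);
    if (lenA != lenB) return false;
    return std::memcmp(a, b, lenA) == 0;
}

// Build a blank-padded field of the given width (for demonstration only).
std::string padTo(const std::string& s, std::size_t width) {
    assert(s.size() <= width);
    std::string out = s;
    out.resize(width, ' ');
    return out;
}
```

A `std::string` stores its length, so `size()` is a constant-time lookup and an equality test can reject strings of different lengths without scanning any padding. With call counts in the billions, as in the profile above, that per-call scan adds up.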
This run serves as a performance baseline.
gprof dot call graph
Here are the top lines from the gprof flat profile:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ks/call Ks/call name
31.77 1685.43 1685.43 1 1.69 1.69 __outputreporttabular_MOD_writesurfaceshadowing
17.91 2635.64 950.21 320 0.00 0.00 __heatbalanceintradexchange_MOD_calcmatrixinverse
10.97 3217.37 581.73 4859 0.00 0.00 __heatbalanceintradexchange_MOD_calcinteriorradexchange
9.58 3725.51 508.14 3660157438 0.00 0.00 __solarshading_MOD_clippoly
5.15 3998.57 273.06 1861134 0.00 0.00 __solarshading_MOD_shdgss
3.79 4199.41 200.84 3148800670 0.00 0.00 __solarshading_MOD_ctrans
3.62 4391.70 192.29 1158 0.00 0.00 calcheatbalanceinsidesurf_
2.82 4541.42 149.72 3149207309 0.00 0.00 __solarshading_MOD_htrans1
2.66 4682.26 140.84 6806997353 0.00 0.00 __solarshading_MOD_htrans0
2.09 4793.23 110.97 2637023953 0.00 0.00 __inputprocessor_MOD_makeuppercase
1.22 4858.17 64.94 242717 0.00 0.00 __outputreportpredefined_MOD_incrementtableentry
1.20 4921.93 63.76 3146933334 0.00 0.00 __solarshading_MOD_clip
0.88 4968.41 46.48 1 0.05 0.08 __solarshading_MOD_determineshadowingcombinations
0.76 5008.89 40.48 3660161018 0.00 0.00 __solarshading_MOD_determinepolygonoverlap
You can see from the profile that the top processor cycle consumers are WriteSurfaceShadowing and CalcInteriorRadExchange (and its descendant in the call graph, CalcMatrixInverse, which is used in the routine CalcScriptF).
- Execution time: 2:01:54 real, 7252.78 user, 30.16 sys
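The speedup figures quoted on this page can be reproduced from the elapsed ("real") times listed for each run. The sketch below shows the arithmetic; `toSeconds` and `speedup` are hypothetical helper names, and the only inputs are the wall-clock strings reported above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Convert an "h:mm:ss" wall-clock string to a number of seconds.
long toSeconds(const std::string& hms) {
    int h = 0, m = 0, s = 0;
    const int parsed = std::sscanf(hms.c_str(), "%d:%d:%d", &h, &m, &s);
    assert(parsed == 3);  // expects exactly hours, minutes, and seconds
    (void)parsed;         // avoids an unused-variable warning under NDEBUG
    return h * 3600L + m * 60L + s;
}

// Ratio of elapsed times: how many times faster `after` is than `before`.
double speedup(const std::string& before, const std::string& after) {
    return static_cast<double>(toSeconds(before)) /
           static_cast<double>(toSeconds(after));
}
```

For example, `speedup("4:35:45", "2:55:36")` gives about 1.57 and `speedup("4:35:45", "2:09:39")` gives about 2.13, matching the improvements over the initial C++ code. The same ratio against the baseline, `speedup("4:35:45", "2:01:54")`, is about 2.26, which is the "over double" slowdown of the first C++ translation.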