-
Notifications
You must be signed in to change notification settings - Fork 396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #10842 - When using pyenergyplus to run (via run_energyplus), PythonPlugin initializations errors lead to hang #10844
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review for the Test part.
On 0c73d29, before fix (you can also use the MCVE on https://github.com/NREL/EnergyPlusDevSupport/commit/4a6027ab293f4d697b60a71e106aa7a83e4a89d3 )
$ ctest -R 'TestRuntimeReleasesTheGIL' -VV
22: Test command: /home/julien/.pyenv/versions/3.12.2/bin/python3.12 "/home/julien/Software/Others/EnergyPlus/tst/EnergyPlus/api/TestRuntimeReleasesTheGIL.py" "-d" "/home/julien/Software/Others/EnergyPlus-build/tst/api/TestRuntimeReleasesTheGIL" "-w" "/home/julien/Software/Others/EnergyPlus/weather/USA_IL_Chicago-OHare.Intl.AP.725300_TMY3.epw" "-D" "/home/julien/Software/Others/EnergyPlus/tst/EnergyPlus/api/TestRuntimeReleasesTheGIL/mcve_gil.idf"
22: Working Directory: /home/julien/Software/Others/EnergyPlus-build/src/EnergyPlus
22: Environment variables:
22: PYTHONPATH=/home/julien/Software/Others/EnergyPlus-build/Products
22: Test timeout computed to be: 10
22: EnergyPlus Starting
22: EnergyPlus, Version 25.1.0-a119feb883, YMD=2024.12.03 12:50
22: **FATAL:Python import error causes program termination
22: EnergyPlus Run Time=00hr 00min 2.65sec
22: Program terminated: EnergyPlus Terminated--Error(s) Detected.
1/1 Test #22: API.Runtime.PythonPlugin.TestRuntimeReleasesTheGIL ...***Timeout 10.06 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 10.16 sec
The following tests FAILED:
22 - API.Runtime.PythonPlugin.TestRuntimeReleasesTheGIL (Timeout)
Ok: gcc on decent ci also timeouts: https://raw.githubusercontent.com/Myoldmopar/EnergyPlusBuildResults/eb8127d16465353e3137023c1a6d27a405cfe69d/_posts/10842-PH/2024-12-03-EnergyPlus-0c73d2989ba393ab72d8ead77c2651f4ad3e941f-x86_64-Linux-Ubuntu-24.04-gcc-13.2-results.html
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | ||
# POSSIBILITY OF SUCH DAMAGE. | ||
|
||
from this_does_not_exist.hello_world import garbage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broken Python Plugin import
PythonPlugin:Instance, | ||
Heating Setpoint Override, !- Name | ||
Yes, !- Run During Warmup Days | ||
mcve_gil, !- Python Module Name | ||
HeatingSetPoint; !- Plugin Class Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MCVE file that has nothing but this plugin, the required objects (Building, GlobalGeometry Rules) and design days.
api = EnergyPlusAPI() | ||
state = api.state_manager.new_state() | ||
exit_code = api.runtime.run_energyplus(state, sys.argv[1:]) | ||
|
||
if exit_code == 0: | ||
print("Expected EnergyPlus to return an error to the broken python plugin. Task Succeeded Unsuccessfully!") | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We run that file idf file with python plugin from API.
In the current state, the exit_code is never returned anyways because it just hangs. But testing that it did return an error is what we eventually want
set(TEST_DIR "${PROJECT_BINARY_DIR}/tst/api/TestRuntimeReleasesTheGIL") | ||
file(MAKE_DIRECTORY ${TEST_DIR}) | ||
add_test(NAME "API.Runtime.PythonPlugin.TestRuntimeReleasesTheGIL" | ||
COMMAND "${Python_EXECUTABLE}" "${PROJECT_SOURCE_DIR}/tst/EnergyPlus/api/TestRuntimeReleasesTheGIL.py" -d "${TEST_DIR}" -w "${EPW_FILE}" -D "${PROJECT_SOURCE_DIR}/tst/EnergyPlus/api/TestRuntimeReleasesTheGIL/mcve_gil.idf" | ||
) | ||
set_tests_properties("API.Runtime.PythonPlugin.TestRuntimeReleasesTheGIL" | ||
PROPERTIES | ||
ENVIRONMENT PYTHONPATH=${DIR_WITH_PY_ENERGYPLUS} | ||
TIMEOUT 10 # This used to timeout! and we expect it NOT to | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Register the ctest. NOTE THE TIMEOUT 10 (seconds), we do expect it to timeout to demonstrate #10842
src/EnergyPlus/PluginManager.cc
Outdated
// GilGrabber is an RAII helper that will ensure we release the GIL (including if we end up throwing) | ||
struct GilGrabber | ||
{ | ||
GilGrabber() | ||
{ | ||
gil = PyGILState_Ensure(); | ||
} | ||
~GilGrabber() | ||
{ | ||
PyGILState_Release(gil); | ||
} | ||
|
||
PyGILState_STATE gil; | ||
}; | ||
|
||
// Take control of the global interpreter lock | ||
GilGrabber gil_grabber; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so the issue was that we were NOT releasing the Python Global Interpreter Lock after acquiring it. (I tested a C version of the same test, which was correctly throwing and exiting as expected, so I figured it was the GIL)
Initially in 552f5dd I put a try / catch around plugin.setup()
for (auto &plugin : state.dataPluginManager->plugins) {
+ try {
plugin.setup(state);
+ } catch (const FatalError &e) {
+ PyGILState_Release(gil);
+ throw e;
+ }
}
But I realized it was probably a better idea to just use RAII, so that there's no chance we forget to release the Python GIL
// Release the global interpreter lock is done via RAII | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to release the GIL manually anymore in the case it works fine either
@jmarrec This is my test suite of osws that fail in each place that energyplus contacts a pythonplugin. Hopefully this can be useful in helping diagnose this issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nice. A sleek fix, RAII working great, excellent test, all good.
And confirmed it's happy with develop pulled in. Merging, thanks @jmarrec |
Pull request overview
Pull Request Author
Add to this list or remove from it as applicable. This is a simple templated set of guidelines.
Reviewer
This will not be exhaustively relevant to every PR.