Fix #10842 - When using pyenergyplus to run (via run_energyplus), PythonPlugin initializations errors lead to hang #10844

Merged: @Myoldmopar merged 5 commits into develop from 10842_PluginHangs on Dec 5, 2024

@jmarrec (Contributor) commented Dec 3, 2024

Pull request overview

Pull Request Author

Add to this list or remove from it as applicable. This is a simple templated set of guidelines.

  • Title of PR should be user-synopsis style (clearly understandable in a standalone changelog context)
  • Label the PR with at least one of: Defect, Refactoring, NewFeature, Performance, and/or DoNoPublish
  • Pull requests that impact EnergyPlus code must also include unit tests to cover enhancement or defect repair
  • Author should provide a "walkthrough" of relevant code changes using a GitHub code review comment process
  • If any diffs are expected, author must demonstrate they are justified using plots and descriptions
  • If changes fix a defect, the fix should be demonstrated in plots and descriptions
  • If any defect files are updated to a more recent version, upload new versions here or on DevSupport
  • If IDD requires transition, transition source, rules, ExpandObjects, and IDFs must be updated, and add IDDChange label
  • If structural output changes, add to output rules file and add OutputChange label
  • If adding/removing any LaTeX docs or figures, update that document's CMakeLists file dependencies

Reviewer

This will not be exhaustively relevant to every PR.

  • Perform a Code Review on GitHub
  • If branch is behind develop, merge develop and build locally to check for side effects of the merge
  • If defect, verify by running develop branch and reproducing defect, then running PR and reproducing fix
  • If feature, test running new feature, try creative ways to break it
  • CI status: all green or justified
  • Check that performance is not impacted (CI Linux results include performance check)
  • Run Unit Test(s) locally
  • Check any new function arguments for performance impacts
  • Verify IDF naming conventions and styles, memos and notes and defaults
  • If new idf included, locally check the err file and other outputs

@jmarrec added the Defect (Includes code to repair a defect in EnergyPlus) and APIChange (Code changes impact the Python Plugins, C API or Python API Bindings) labels Dec 3, 2024
@jmarrec (Contributor, Author) left a review:

Review for the test part.

On 0c73d29, before the fix (you can also use the MCVE at https://github.com/NREL/EnergyPlusDevSupport/commit/4a6027ab293f4d697b60a71e106aa7a83e4a89d3):

$ ctest -R 'TestRuntimeReleasesTheGIL' -VV

22: Test command: /home/julien/.pyenv/versions/3.12.2/bin/python3.12 "/home/julien/Software/Others/EnergyPlus/tst/EnergyPlus/api/TestRuntimeReleasesTheGIL.py" "-d" "/home/julien/Software/Others/EnergyPlus-build/tst/api/TestRuntimeReleasesTheGIL" "-w" "/home/julien/Software/Others/EnergyPlus/weather/USA_IL_Chicago-OHare.Intl.AP.725300_TMY3.epw" "-D" "/home/julien/Software/Others/EnergyPlus/tst/EnergyPlus/api/TestRuntimeReleasesTheGIL/mcve_gil.idf"
22: Working Directory: /home/julien/Software/Others/EnergyPlus-build/src/EnergyPlus
22: Environment variables: 
22:  PYTHONPATH=/home/julien/Software/Others/EnergyPlus-build/Products
22: Test timeout computed to be: 10
22: EnergyPlus Starting
22: EnergyPlus, Version 25.1.0-a119feb883, YMD=2024.12.03 12:50
22: **FATAL:Python import error causes program termination
22: EnergyPlus Run Time=00hr 00min  2.65sec
22: Program terminated: EnergyPlus Terminated--Error(s) Detected.
1/1 Test #22: API.Runtime.PythonPlugin.TestRuntimeReleasesTheGIL ...***Timeout  10.06 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =  10.16 sec

The following tests FAILED:
	 22 - API.Runtime.PythonPlugin.TestRuntimeReleasesTheGIL (Timeout)

OK: gcc on Decent CI also times out: https://raw.githubusercontent.com/Myoldmopar/EnergyPlusBuildResults/eb8127d16465353e3137023c1a6d27a405cfe69d/_posts/10842-PH/2024-12-03-EnergyPlus-0c73d2989ba393ab72d8ead77c2651f4ad3e941f-x86_64-Linux-Ubuntu-24.04-gcc-13.2-results.html

from this_does_not_exist.hello_world import garbage
@jmarrec (Contributor, Author):

A deliberately broken Python plugin import.
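To see what the plugin loader runs into, here is a minimal standard-library sketch. It assumes the loader boils down to "import the module, then look up the class"; `load_plugin_class` is a hypothetical helper, not EnergyPlus's actual loader. Importing the plugin module raises `ModuleNotFoundError`, the kind of error the ctest log above reports as "Python import error causes program termination":

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_plugin_class(module_name: str, class_name: str):
    """Hypothetical stand-in for a plugin loader: import a module, fetch a class."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Write a module equivalent to the broken mcve_gil.py into a fresh temp dir
plugin_dir = Path(tempfile.mkdtemp())
(plugin_dir / "mcve_gil.py").write_text(
    "from this_does_not_exist.hello_world import garbage\n"
)
sys.path.insert(0, str(plugin_dir))
importlib.invalidate_caches()  # make sure the new file is visible to the import system

error_message = None
try:
    load_plugin_class("mcve_gil", "HeatingSetPoint")
except ModuleNotFoundError as err:
    # The failure comes from the nested import inside the plugin module
    error_message = str(err)
    print(f"Import failed: {err}")
```

The point is that the error is an ordinary Python exception raised during plugin setup; the hang only happens if the host does not clean up after it.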

Comment on lines +98 to +102
PythonPlugin:Instance,
Heating Setpoint Override, !- Name
Yes, !- Run During Warmup Days
mcve_gil, !- Python Module Name
HeatingSetPoint; !- Plugin Class Name
@jmarrec (Contributor, Author):

An MCVE file that contains nothing but this plugin, the required objects (Building, GlobalGeometryRules), and design days.

Comment on lines +60 to +66
import sys

from pyenergyplus.api import EnergyPlusAPI

api = EnergyPlusAPI()
state = api.state_manager.new_state()
exit_code = api.runtime.run_energyplus(state, sys.argv[1:])

if exit_code == 0:
    print("Expected EnergyPlus to return an error to the broken python plugin. Task Succeeded Unsuccessfully!")
    sys.exit(1)
@jmarrec (Contributor, Author):

We run that IDF file, with its Python plugin, through the API.

In the current state, exit_code is never returned anyway, because the run just hangs. But testing that it does return an error is what we eventually want.

Comment on lines +1151 to +1160
set(TEST_DIR "${PROJECT_BINARY_DIR}/tst/api/TestRuntimeReleasesTheGIL")
file(MAKE_DIRECTORY ${TEST_DIR})
add_test(NAME "API.Runtime.PythonPlugin.TestRuntimeReleasesTheGIL"
  COMMAND "${Python_EXECUTABLE}" "${PROJECT_SOURCE_DIR}/tst/EnergyPlus/api/TestRuntimeReleasesTheGIL.py" -d "${TEST_DIR}" -w "${EPW_FILE}" -D "${PROJECT_SOURCE_DIR}/tst/EnergyPlus/api/TestRuntimeReleasesTheGIL/mcve_gil.idf"
)
set_tests_properties("API.Runtime.PythonPlugin.TestRuntimeReleasesTheGIL"
  PROPERTIES
    ENVIRONMENT PYTHONPATH=${DIR_WITH_PY_ENERGYPLUS}
    TIMEOUT 10 # This used to time out, and we expect it NOT to
)
@jmarrec (Contributor, Author):

Register the ctest. Note the TIMEOUT of 10 seconds: before the fix, we do expect it to time out, which demonstrates #10842.
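The ctest `TIMEOUT` is what turns a hang into a visible failure instead of a stuck CI job. The same idea as a standalone Python sketch: `run_with_timeout` is a hypothetical helper, and the child commands are stand-ins (one just performs the broken import, one just sleeps), not the real `TestRuntimeReleasesTheGIL.py`. It distinguishes "returned an error", "returned success" and "hung":

```python
import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout_s: float) -> str:
    """Classify a child process outcome the way the ctest cares about it."""
    try:
        completed = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired
        return "hang"
    return "success" if completed.returncode == 0 else "error"


# Stand-in for the broken plugin: import a module that does not exist
broken = [sys.executable, "-c", "from this_does_not_exist.hello_world import garbage"]
# Stand-in for the pre-fix behavior: a child that never returns on its own in time
hanging = [sys.executable, "-c", "import time; time.sleep(60)"]
# Stand-in for a run that (wrongly, for this test) succeeds
ok = [sys.executable, "-c", "pass"]

print(run_with_timeout(broken, timeout_s=10))   # error
print(run_with_timeout(hanging, timeout_s=1))   # hang
print(run_with_timeout(ok, timeout_s=10))       # success
```

Only "error" is the outcome the test wants; "hang" corresponds to the ctest Timeout shown in the log, and "success" to the script's own `exit_code == 0` check.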

Comment on lines 487 to 503
// GilGrabber is an RAII helper that will ensure we release the GIL (including if we end up throwing)
struct GilGrabber
{
    GilGrabber()
    {
        gil = PyGILState_Ensure();
    }
    ~GilGrabber()
    {
        PyGILState_Release(gil);
    }

    PyGILState_STATE gil;
};

// Take control of the global interpreter lock
GilGrabber gil_grabber;
@jmarrec (Contributor, Author):

OK, so the issue was that we were NOT releasing the Python Global Interpreter Lock (GIL) after acquiring it when an error was thrown during plugin setup. (I tested a C version of the same test, which correctly threw and exited as expected, so I figured it was the GIL.)
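The failure mode can be reproduced in miniature with a plain `threading.Lock` standing in for the GIL. This is an analogy only: the real code calls `PyGILState_Ensure`/`PyGILState_Release` from C++, and `setup_plugins_without_raii` and this `FatalError` class are hypothetical. If an exception escapes between acquire and release, the lock stays held and the next acquirer blocks; the sketch uses a timeout so it reports the stall instead of hanging:

```python
import threading

gil = threading.Lock()  # stand-in for the Python GIL


class FatalError(Exception):
    """Hypothetical stand-in for EnergyPlus's FatalError."""


def setup_plugins_without_raii(plugins):
    gil.acquire()  # like PyGILState_Ensure()
    for setup in plugins:
        setup()  # may raise; the release below is then skipped
    gil.release()  # like PyGILState_Release(): never reached on error


def broken_setup():
    raise FatalError("Python import error causes program termination")


try:
    setup_plugins_without_raii([broken_setup])
except FatalError:
    pass  # the error itself is handled...

# ...but whoever needs the lock next waits forever; give up after 0.1 s instead
acquired = gil.acquire(timeout=0.1)
print("lock acquired" if acquired else "stuck: lock was never released")
```

This prints `stuck: lock was never released`, the toy equivalent of the hang seen in the ctest log.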

Initially, in 552f5dd, I put a try/catch around plugin.setup():

    for (auto &plugin : state.dataPluginManager->plugins) {
+        try {
            plugin.setup(state);
+        } catch (const FatalError &e) {
+            PyGILState_Release(gil);
+            throw e;
+        }
    }

But I realized it was probably a better idea to just use RAII, so that there's no chance we forget to release the Python GIL.
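In Python, the closest analogue of the `GilGrabber` RAII guard is a context manager: acquire in `__enter__` (the constructor's role), release in `__exit__` (the destructor's role), which runs whether the body returns normally or raises. A sketch using the same `threading.Lock` stand-in for the GIL; `LockGrabber`, `setup_plugins_with_raii` and `FatalError` are hypothetical names for illustration:

```python
import threading

gil = threading.Lock()  # stand-in for the Python GIL


class FatalError(Exception):
    """Hypothetical stand-in for EnergyPlus's FatalError."""


class LockGrabber:
    """Mirrors GilGrabber: acquire on entry, always release on exit."""

    def __init__(self, lock):
        self.lock = lock

    def __enter__(self):
        self.lock.acquire()  # like PyGILState_Ensure() in the constructor
        return self

    def __exit__(self, exc_type, exc, tb):
        self.lock.release()  # like PyGILState_Release() in the destructor
        return False  # do not swallow the exception; let it propagate


def setup_plugins_with_raii(plugins):
    with LockGrabber(gil):
        for setup in plugins:
            setup()


def broken_setup():
    raise FatalError("Python import error causes program termination")


try:
    setup_plugins_with_raii([broken_setup])
except FatalError:
    pass

released_after_error = not gil.locked()
setup_plugins_with_raii([lambda: None])  # the success path releases too
released_after_success = not gil.locked()
print(released_after_error, released_after_success)  # True True
```

As with the C++ version, there is no separate release call to forget on either path, and the error still reaches the caller, which is what lets `run_energyplus` return a non-zero exit code instead of hanging.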

Comment on lines +697 to +698
// Releasing the global interpreter lock is done via RAII

@jmarrec (Contributor, Author):

No need to release the GIL manually anymore in the success case either.

@TShapinsky (Member) commented:

@jmarrec This is my test suite of OSWs that fail at each place where EnergyPlus calls into a Python plugin. Hopefully this can be useful in helping diagnose this issue.
pyenergyplus testsuite.zip

@Myoldmopar self-assigned this Dec 4, 2024

@Myoldmopar (Member) left a review:

This is really nice. A sleek fix, RAII working great, excellent test, all good.

@Myoldmopar (Member) commented:

And confirmed it's happy with develop pulled in. Merging, thanks @jmarrec!

@Myoldmopar merged commit acda84e into develop Dec 5, 2024
9 checks passed
@Myoldmopar deleted the 10842_PluginHangs branch December 5, 2024 16:31
Labels: APIChange (Code changes impact the Python Plugins, C API or Python API Bindings), Defect (Includes code to repair a defect in EnergyPlus)

Closes issue #10842: When using pyenergyplus to run (via run_energyplus), PythonPlugin initializations errors lead to hang

4 participants