Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed GIL release issue with Python System and Python TestFixture. #2618

Open
wants to merge 8 commits into
base: gz-sim9
Choose a base branch
from

Conversation

AmalDevHaridevan
Copy link

…gned-off-by: Amal Dev Haridevan haridevanamaldev@gmail.com

🦟 Bug fix

Fixes #
Add explicit scoped acquire and release of GIL, so that Python Systems can be executed from a Python Server (run using TestFixture for example).

Summary

Python system plugin, which is attached to my model cannot be run using TestFixture in Python. This is due to GIL not being explicitly released in the PythonSystemLoader.
To overcome this issue, we need to do a scoped_acquire of GIL explicitly and then perform scoped_release after each of the system methods, namely, PreUpdate, Update, PostUpdate of the PythonSystem, so the GIL can be accessed by the TestFixture after.
As a safety mechanism, for future, I also added the scoped_acquire and release of GIL within the pybind code for TestFixture.

Checklist

  • Signed all commits for DCO
  • Added tests
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • Consider updating Python bindings (if the library has them)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage)
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.

🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸🔸

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.

…gned-off-by: Amal Dev Haridevan haridevanamaldev@gmail.com

Signed-off-by: sdcnlab <haridevanamaldev@gmail.com>
Signed-off-by: Amal Dev Haridevan <haridevanamaldev@gmail.com>
python/test/python_system_TEST.py Outdated Show resolved Hide resolved
python/test/python_system_TEST.py Outdated Show resolved Hide resolved
python/test/gravity.sdf Show resolved Hide resolved
AmalDevHaridevan and others added 4 commits September 12, 2024 10:28
Co-authored-by: Alejandro Hernández Cordero <ahcorde@gmail.com>
Signed-off-by: AmaldevHaridevan <160554626+AmalDevHaridevan@users.noreply.github.com>
Signed-off-by: AmaldevHaridevan <160554626+AmalDevHaridevan@users.noreply.github.com>
Signed-off-by: AmaldevHaridevan <160554626+AmalDevHaridevan@users.noreply.github.com>
self->OnPreUpdate(_cb);
pybind11::gil_scoped_release gilr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? I thought the scoped lock above would automatically release when going out of scope

// This acquire and release is only required from the PythonSystem code
// However, adding this here may prevent undefined or unintended behaviors
// in future
pybind11::gil_scoped_acquire gil;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have not run into this issue with our python systems in the examples. Are you doing something specific that's causing a problem? It would be great if we can pin down the actual problem

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi azeey,

Yes. I will explain the issue in detail. The following files from my branch are required to reproduce the issue:

  1. src/gz-sim/python/test/testFixture_TEST.Py
  2. src/gz-sim/python/test/python_system_TEST.py
  3. src/gz-sim/python/test/gravity.sdf

The goal is to run the Server in Python, with an SDF model that contains a Python system plugin. I executed the following command before my proposed modifications:
python testFixture_TEST.py and the output was as follows:

(2024-09-13 12:52:31.329) [info] [SystemManager.cc:54] Serving entity system service on [/entity/system/add]
(2024-09-13 12:52:31.400) [debug] [Physics.cc:877] Loaded [gz::physics::dartsim::Plugin] from library [/home/sdcnlab/gz_contrib/src/install/lib/gz-physics-8/engine-plugins/libgz-physics-dartsim-plugin.so]
(2024-09-13 12:52:31.400) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::Physics] for entity [1]
Configured on 4
sdf name: plugin
Applying 0.0 N on link link
(2024-09-13 12:52:31.426) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::PythonSystemLoader] for entity [4]
(2024-09-13 12:52:31.426) [info] [LevelManager.cc:641] Loaded level [default]
(2024-09-13 12:52:31.427) [debug] [ServerConfig.cc:1025] Loading (3) plugins from file [/home/sdcnlab/.gz/sim/9/server.config]
(2024-09-13 12:52:31.427) [debug] [SimulationRunner.cc:1647] Additional plugins to load:
(2024-09-13 12:52:31.427) [debug] [SimulationRunner.cc:1650] gz::sim::systems::UserCommands gz-sim-user-commands-system
(2024-09-13 12:52:31.427) [debug] [SimulationRunner.cc:1650] gz::sim::systems::SceneBroadcaster gz-sim-scene-broadcaster-system
(2024-09-13 12:52:31.432) [info] [UserCommands.cc:664] Create service on [/world/gravity/create]
(2024-09-13 12:52:31.432) [info] [UserCommands.cc:671] Remove service on [/world/gravity/remove]
(2024-09-13 12:52:31.433) [info] [UserCommands.cc:678] Pose service on [/world/gravity/set_pose]
(2024-09-13 12:52:31.433) [info] [UserCommands.cc:686] Pose service on [/world/gravity/set_pose_vector]
(2024-09-13 12:52:31.434) [info] [UserCommands.cc:693] Light configuration service on [/world/gravity/light_config]
(2024-09-13 12:52:31.434) [info] [UserCommands.cc:719] Physics service on [/world/gravity/set_physics]
(2024-09-13 12:52:31.435) [info] [UserCommands.cc:727] SphericalCoordinates service on [/world/gravity/set_spherical_coordinates]
(2024-09-13 12:52:31.435) [info] [UserCommands.cc:736] Enable collision service on [/world/gravity/enable_collision]
(2024-09-13 12:52:31.435) [info] [UserCommands.cc:745] Disable collision service on [/world/gravity/disable_collision]
(2024-09-13 12:52:31.436) [info] [UserCommands.cc:754] Material service on [/world/gravity/visual_config]
(2024-09-13 12:52:31.436) [info] [UserCommands.cc:762] Material service on [/world/gravity/wheel_slip]
(2024-09-13 12:52:31.436) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::UserCommands] for entity [1]
(2024-09-13 12:52:31.438) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::SceneBroadcaster] for entity [1]
(2024-09-13 12:52:31.440) [info] [SimulationRunner.cc:260] Serving world controls on [/world/gravity/control], [/world/gravity/control/state] and [/world/gravity/playback/control]
(2024-09-13 12:52:31.440) [info] [SimulationRunner.cc:286] Serving GUI information on [/world/gravity/gui/info]
(2024-09-13 12:52:31.440) [info] [SimulationRunner.cc:289] World [gravity] initialized with [default_physics] physics profile.
(2024-09-13 12:52:31.441) [info] [SimulationRunner.cc:296] Serving world SDF generation service on [/world/gravity/generate_world_sdf]
(2024-09-13 12:52:31.441) [info] [ServerPrivate.cc:334] Serving world names on [/gazebo/worlds]
(2024-09-13 12:52:31.441) [info] [ServerPrivate.cc:347] Resource path add service on [/gazebo/resource_paths/add].
(2024-09-13 12:52:31.441) [info] [ServerPrivate.cc:360] Resource path get service on [/gazebo/resource_paths/get].
(2024-09-13 12:52:31.442) [info] [ServerPrivate.cc:375] Resource path resolve service on [/gazebo/resource_paths/resolve].
(2024-09-13 12:52:31.442) [info] [ServerPrivate.cc:389] Resource paths published on [/gazebo/resource_paths].
(2024-09-13 12:52:31.442) [info] [ServerPrivate.cc:402] Server control service on [/server_control].
(2024-09-13 12:52:33.066) [info] [SimulationRunner.cc:743] Found no publishers on /stats, adding root stats topic
(2024-09-13 12:52:33.068) [info] [SimulationRunner.cc:777] Found no publishers on /clock, adding root clock topic
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:542] Creating PostUpdate worker threads: 4
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (0)
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (1)
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (2)
pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREFto disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations.The failing pybind11::handle::inc_ref() call was triggered on a method object.
E(2024-09-13 12:52:33.080) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (1)
(2024-09-13 12:52:33.080) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (2)
(2024-09-13 12:52:33.080) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (0)

======================================================================
ERROR: test_test_fixture (__main__.TestTestFixture)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sdcnlab/gz_contrib/src/gz-sim/python/test/testFixture_TEST.py", line 58, in test_test_fixture
    server.run(True, 1000, False)
RuntimeError: pybind11::handle::inc_ref() PyGILState_Check() failure.

----------------------------------------------------------------------
Ran 1 test in 2.028s

FAILED (errors=1)

Specifically, the error is this:

pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREFto disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations.The failing pybind11::handle::inc_ref() call was triggered on a method object.

I wasn't sure how to debug this issue, however, I realized that a probable issue could be the Python system and the Python TestFixture are most likely trying to simultaneously execute.
Since GIL is required to run the Python interpreter, this could potentially cause this problem. Based on this assumption, I made the modification by explicitly acquiring and releasing GIL so that interpreter is not simultaneously used.

This is the result of running python testFixture_TEST.py after my proposed modifications:

(2024-09-13 13:03:42.006) [info] [Server.cc:145] Loading SDF world file[/home/sdcnlab/gz_contrib/src/gz-sim/python/test/gravity.sdf].
(2024-09-13 13:03:42.256) [info] [SystemManager.cc:54] Serving entity system service on [/entity/system/add]
(2024-09-13 13:03:42.313) [debug] [Physics.cc:877] Loaded [gz::physics::dartsim::Plugin] from library [/home/sdcnlab/gz_contrib/src/install/lib/gz-physics-8/engine-plugins/libgz-physics-dartsim-plugin.so]
(2024-09-13 13:03:42.314) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::Physics] for entity [1]
Configured on 4
sdf name: plugin
Applying 0.0 N on link link
(2024-09-13 13:03:42.338) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::PythonSystemLoader] for entity [4]
(2024-09-13 13:03:42.338) [info] [LevelManager.cc:641] Loaded level [default]
(2024-09-13 13:03:42.339) [debug] [ServerConfig.cc:1025] Loading (3) plugins from file [/home/sdcnlab/.gz/sim/9/server.config]
(2024-09-13 13:03:42.340) [debug] [SimulationRunner.cc:1647] Additional plugins to load:
(2024-09-13 13:03:42.340) [debug] [SimulationRunner.cc:1650] gz::sim::systems::UserCommands gz-sim-user-commands-system
(2024-09-13 13:03:42.340) [debug] [SimulationRunner.cc:1650] gz::sim::systems::SceneBroadcaster gz-sim-scene-broadcaster-system
(2024-09-13 13:03:42.344) [info] [UserCommands.cc:664] Create service on [/world/gravity/create]
(2024-09-13 13:03:42.345) [info] [UserCommands.cc:671] Remove service on [/world/gravity/remove]
(2024-09-13 13:03:42.345) [info] [UserCommands.cc:678] Pose service on [/world/gravity/set_pose]
(2024-09-13 13:03:42.346) [info] [UserCommands.cc:686] Pose service on [/world/gravity/set_pose_vector]
(2024-09-13 13:03:42.346) [info] [UserCommands.cc:693] Light configuration service on [/world/gravity/light_config]
(2024-09-13 13:03:42.347) [info] [UserCommands.cc:719] Physics service on [/world/gravity/set_physics]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:727] SphericalCoordinates service on [/world/gravity/set_spherical_coordinates]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:736] Enable collision service on [/world/gravity/enable_collision]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:745] Disable collision service on [/world/gravity/disable_collision]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:754] Material service on [/world/gravity/visual_config]
(2024-09-13 13:03:42.349) [info] [UserCommands.cc:762] Material service on [/world/gravity/wheel_slip]
(2024-09-13 13:03:42.349) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::UserCommands] for entity [1]
(2024-09-13 13:03:42.351) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::SceneBroadcaster] for entity [1]
(2024-09-13 13:03:42.352) [info] [SimulationRunner.cc:260] Serving world controls on [/world/gravity/control], [/world/gravity/control/state] and [/world/gravity/playback/control]
(2024-09-13 13:03:42.353) [info] [SimulationRunner.cc:286] Serving GUI information on [/world/gravity/gui/info]
(2024-09-13 13:03:42.353) [info] [SimulationRunner.cc:289] World [gravity] initialized with [default_physics] physics profile.
(2024-09-13 13:03:42.353) [info] [SimulationRunner.cc:296] Serving world SDF generation service on [/world/gravity/generate_world_sdf]
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:334] Serving world names on [/gazebo/worlds]
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:347] Resource path add service on [/gazebo/resource_paths/add].
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:360] Resource path get service on [/gazebo/resource_paths/get].
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:375] Resource path resolve service on [/gazebo/resource_paths/resolve].
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:389] Resource paths published on [/gazebo/resource_paths].
(2024-09-13 13:03:42.355) [info] [ServerPrivate.cc:402] Server control service on [/server_control].
(2024-09-13 13:03:44.003) [info] [SimulationRunner.cc:743] Found no publishers on /stats, adding root stats topic
(2024-09-13 13:03:44.005) [info] [SimulationRunner.cc:777] Found no publishers on /clock, adding root clock topic
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:542] Creating PostUpdate worker threads: 4
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (0)
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (1)
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (2)
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:634] Serving scene information on [/world/gravity/scene/info]
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:643] Serving graph information on [/world/gravity/scene/graph]
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:654] Serving full state on [/world/gravity/state]
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:663] Serving full state (async) on [/world/gravity/state_async]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:671] Publishing scene information on [/world/gravity/scene/info]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:680] Publishing entity deletions on [/world/gravity/scene/deletion]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:689] Publishing state changes on [/world/gravity/state]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:700] Publishing pose messages on [/world/gravity/pose/info]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:711] Publishing dynamic pose messages on [/world/gravity/dynamic_pose/info]
(2024-09-13 13:03:45.154) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (0)
(2024-09-13 13:03:45.154) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (1)
(2024-09-13 13:03:45.154) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (2)
.
----------------------------------------------------------------------
Ran 1 test in 3.165s

OK

@azeey
Copy link
Contributor

azeey commented Oct 14, 2024

@AmalDevHaridevan just wanted to give you a heads-up that I won't be able to review this for the next couple of weeks due to ROSCon.

@AmalDevHaridevan
Copy link
Author

@AmalDevHaridevan just wanted to give you a heads-up that I won't be able to review this for the next couple of weeks due to ROSCon.

Hi Azeey, any updates on the pull request?

Copy link
Contributor

@azeey azeey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience. I can see the issue now and I can reproduce it with

from gz_test_deps.sim import Server, ServerConfig
from gz_test_deps.common import set_verbosity

set_verbosity(4)
config  = ServerConfig()
config.set_sdf_file('gravity.sdf')
server = Server(config)
server.run(True, 1000, False)

So it has nothing to do with TestFixture, but with how Server is being loaded from python. As mentioned in

// Helper struct to maybe do a scoped release of the Python GIL. There are a
// number of ways where releasing and acquiring the GIL is not necessary:
// 1. Gazebo is built without Pybind11
// 2. The python interpreter is not initialized. This could happen in tests that
// create a SimulationRunner without sim::Server where the interpreter is
// intialized.
// 3. sim::Server was instantiated by a Python module. In this case, there's a
// chance that this would be called with the GIL already released.
struct MaybeGilScopedRelease

When Server is instantiated from a python interpreter, pybind11 automatically releases the GIL before calling the C++ code (Server()). When PythonSystemLoader then tries to load the python system, it fails since any access to Python state requires locking the GIL. The solution you proposed works pretty well I think. My only comment is that we limit the changes to PythonSystemLoader and acquire the GIL in every System interface call.

CallPythonMethod(this->preUpdateMethod, _info, &_ecm);
py::gil_scoped_release gilr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary since py::gil_scoped_acquire will release the gil when going out of scope.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use the TestModelSystem plugin that's already in python/test/plugin?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think need more py::gil_scoped_acquire gil; calls in PythonSystemLoader::Configure, PythonSystemLoader::Reset(), and PythonSystemLoader::~PythonSystemLoader(). Basically, all System interfaces since they will be accessing python state.

@@ -56,7 +56,14 @@ defineSimTestFixture(pybind11::object module)
[](TestFixture* self, std::function<void(
const UpdateInfo &, EntityComponentManager &)> _cb)
{
// Add explicit scoped acquire and release of GIL, so that Python Systems
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we apply the changes in PythonSystemLoader, we shouldn't need any change in this file. Can you revert?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏛️ ionic Gazebo Ionic
Projects
Status: In review
Development

Successfully merging this pull request may close these issues.

3 participants