Description
Describe the feature request
If a user expects a calculation to take milliseconds but it ends up taking a couple of minutes, something is probably wrong, and they may want to cancel the calculation.
This is especially useful for people running code in (Jupyter) notebooks. A simple typo may cause the duration of a calculation to blow up significantly, especially since the introduction of multi-dimensional datasets in #1201. A user who notices this may want to cancel the process as soon as possible, without having to kill and restart the entire kernel.
This type of process cancellation is typically handled using KeyboardInterrupt (Ctrl+C events), signal handling (e.g. SIGINT), by setting a flag, or by using stop tokens.
Considerations
- Cfr. Python documentation (https://docs.python.org/3/library/signal.html#execution-of-python-signal-handlers):
  > A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.

  This has implications:
  - We need to double-check that sending an interrupt to the PGM C API as called from the same thread is possible. Note that this may be platform-dependent.
  - If not, maybe we need to run long operations (PowerGridModel constructors, calculations and (de)serializations) in a separate worker thread from Python.
  - We do not need to consider cancelability on the deepest level. It is fine to only check after every scenario whether a stop token was requested.
- Also cfr. Python documentation (https://docs.python.org/3/library/exceptions.html#KeyboardInterrupt and https://docs.python.org/3/library/signal.html#note-on-signal-handlers-and-exceptions):
  > [...] applications that are complex or require high reliability should avoid raising exceptions from signal handlers. They should also avoid catching KeyboardInterrupt as a means of gracefully shutting down. Instead, they should install their own SIGINT handler.

  - We should follow their recommendations.
- Xcode has supported `std::jthread` starting with version 26 (cfr. https://developer.apple.com/documentation/xcode-release-notes/xcode-26-release-notes). Other compilers have already supported it for much longer. This means that we can finally use it. It comes with a built-in stop token feature, which simplifies thread cancellation handling.
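The worker-thread and per-scenario checks considered above can be sketched in Python as follows. This is an illustration only, not PGM code: `run_batch`, `run_in_worker` and `OperationCancelled` are hypothetical names, `threading.Event` stands in for a stop token, and `time.sleep` stands in for one scenario that cannot be interrupted.

```python
import threading
import time


class OperationCancelled(Exception):
    """Hypothetical exception type for a cancelled batch calculation."""


def run_batch(n_scenarios, stop_event, scenario_duration=0.01):
    """Run scenarios one by one, checking the stop flag only between scenarios."""
    results = []
    for scenario in range(n_scenarios):
        if stop_event.is_set():
            raise OperationCancelled(f"stopped before scenario {scenario}")
        time.sleep(scenario_duration)  # stand-in for one uninterruptible scenario
        results.append(scenario)
    return results


def run_in_worker(n_scenarios, stop_event):
    """Run the batch in a worker thread so the calling thread stays responsive."""
    outcome = {}

    def target():
        try:
            outcome["result"] = run_batch(n_scenarios, stop_event)
        except OperationCancelled as error:
            outcome["error"] = error

    worker = threading.Thread(target=target)
    worker.start()
    return worker, outcome
```

Setting the event from the calling thread makes the worker stop at the next scenario boundary, so the delay before the stop takes effect is bounded by the duration of a single scenario.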
Design Proposal (requires experimentation before finalization)
- C++ core: support canceling multi-threaded calculations
  - Shift to using `std::jthread` instead of `std::thread`, which has a stop token, which allows cancelling threads
  - In the calculations, check for stop tokens in-between scenarios. NOTE: we do not need to check within a scenario cfr. the considerations mentioned before.
  - Register a signal handler
  - Raise a new exception for canceled operations (e.g. `OperationCanceled`)
- C API: add a stop-token feature.
  - Add a function `PGM_request_stop`.
  - Handle the stop request, either by (TBD):
    - Sending a signal to each thread; or
    - Setting the stop token for each job that is in-progress for the current handle.
  - An interrupted calculation should report a new exception type, e.g. `OperationCancelled`
- Python wrapper:
  - DO NOT catch a `KeyboardInterrupt` (cfr. https://docs.python.org/3/library/signal.html#note-on-signal-handlers-and-exceptions).
  - Instead, install a `SIGINT` signal handler as shown in the included example. This handler should, in turn, send another signal to a function that sets the stop token for all handles in the C API.
    - Note that this may require some changes to the way that handles are thread-local as introduced in "[BUG] handle corruption in python multi-threads" #1245
  - An interrupted calculation should raise a new exception, e.g. `OperationCancelled`
- Documentation: Aside from basic Python/C API reference, I do not believe we need any, as it is "intuitive" to try to cancel an operation
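For the Python wrapper, a handler along the following lines would follow the Python documentation's advice to install a custom SIGINT handler rather than catch `KeyboardInterrupt`. It is a sketch under assumptions: `stop_requested`, `request_stop` and `cancellable` are hypothetical names, and the event stands in for whatever the proposed `PGM_request_stop` would set in the C API.

```python
import signal
import threading
from contextlib import contextmanager

# Hypothetical stand-in for the stop token that PGM_request_stop would set.
stop_requested = threading.Event()


def request_stop(signum, frame):
    """SIGINT handler: only record the request; do not raise from the handler."""
    stop_requested.set()


@contextmanager
def cancellable():
    """Install the SIGINT handler for the duration of one calculation."""
    stop_requested.clear()
    # signal.signal may only be called from the main thread.
    previous = signal.signal(signal.SIGINT, request_stop)
    try:
        yield stop_requested
    finally:
        signal.signal(signal.SIGINT, previous)  # restore the earlier handler
```

Restoring the previous handler on exit keeps the default Ctrl+C behaviour intact outside of calculations, so only code running inside the `with cancellable():` block is affected.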
To test
- Start a notebook or interactive Python session.
- Create a very long-running PGM batch calculation (at least a couple of seconds or minutes, but not hours; verify that the calculation indeed takes this long)
- Re-run the calculation, but this time press Ctrl+C (or Delete); this should raise a `KeyboardInterrupt`
- The PGM calculation should abort quickly (potentially not immediately, but definitely faster than in the first run)
Originally posted by @mgovers in #1245 (comment)