-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
R runtime shutdown timeouts during smoke tests #5334
Comments
This is a **speculative** change that attempts to address an issue seen sporadically in CI. Specifically, one or both of these: #5285 #5334 I have not been able to reproduce either locally. However, checking the Playwright traces, it looks as though some parts of the system know that the kernel has shut down, while others are still waiting. For example, you can see here that the console knows that the shutdown is finished, but another part of the system thinks it's still shutting down. <img width="464" alt="image" src="https://github.com/user-attachments/assets/be74a7f0-31d6-4dc3-94a2-336877b73366"> This is probably related to some state management warts around runtime exits, which have two different components: an `Exited` status event that merely marks the runtime as _Exited_, and an `onDidEndSession` event that carries additional metadata around the session exit, such as its exit code, any errors or messages associated with the exit, etc. These events are currently not guaranteed to arrive in any particular order, and some parts of the system only pay attention to one of them depending on whether they need the additional metadata. My theory for these specific bugs is that we are getting into a race condition w/r/t the ordering of these events when waiting for shutdown to finish; the speculative fix is to ensure that -- for the purposes of waiting for a session shutdown to complete -- a `onDidEndSession` is as good as an `Exited` status. It would be good to refactor the way exits are handled in the system so that it is not necessary to check both events here, but that's a larger change than I want to make speculatively. Here's a smoke test run that shows the tests mentioned in the above two issues are passing with this change: https://github.com/posit-dev/positron/actions/runs/11807687400/job/32896351530 (not a guarantee that the change is correct since the failures are intermittent)
We aren't seeing this in CI test now (after merge) |
We still seeing this issue.. |
Note that this can happen with Python, too: https://github.com/posit-dev/positron/actions/runs/11906025901/job/33177443139 |
@petetronic Yes, I think this issue may be a duplicate of #4043 which I'm hoping to fix as part of the notebook session stability work. |
Verified Fixed
Test scenario(s)Have not seen this in automation in a while Link(s) to TestRail test cases run or created: |
System details:
Positron and OS details:
Main on 11/12/24
Ubuntu
Interpreter details:
R 4.4.0
Describe the issue:
Shortly after notebook creation and entry of first cell contents:
Test run archive:
notebookCreate.test.js.zip
CI job:
https://github.com/posit-dev/positron/actions/runs/11790508005/job/32841062065
Steps to reproduce the issue:
This is currently only reproing in the Notebook Create - R smoke test. It has happened about three times recently:
https://github.com/posit-dev/positron/actions/runs/11696991953/job/32574999492
https://github.com/posit-dev/positron/actions/runs/11716383905/attempts/1
https://github.com/posit-dev/positron/actions/runs/11790508005/job/32841062065
Expected or desired behavior:
Normal notebook operation
Were there any error messages in the UI, Output panel, or Developer Tools console?
ERR NO notebook document for 'untitled:Untitled-1.ipynb?jupyter-notebook': Error: NO notebook document for 'untitled:Untitled-1.ipynb?jupyter-notebook'
at ExtHostNotebookController.getNotebookDocument (/home/runner/work/positron/positron/out/vs/workbench/api/common/extHostNotebook.js:114:23)
at ExtHostNotebookKernels.$cancelCells (/home/runner/work/positron/positron/out/vs/workbench/api/common/extHostNotebookKernels.js:381:52)
at RPCProtocol._doInvokeHandler (/home/runner/work/positron/positron/out/vs/workbench/services/extensions/common/rpcProtocol.js:425:27)
at RPCProtocol._invokeHandler (/home/runner/work/positron/positron/out/vs/workbench/services/extensions/common/rpcProtocol.js:410:45)
at RPCProtocol._receiveRequest (/home/runner/work/positron/positron/out/vs/workbench/services/extensions/common/rpcProtocol.js:351:32)
at RPCProtocol._receiveOneMessage (/home/runner/work/positron/positron/out/vs/workbench/services/extensions/common/rpcProtocol.js:278:26)
at UniqueContainer.value (/home/runner/work/positron/positron/out/vs/workbench/services/extensions/common/rpcProtocol.js:141:67)
at Emitter._deliver (/home/runner/work/positron/positron/out/vs/base/common/event.js:967:26)
at Emitter.fire (/home/runner/work/positron/positron/out/vs/base/common/event.js:996:22)
at BufferedEmitter.fire (/home/runner/work/positron/positron/out/vs/base/parts/ipc/common/ipc.net.js:486:35)
at UniqueContainer.value (/home/runner/work/positron/positron/out/vs/workbench/api/node/extensionHostProcess.js:228:41)
at Emitter._deliver (/home/runner/work/positron/positron/out/vs/base/common/event.js:967:26)
at Emitter.fire (/home/runner/work/positron/positron/out/vs/base/common/event.js:996:22)
at BufferedEmitter.fire (/home/runner/work/positron/positron/out/vs/base/parts/ipc/common/ipc.net.js:486:35)
at PersistentProtocol._receiveMessage (/home/runner/work/positron/positron/out/vs/base/parts/ipc/common/ipc.net.js:757:45)
at UniqueContainer.value (/home/runner/work/positron/positron/out/vs/base/parts/ipc/common/ipc.net.js:632:82)
at Emitter._deliver (/home/runner/work/positron/positron/out/vs/base/common/event.js:967:26)
at Emitter.fire (/home/runner/work/positron/positron/out/vs/base/common/event.js:996:22)
at ProtocolReader.acceptChunk (/home/runner/work/positron/positron/out/vs/base/parts/ipc/common/ipc.net.js:275:37)
at UniqueContainer.value (/home/runner/work/positron/positron/out/vs/base/parts/ipc/common/ipc.net.js:242:61)
at Emitter._deliver (/home/runner/work/positron/positron/out/vs/base/common/event.js:967:26)
at Emitter.fire (/home/runner/work/positron/positron/out/vs/base/common/event.js:996:22)
at WebSocketFlowManager._processReadQueue (/home/runner/work/positron/positron/out/vs/base/parts/ipc/node/ipc.net.js:519:34)
The text was updated successfully, but these errors were encountered: