Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R runtime shutdown timeouts during smoke tests #5334

Closed
testlabauto opened this issue Nov 12, 2024 · 6 comments
Closed

R runtime shutdown timeouts during smoke tests #5334

testlabauto opened this issue Nov 12, 2024 · 6 comments
Labels
area: notebooks Issues related to Notebooks category. lang: r

Comments

@testlabauto
Copy link
Contributor

System details:

Positron and OS details:

Main on 11/12/24
Ubuntu

Interpreter details:

R 4.4.0

Describe the issue:

Shortly after notebook creation and entry of first cell contents:

Image

Test run archive:
notebookCreate.test.js.zip

CI job:
https://github.com/posit-dev/positron/actions/runs/11790508005/job/32841062065

Steps to reproduce the issue:

This is currently only reproing in the Notebook Create - R smoke test. It has happened about three times recently:

https://github.com/posit-dev/positron/actions/runs/11696991953/job/32574999492
https://github.com/posit-dev/positron/actions/runs/11716383905/attempts/1
https://github.com/posit-dev/positron/actions/runs/11790508005/job/32841062065

Expected or desired behavior:

Normal notebook operation

Were there any error messages in the UI, Output panel, or Developer Tools console?

ERR NO notebook document for 'untitled:Untitled-1.ipynb?jupyter-notebook': Error: NO notebook document for 'untitled:Untitled-1.ipynb?jupyter-notebook'
at ExtHostNotebookController.getNotebookDocument (/home/runner/work/positron/positron/out/vs/workbench/api/common/extHostNotebook.js:114:23)
at ExtHostNotebookKernels.$cancelCells (/home/runner/work/positron/positron/out/vs/workbench/api/common/extHostNotebookKernels.js:381:52)
at RPCProtocol._doInvokeHandler (/home/runner/work/positron/positron/out/vs/workbench/services/extensions/common/rpcProtocol.js:425:27)
at RPCProtocol._invokeHandler (/home/runner/work/positron/positron/out/vs/workbench/services/extensions/common/rpcProtocol.js:410:45)
at RPCProtocol._receiveRequest (/home/runner/work/positron/positron/out/vs/workbench/services/extensions/common/rpcProtocol.js:351:32)
at RPCProtocol._receiveOneMessage (/home/runner/work/positron/positron/out/vs/workbench/services/extensions/common/rpcProtocol.js:278:26)
at UniqueContainer.value (/home/runner/work/positron/positron/out/vs/workbench/services/extensions/common/rpcProtocol.js:141:67)
at Emitter._deliver (/home/runner/work/positron/positron/out/vs/base/common/event.js:967:26)
at Emitter.fire (/home/runner/work/positron/positron/out/vs/base/common/event.js:996:22)
at BufferedEmitter.fire (/home/runner/work/positron/positron/out/vs/base/parts/ipc/common/ipc.net.js:486:35)
at UniqueContainer.value (/home/runner/work/positron/positron/out/vs/workbench/api/node/extensionHostProcess.js:228:41)
at Emitter._deliver (/home/runner/work/positron/positron/out/vs/base/common/event.js:967:26)
at Emitter.fire (/home/runner/work/positron/positron/out/vs/base/common/event.js:996:22)
at BufferedEmitter.fire (/home/runner/work/positron/positron/out/vs/base/parts/ipc/common/ipc.net.js:486:35)
at PersistentProtocol._receiveMessage (/home/runner/work/positron/positron/out/vs/base/parts/ipc/common/ipc.net.js:757:45)
at UniqueContainer.value (/home/runner/work/positron/positron/out/vs/base/parts/ipc/common/ipc.net.js:632:82)
at Emitter._deliver (/home/runner/work/positron/positron/out/vs/base/common/event.js:967:26)
at Emitter.fire (/home/runner/work/positron/positron/out/vs/base/common/event.js:996:22)
at ProtocolReader.acceptChunk (/home/runner/work/positron/positron/out/vs/base/parts/ipc/common/ipc.net.js:275:37)
at UniqueContainer.value (/home/runner/work/positron/positron/out/vs/base/parts/ipc/common/ipc.net.js:242:61)
at Emitter._deliver (/home/runner/work/positron/positron/out/vs/base/common/event.js:967:26)
at Emitter.fire (/home/runner/work/positron/positron/out/vs/base/common/event.js:996:22)
at WebSocketFlowManager._processReadQueue (/home/runner/work/positron/positron/out/vs/base/parts/ipc/node/ipc.net.js:519:34)

@testlabauto testlabauto added area: notebooks Issues related to Notebooks category. lang: r labels Nov 12, 2024
@testlabauto testlabauto changed the title R runtime shutodown timeouts during smoke tests R runtime shutdown timeouts during smoke tests Nov 12, 2024
jmcphers added a commit that referenced this issue Nov 13, 2024
This is a **speculative** change that attempts to address an issue seen
sporadically in CI. Specifically, one or both of these:

#5285
#5334

I have not been able to reproduce either locally. However, checking the
Playwright traces, it looks as though some parts of the system know that
the kernel has shut down, while others are still waiting. For example,
you can see here that the console knows that the shutdown is finished,
but another part of the system thinks it's still shutting down.

<img width="464" alt="image"
src="https://github.com/user-attachments/assets/be74a7f0-31d6-4dc3-94a2-336877b73366">

This is probably related to some state management warts around runtime
exits, which have two different components: an `Exited` status event
that merely marks the runtime as _Exited_, and an `onDidEndSession`
event that carries additional metadata around the session exit, such as
its exit code, any errors or messages associated with the exit, etc.
These events are currently not guaranteed to arrive in any particular
order, and some parts of the system only pay attention to one of them
depending on whether they need the additional metadata.

My theory for these specific bugs is that we are getting into a race
condition w/r/t the ordering of these events when waiting for shutdown
to finish; the speculative fix is to ensure that -- for the purposes of
waiting for a session shutdown to complete -- a `onDidEndSession` is as
good as an `Exited` status.

It would be good to refactor the way exits are handled in the system so
that it is not necessary to check both events here, but that's a larger
change than I want to make speculatively.

Here's a smoke test run that shows the tests mentioned in the above two
issues are passing with this change:
https://github.com/posit-dev/positron/actions/runs/11807687400/job/32896351530
(not a guarantee that the change is correct since the failures are
intermittent)
@jonvanausdeln
Copy link
Contributor

We aren't seeing this in CI test now (after merge)

@jonvanausdeln
Copy link
Contributor

@jonvanausdeln jonvanausdeln reopened this Nov 16, 2024
@petetronic
Copy link
Collaborator

@seeM after we address #2671 we could work with QA to see if this issue subsides in CI?

@petetronic petetronic added this to the 2025.01.0 Pre-Release milestone Nov 18, 2024
@testlabauto
Copy link
Contributor Author

Note that this can happen with Python, too: https://github.com/posit-dev/positron/actions/runs/11906025901/job/33177443139

@seeM
Copy link
Contributor

seeM commented Nov 19, 2024

@petetronic Yes, I think this issue may be a duplicate of #4043 which I'm hoping to fix as part of the notebook session stability work.

@testlabauto
Copy link
Contributor Author

Verified Fixed

Positron Version(s) : multiple, but including 2025.01.0-10
OS Version          : ubuntu

Test scenario(s)

Have not seen this in automation in a while

Link(s) to TestRail test cases run or created:

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area: notebooks Issues related to Notebooks category. lang: r
Projects
None yet
Development

No branches or pull requests

4 participants