Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: language server windows issues #4821

Merged
merged 6 commits into from
Aug 7, 2024

Conversation

mhuisi
Copy link
Contributor

@mhuisi mhuisi commented Jul 24, 2024

This PR resolves two language server bugs that especially affect Windows users:

  1. Editing the header could result in the watchdog not correctly restarting the file worker (server hangs when changing imports #3786, server out of sync with file when editing imports on Windows #3787), which would lead to the file seemingly being processed forever.
  • The cause of this issue was a race condition in the watchdog that was accidentally introduced as far back as refactor: revise server architecture #1884: In specific circumstances, the watchdog will attempt forwarding a message to the file worker after the process has exited due to a changed header, but before the file worker exiting has been noticed by the watchdog (which will then restart the file worker). In this case, the watchdog would mark the file worker as having crashed and not look at its exit code to restart the file worker, but instead treat it like a crashed file worker that will only be restarted when editing the file again. Not inspecting the exit code of the file worker when it crashed from forwarding a message from the file worker is necessary since we do not restart the file worker until another notification from the client arrives, and so we would read the same crash exit code over and over again in the main loop of the watchdog if we did not remove it from our list of file workers that we listen to.
  • This PR resolves this issue by distinguishing between "crashes when forwarding messages to the file worker" and "crashes when forwarding messages from the file worker". In the former case, we still inspect the exit code of the file worker and potentially restart it if the imports changed, whereas in the latter case, we stop inspecting the exit code of the file worker. This is correct because the latter case is exactly the one where we need to stop inspecting the exit code but where a crash cannot occur as a result of a changed header, whereas the former case is exactly the one where we still need to inspect the exit code after a crash to ensure that we restart the file worker in case it exited because the header changed.
  • At some point in the future, it would be nice to revamp the concurrency model of the watchdog entirely now that we have all those fancy concurrency primitives that were not available four years ago when the watchdog was first written.
  1. On an especially slow Windows machine, we found that starting the language server would sometimes not succeed at all because reading from the stdin pipe in the watchdog produced an EINVAL error, which was in turn caused by an NT "pipe empty" error.
  • After lots of debugging, @Kha found that Lake accidentally passes its stdin to Git because it does not explicitly set the stdin field to null when spawning the process.
  • Changing this fixes the issue, which suggests that Git may mutate the pipe we pass to it to be non-blocking, which then causes a "pipe empty" error in the watchdog when we also attempt to read from that same pipe.
  • I'm still very uncertain why we only saw this issue on one particularly slow machine and not across the whole eco system.

This PR also resolves an issue where we would not correctly emit messages that we received while the file worker is being restarted to the corresponding file worker after the restart.

Closes #3786, closes #3787.

@mhuisi mhuisi requested a review from tydeu as a code owner July 24, 2024 14:11
src/Lean/Server/Watchdog.lean Outdated Show resolved Hide resolved
src/Lean/Server/Watchdog.lean Outdated Show resolved Hide resolved
@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Jul 24, 2024
@leanprover-community-mathlib4-bot
Copy link
Collaborator

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase af40e618111581c82fc44de922368a02208b499f --onto 5938dbbd14145c9b78f7645c4777f62a3fc8212c. (2024-07-24 14:27:44)

mhuisi and others added 2 commits July 24, 2024 16:28
Co-authored-by: Sebastian Ullrich <sebasti@nullri.ch>
Co-authored-by: Sebastian Ullrich <sebasti@nullri.ch>
Copy link
Member

@tydeu tydeu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I would not have thought that stdin would matter to Lake's Git processes.

mhuisi added 2 commits July 24, 2024 18:05
In the version before this commit, if the client task signals a crash and then the worker task signals a crash, queued messages between the first crash and the second crash would be lost. Now, we retain them.
@mhuisi
Copy link
Contributor Author

mhuisi commented Jul 25, 2024

(We've decided that this PR will wait until the start of the next release cycle so that we can test it ourselves for a while)

@Kha Kha added this pull request to the merge queue Aug 7, 2024
Merged via the queue into leanprover:master with commit 574066b Aug 7, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN
Projects
None yet
Development

Successfully merging this pull request may close these issues.

server out of sync with file when editing imports on Windows server hangs when changing imports
4 participants