Properly time mephisto-task out on "Initializing...." #829

Open
JackUrb opened this issue Jul 15, 2022 · 1 comment
Labels
bug Something isn't working proposal

Comments

@JackUrb
Contributor

JackUrb commented Jul 15, 2022

There are many reports of workers connecting to a task only to be stuck on "Initializing" forever. This occurs when the Mephisto routing server is set up, but the backend is not connected.

We should provide better error messaging to workers when this occurs, and ideally get a message to the task owner, since this state indicates an improper shutdown or a connectivity issue.

@JackUrb JackUrb added the enhancement New feature or request label Jul 15, 2022
@pringshia
Contributor

pringshia commented Aug 1, 2022

Some more context:

On the initial handshake, when the frontend requests its agentId, the request reaches the router but never reaches the Mephisto backend, so the frontend hangs forever waiting for a response.
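One way to stop the indefinite hang is to put a client-side deadline on the handshake, so the worker sees an actionable message instead of "Initializing..." forever. The sketch below is a minimal TypeScript illustration under stated assumptions, not code from `mephisto-task`: `withTimeout`, `HandshakeTimeoutError`, `initFailureMessage`, the 100 ms demo timeout, and the message text are all hypothetical, and a real deadline would likely be several seconds.

```typescript
// Hypothetical error type for a handshake that never gets a reply.
class HandshakeTimeoutError extends Error {
  constructor(label: string, ms: number) {
    super(`${label}: no response after ${ms} ms`);
    this.name = "HandshakeTimeoutError";
    // Keep `instanceof` working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, HandshakeTimeoutError.prototype);
  }
}

// Settle with `promise` if it settles within `ms`; otherwise reject with
// HandshakeTimeoutError. The timer is cleared as soon as `promise` settles.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // If this fires first, the request is treated as lost.
    const timer = setTimeout(() => reject(new HandshakeTimeoutError(label, ms)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value); // no-op if the deadline already rejected
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Turn a failed initialization into text the worker can act on.
function initFailureMessage(err: unknown): string {
  if (err instanceof HandshakeTimeoutError) {
    return "We couldn't reach the task server. Please refresh the page, or return the task if this keeps happening.";
  }
  return "Something went wrong while initializing this task.";
}

// Demo: a handshake the router accepted but the backend never answered.
const neverAnswered = new Promise<string>(() => {});
withTimeout(neverAnswered, 100, "agentId handshake")
  .then((agentId) => console.log("connected as", agentId))
  .catch((err) => console.log(initFailureMessage(err)));
```

Because the timer is cleared once the handshake settles, a successful reply leaves nothing pending; a reply that arrives after the deadline is simply ignored, since the wrapper's promise has already rejected.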

Two error cases:

  • Router is aware that the server is down (no heartbeat after X mins): all subsequent messages result in a 5xx-style error
  • There was a genuine timeout, but the server and router are both still up: either the frontend sends another request or the user refreshes
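The two cases call for different handling: in the first, retrying is pointless and the worker should be told the task is unavailable; in the second, a retry (or refresh) may succeed. A minimal TypeScript sketch, assuming the router records when it last heard a heartbeat from the backend and rejects handshakes with HTTP 503 once that is older than a configurable threshold (the "X mins" above is left as a parameter), and that the frontend retries a timed-out handshake a few times before asking the worker to refresh. `routeHandshake`, `nextAction`, the 503 status, and the retry policy are all hypothetical, not Mephisto's actual router or `mephisto-task` API.

```typescript
// ---- Case 1: router side (hypothetical) ----

// What the router should do with an incoming handshake.
type RouterDecision =
  | { kind: "forward" }
  | { kind: "reject"; status: 503; reason: string };

// `lastHeartbeatMs` is when the backend last checked in (null if never).
// Once it is older than `staleAfterMs`, fail fast instead of forwarding
// into a backend that will never answer.
function routeHandshake(
  lastHeartbeatMs: number | null,
  nowMs: number,
  staleAfterMs: number,
): RouterDecision {
  if (lastHeartbeatMs === null || nowMs - lastHeartbeatMs > staleAfterMs) {
    return { kind: "reject", status: 503, reason: "Mephisto backend is not connected" };
  }
  return { kind: "forward" };
}

// ---- Case 2: frontend side (hypothetical) ----

type HandshakeOutcome =
  | { kind: "ok"; agentId: string }
  | { kind: "http_error"; status: number }
  | { kind: "timeout" };

type ClientAction =
  | { kind: "proceed"; agentId: string }
  | { kind: "retry"; attempt: number }
  | { kind: "show_error"; message: string };

// Decide what the frontend does after handshake attempt number `attempt`.
function nextAction(outcome: HandshakeOutcome, attempt: number, maxAttempts: number): ClientAction {
  if (outcome.kind === "ok") {
    return { kind: "proceed", agentId: outcome.agentId };
  }
  if (outcome.kind === "http_error") {
    // Case 1: the router already knows the backend is gone; retrying won't help.
    if (outcome.status >= 500) {
      return {
        kind: "show_error",
        message: "This task is currently unavailable. Please return it and try again later.",
      };
    }
    return { kind: "show_error", message: `Unexpected response (${outcome.status}) while initializing.` };
  }
  // Case 2: a genuine timeout while everything is up; try again a few times.
  if (attempt < maxAttempts) {
    return { kind: "retry", attempt: attempt + 1 };
  }
  return { kind: "show_error", message: "Could not reach the task server. Please refresh the page." };
}
```

Keeping both decisions as pure functions makes them easy to unit test; a real router and frontend would call them from their request and response handlers.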

To repro:

  • Start up a server, then kill Mephisto with a pipe (Ctrl+) rather than with Ctrl+C. This skips all of the shutdown processes, but the router keeps running.

@meta-paul meta-paul added bug Something isn't working proposal and removed enhancement New feature or request labels Apr 11, 2024
Projects
None yet
Development

No branches or pull requests

3 participants