Prefect Worker Robustness #20

Open
pgulley opened this issue Jul 12, 2024 · 2 comments

pgulley commented Jul 12, 2024

The Prefect worker on Guerin sometimes crashes for opaque reasons. I'd love to get more insight into why this happens and figure out how to prevent the error, automatically restart the worker, or otherwise troubleshoot it.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/anyio/_core/_sockets.py", line 189, in connect_tcp
    addr_obj = ip_address(remote_host)
  File "/usr/lib/python3.10/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: 'api.prefect.cloud' does not appear to be an IPv4 or IPv6 address

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/anyio.py", line 114, in connect_tcp
    stream: anyio.abc.ByteStream = await anyio.connect_tcp(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_core/_sockets.py", line 192, in connect_tcp
    gai_res = await getaddrinfo(
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

This error occurs silently about once a week, and never during sous-chef execution.
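
To get more insight into when name resolution fails, a small probe could be run alongside the worker. This is only a sketch, not anything Prefect provides: `resolve_with_retry` and its parameters are hypothetical names, and the only real call it relies on is `socket.getaddrinfo` from the standard library, the same call that raises in the traceback above.

```python
import logging
import socket
import time

logger = logging.getLogger("dns_probe")


def resolve_with_retry(host, port=443, attempts=3, delay=1.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host, retrying on socket.gaierror (hypothetical helper).

    Returns the resolver's result, or re-raises the last gaierror once
    all attempts are used up. `resolver` and `sleep` are injectable so
    the retry logic can be exercised without touching the network.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return resolver(host, port)
        except socket.gaierror as exc:
            last_error = exc
            # Log every failure so there is a timestamped record to compare
            # against worker crashes.
            logger.warning("DNS lookup for %s failed (attempt %d/%d): %s",
                           host, attempt, attempts, exc)
            if attempt < attempts:
                sleep(delay * attempt)  # linear backoff between attempts
    raise last_error
```

Calling `resolve_with_retry("api.prefect.cloud")` on a schedule and keeping the log would show whether the failures are brief blips (resolved on the second or third attempt) or longer outages.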

pgulley added the bug (Something isn't working) label Jul 12, 2024
pgulley self-assigned this Jul 12, 2024

pgulley commented Jul 12, 2024

I've jumped into the prefect-community Slack to see if they have any insight.

It's possible that the solution is just to set the worker up as a systemd service with restart enabled, but I'd love to have more insight before simply accepting that.
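
For illustration, the restart-on-failure behavior can be sketched in Python. This is a stand-in for what systemd's `Restart=on-failure` would do, not a replacement for it; `supervise` and its parameters are hypothetical names, and it assumes the worker exits with a non-zero code when it crashes, which has not been confirmed here.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=5, backoff=5.0,
              run=subprocess.call, sleep=time.sleep):
    """Run cmd, restarting it whenever it exits non-zero (hypothetical helper).

    Returns the number of restarts performed. Stops when the command
    exits cleanly (code 0) or when max_restarts has been reached.
    """
    restarts = 0
    while True:
        code = run(cmd)  # blocks until the process exits
        if code == 0 or restarts >= max_restarts:
            return restarts
        print(f"{cmd[0]} exited with code {code}; restarting in {backoff}s",
              file=sys.stderr)
        sleep(backoff)
        restarts += 1
```

A call might look like `supervise(["prefect", "worker", "start", "--pool", "my-pool"])`, where the pool name is a placeholder. Restarting this way only masks the crash, so it would still be worth logging each exit to track how often it happens.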


pgulley commented Jul 12, 2024

The systemd configuration process is documented here: https://discourse.prefect.io/t/how-to-run-a-prefect-2-worker-as-a-systemd-service-on-linux/1450
