This repository has been archived by the owner on Feb 1, 2021. It is now read-only.
I have run into this a few times:
$ dask create foo cluster.yml
....
replicationcontroller "jupyter-notebook" created
replicationcontroller "dask-scheduler" created
replicationcontroller "dask-worker" created
INFO: Waiting for kubernetes... (^C to stop)
INFO: Services are up
INFO: Services are up
The connection to the server x.x.x.x was refused - did you specify the right host or port?
CRITICAL: Traceback (most recent call last):
File "/Users/bmabey/anaconda/envs/drugdiscovery/lib/python3.6/site-packages/dask_kubernetes-0.0.1-py3.6.egg/dask_kubernetes/cli/main.py", line 26, in start
Is there anything I can do to have the cluster setup continue after this? Eventually the dask info foo command returned information, but I was unable to connect to any of the services.
Do you have any idea what is actually going on?
We can certainly put more try/excepts around trying to connect to the cluster, but I'm not sure how that will help when we don't understand the cause.
Could you possibly debug where in the code the exception is happening?
Any idea how the message "Services are up" can have appeared twice?
The double INFO: Services are up was a copy/paste error.
In general, when this happens, I am able to connect once the info command comes back. The one time that I couldn't connect, I had tried messing with the pods, so that is probably what broke it.
So I think what needs to happen is for this subprocess call to be retried with some back-off and an eventual timeout:
subprocess.CalledProcessError: Command 'kubectl --output=json --context gke_foo_us-east1-b_cluster get services' returned non-zero exit status 1.
The retry should go around the calls to get_pods and services_in_context within wait_until_ready (but not in the functions themselves; if they are called directly, they should raise, I think).
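For anyone picking this up, here is a rough sketch of what such a retry could look like. It is not the dask-kubernetes code: retry_with_backoff and list_services are hypothetical names made up for illustration, and the timeout and delay defaults are arbitrary assumptions. The kubectl arguments are the ones from the CalledProcessError above.

```python
import subprocess
import time


def retry_with_backoff(func, *, timeout=60.0, initial_delay=1.0,
                       max_delay=10.0, sleep=time.sleep, clock=time.monotonic):
    """Call func() until it succeeds or about `timeout` seconds have passed.

    Hypothetical helper. Only subprocess.CalledProcessError is retried;
    any other exception propagates immediately.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        try:
            return func()
        except subprocess.CalledProcessError:
            # Give up (re-raise) if the next sleep would pass the deadline.
            if clock() + delay > deadline:
                raise
            sleep(delay)
            # Exponential back-off, capped at max_delay.
            delay = min(delay * 2, max_delay)


def list_services(context):
    """Hypothetical stand-in for the kubectl call seen in the traceback.

    Raises subprocess.CalledProcessError on a non-zero exit status, so a
    direct caller still sees the failure.
    """
    cmd = ["kubectl", "--output=json", "--context", context, "get", "services"]
    return subprocess.check_output(cmd)
```

Inside wait_until_ready the call sites would then be wrapped, for example retry_with_backoff(lambda: list_services(context)), while get_pods and services_in_context themselves keep raising when called directly. How those two functions are actually invoked is an assumption here, since their signatures do not appear in this thread.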
Would you like to contribute this?