Reflector#start() does not retry infinitely #13

Open
ljnelson opened this issue Oct 9, 2018 · 3 comments

ljnelson commented Oct 9, 2018

I am not entirely sure, but I believe that the relevant Go code ends up retrying forever:

https://github.com/kubernetes/client-go/blob/dcf16a0f3b52098c3d4c1467b6c80c3e88ff65fb/tools/cache/reflector.go#L128-L137

But Reflector#start() will bomb out if it can't list things:

final KubernetesResourceList<? extends T> list = ((Listable<? extends KubernetesResourceList<? extends T>>)this.operation).list();

That line can throw a KubernetesClientException, and nothing retries after that. I think this whole method should (internally) be trying this list-and-watch loop forever.

For more details, see also the ListAndWatch function:

https://github.com/kubernetes/client-go/blob/dcf16a0f3b52098c3d4c1467b6c80c3e88ff65fb/tools/cache/reflector.go#L165-L275

Another way to put this is that currently Reflector#start() really just models ListAndWatch, but it should be modeling Run in reflector.go as well.
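
To make the Run-versus-ListAndWatch split concrete, here is a minimal sketch, not the actual fix. RetryForever, runUntilSuccess and pauseMillis are hypothetical names; the attempt argument stands in for the current body of Reflector#start(). It assumes that one attempt returning normally means the list succeeded and the watch was established, which is where it differs from Run in Go (that one keeps calling ListAndWatch until it is told to stop).

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper for illustration; not part of this project or of
// the fabric8 client.
final class RetryForever {

  private static final Logger LOGGER = Logger.getLogger(RetryForever.class.getName());

  private RetryForever() {
  }

  // Plays the role of Run: keeps invoking a single list-and-watch
  // attempt until one completes normally. Returns true on success and
  // false if the calling thread was interrupted first.
  static boolean runUntilSuccess(final Callable<?> attempt, final long pauseMillis) {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        attempt.call();
        return true;
      } catch (final InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      } catch (final Exception e) {
        // A failed list (e.g. a KubernetesClientException) lands here
        // and is retried rather than propagated.
        LOGGER.log(Level.WARNING, "list-and-watch attempt failed; retrying", e);
      }
      try {
        Thread.sleep(pauseMillis);
      } catch (final InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }
}
```

A fixed pause keeps the sketch short; some form of backoff would probably be kinder to the API server.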

Complicating matters is the fact that the fabric8 client will automatically try to reconnect watches if they fail.

ljnelson self-assigned this Oct 9, 2018
ljnelson added the bug label Oct 9, 2018

ljnelson commented

I believe this code:

https://github.com/kubernetes/client-go/blob/dcf16a0f3b52098c3d4c1467b6c80c3e88ff65fb/tools/cache/reflector.go#L226-L274

…is actually implemented by the fabric8 client, more or less:

https://github.com/fabric8io/kubernetes-client/blob/cee8550053e671bb0b3489e37196924f4c949a3f/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/WatchConnectionManager.java#L293-L333

…so the problem may simply reduce to the body of the start() method needing to run infinitely, i.e. to ensure that the listing behavior is also retried ad nauseam.

ljnelson commented

Ah, wait; this is already handled:

@Override
public final void onClose(final KubernetesClientException exception) {
  final String cn = this.getClass().getName();
  final String mn = "onClose";
  if (logger.isLoggable(Level.FINER)) {
    logger.entering(cn, mn, exception);
  }
  synchronized (Reflector.this) {
    // No need to close it; we're being called because it's
    // closing.
    Reflector.this.watch = null;
  }
  if (exception != null) {
    if (logger.isLoggable(Level.WARNING)) {
      logger.logp(Level.WARNING, cn, mn, exception.getMessage(), exception);
    }
    // See
    // https://github.com/kubernetes/client-go/blob/5f85fe426e7aa3c1df401a7ae6c1ba837bd76be9/tools/cache/reflector.go#L204.
    try {
      Reflector.this.start();
    } catch (final IOException ioException) {
      if (logger.isLoggable(Level.SEVERE)) {
        logger.logp(Level.SEVERE, cn, mn, ioException.getMessage(), ioException);
      }
    }

There could be better logging here. I'll add logging and then close this issue.
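
As a rough illustration of what that extra logging might look like, here is a sketch of the restart-on-abnormal-close pattern from the excerpt above. RestartingCloseHandler is a hypothetical name, and the restart argument stands in for Reflector.this.start(); none of this is the project's actual code. It also catches unchecked exceptions from the restart, which is an assumption on my part about what would be wanted.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical class for illustration only.
final class RestartingCloseHandler {

  private static final Logger LOGGER = Logger.getLogger(RestartingCloseHandler.class.getName());

  // Stand-in for Reflector.this.start().
  private final Runnable restart;

  RestartingCloseHandler(final Runnable restart) {
    this.restart = restart;
  }

  // Returns true only if the close was abnormal and the restart
  // completed without throwing.
  boolean onClose(final Exception cause) {
    if (cause == null) {
      // Deliberate close; nothing to restart.
      LOGGER.log(Level.FINE, "watch closed normally; not restarting");
      return false;
    }
    LOGGER.log(Level.WARNING, "watch closed abnormally; restarting", cause);
    try {
      this.restart.run();
      LOGGER.log(Level.INFO, "restart after abnormal close succeeded");
      return true;
    } catch (final RuntimeException e) {
      // Logged rather than propagated, so the failure is at least visible.
      LOGGER.log(Level.SEVERE, "restart after abnormal close failed", e);
      return false;
    }
  }
}
```

Note that a failed restart here is only logged, not retried again.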

ljnelson commented

Need to also review the Go code to see whether the "full replace" operation occurs when a failure is followed by a retry.
