charm went offline and has network connection errors #341

Open
orfeas-k opened this issue Nov 30, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@orfeas-k

On 21 November I deployed mysql-k8s from the 8.0/edge channel to EKS as part of the Charmed Kubeflow bundle. The charm went into Maintenance status with the message "offline" and stayed there for a while. It eventually returned to Active on its own, but when I looked at the logs I saw many "log-sender" manifold worker returned unexpected error errors.

Before this happened, I had scaled the cluster down overnight and then scaled it back up.

Steps to reproduce

Unfortunately, I haven't found a way to reproduce this.

Expected behavior

The charm stays active and is able to respond to requests.

Actual behavior

I think that, as a result of the above, one of our charms fails to connect to mysql-k8s in that cluster with the following error:

Ping to Katib db failed: dial tcp 10.100.51.25:3306: connect: connection refused

Versions

Operating system: Ubuntu 22.04

Juju CLI: 3.1/stable

Juju agent: unknown

Charm revision: deployed 8.0/edge on 21st of November

EKS: 1.25

Log output

The logs below were collected after I scaled the cluster back up.
juju debug-log.txt
k8s logs.txt

Additional context

A CKF user reported similar logs with revision 99 on Juju 2.9 and MicroK8s 1.24. They had also disabled and re-enabled MicroK8s, and I think these logs were captured after re-enabling it.
db-logs.txt

P.S. Feel free to rename this issue; I was not sure what the title should be.

@orfeas-k orfeas-k added the bug Something isn't working label Nov 30, 2023
@paulomach
Contributor

Hi @orfeas-k, were you scaling back up from 0 units?

@orfeas-k
Author

orfeas-k commented Dec 1, 2023

Yes @paulomach, AFAICT. I scaled the EKS cluster down to 0 nodes and then back up to 2.

@paulomach
Contributor

paulomach commented Dec 1, 2023

@orfeas-k, that's probably it. We have known issues when scaling up from zero nodes; a solution is under discussion but unfortunately will not arrive quickly.
