charm went `offline` and has network connection errors #341

orfeas-k · 2023-11-30T10:59:22Z

I deployed mysql-k8s to EKS from 8.0/edge on 21 November as part of the Charmed Kubeflow bundle. The charm went into Maintenance status with the message `offline` and stayed there for a while. Eventually it returned to active on its own, but I took a look at the logs and saw a number of `"log-sender" manifold worker returned unexpected error` errors there.

Before that happened, I had scaled down the cluster (during the night) and scaled it up again.

Steps to reproduce

Unfortunately, I haven't found a way to reproduce this.

Expected behavior

The charm stays active and is able to respond to requests.

Actual behavior

I think that, as a result of the above, one of our charms fails to contact mysql-k8s in that cluster with the following error:

Ping to Katib db failed: dial tcp 10.100.51.25:3306: connect: connection refused
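
For anyone triaging a similar report: the message above is a plain TCP connection refusal on port 3306, so it can be checked independently of the client charm. Below is a minimal sketch of such a probe, assuming it is run from somewhere with network access to the service address (for example a pod in the same cluster). The function names `port_is_open` and `wait_for_port` are hypothetical helpers written for this illustration; they are not part of Juju, the charm, or Katib.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def wait_for_port(host: str, port: int, attempts: int = 5, delay: float = 2.0) -> bool:
    """Retry the probe a few times; return True as soon as one attempt succeeds."""
    for attempt in range(attempts):
        if port_is_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # Pause before the next attempt.
    return False
```

Calling `wait_for_port("10.100.51.25", 3306)` with the address from the error message returns `False` while the connection is still being refused. This only confirms whether something is listening on the port; it does not say why the database is down.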

Versions

Operating system: Ubuntu 22.04

Juju CLI: 3.1/stable

Juju agent: unknown

Charm revision: deployed from 8.0/edge on 21 November

EKS: 1.25

Log output

Logs are from after I scaled up the cluster.
juju debug-log.txt
k8s logs.txt

Additional context

A user of CKF reported similar logs with revision 99 on Juju 2.9 and MicroK8s 1.24. They had also disabled and re-enabled MicroK8s, and I think these logs are from after it was re-enabled.
db-logs.txt

P.S. Feel free to rename this issue; I was not sure what the title should be.

github-actions · 2023-11-30T10:59:39Z

https://warthogs.atlassian.net/browse/DPE-3087

paulomach · 2023-11-30T19:06:00Z

Hi @orfeas-k, were you scaling back up from 0 units?

orfeas-k · 2023-12-01T08:34:22Z

Yes @paulomach, AFAICT. I scaled the EKS cluster down to 0 nodes and then back up to two.

paulomach · 2023-12-01T17:39:11Z

@orfeas-k that's probably it. We have known issues when scaling up from zero nodes. A solution is under discussion, but unfortunately it will not come quickly.

orfeas-k added the bug (Something isn't working) label Nov 30, 2023

orfeas-k mentioned this issue Nov 30, 2023

Failed to Report logs error in experiment Pod on latest/edge canonical/katib-operators#108

Closed

paulomach self-assigned this Nov 30, 2023
