
restarted secondary fails to join the cluster back #415

Open
gboutry opened this issue May 16, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@gboutry

gboutry commented May 16, 2024

Steps to reproduce

  1. Rollout-restart the pods in a 3-unit cluster (not 100% reproducible, but it happens often enough); see the sketch below.
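A minimal sketch of such a restart, assuming the charm pods are managed by a StatefulSet named heat-mysql in the openstack namespace (names inferred from the unit and endpoint names elsewhere in this report):

```
# Roll all pods of the (assumed) heat-mysql StatefulSet in the openstack namespace,
# one pod at a time; this is the operation that intermittently triggers the rejoin failure.
microk8s kubectl -n openstack rollout restart statefulset/heat-mysql
```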

Expected behavior

The secondary should rejoin the cluster.

Actual behavior

The secondary does not rejoin and is considered offline.

Versions

Operating system:

Juju CLI:

Juju agent:

Charm revision: 127

microk8s: MicroK8s v1.28.7 revision 6532

Log output

2024-05-16T13:45:18.111Z [container-agent] 2024-05-16 13:45:18 INFO juju-log Unit workload member-state is offline with member-role unknown
2024-05-16T13:45:21.896Z [container-agent] 2024-05-16 13:45:21 ERROR juju-log Failed to get cluster status for cluster-ab0e762c137dc447d08ce68b19fb20b3
2024-05-16T13:45:21.903Z [container-agent] 2024-05-16 13:45:21 ERROR juju-log Failed to get cluster endpoints
2024-05-16T13:45:21.903Z [container-agent] Traceback (most recent call last):
2024-05-16T13:45:21.903Z [container-agent]   File "/var/lib/juju/agents/unit-heat-mysql-0/charm/src/mysql_k8s_helpers.py", line 836, in update_endpoints
2024-05-16T13:45:21.903Z [container-agent]     rw_endpoints, ro_endpoints, offline = self.get_cluster_endpoints(get_ips=False)
2024-05-16T13:45:21.903Z [container-agent]   File "/var/lib/juju/agents/unit-heat-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1469, in get_cluster_endpoints
2024-05-16T13:45:21.903Z [container-agent]     raise MySQLGetClusterEndpointsError("Failed to get endpoints from cluster status")
2024-05-16T13:45:21.903Z [container-agent] charms.mysql.v0.mysql.MySQLGetClusterEndpointsError: Failed to get endpoints from cluster status
2024-05-16T13:45:22.191Z [container-agent] 2024-05-16 13:45:22 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)
2024-05-16T13:47:53.387910Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 3306'
2024-05-16T13:48:00.275796Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error connecting to all peers. Member join failed. Local port: 3306'
2024-05-16T13:48:00.385285Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 3306'
2024-05-16T13:48:07.654156Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error connecting to all peers. Member join failed. Local port: 3306'
2024-05-16T13:48:07.767533Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 3306'
2024-05-16T13:48:08.469058Z 28247 [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2024-05-16T13:48:08.469343Z 28247 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Additional context

After a debugging session with @paulomach, we got the instance to rejoin the cluster successfully by running: c.rejoin_instance("heat-mysql-0.heat-mysql-endpoints.openstack.svc.cluster.local:3306")

The command was run from the failed unit against the primary unit (ruling out a connectivity issue).
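For reference, a sketch of how that rejoin can be driven from MySQL Shell on the failed unit. This assumes mysqlsh is available on the unit; clusteradmin is a placeholder for a user with cluster admin privileges, and the primary address below (heat-mysql-1) is an example based on the endpoint names in the logs:

```
# Connect from the failed unit to the primary, fetch the InnoDB Cluster via the
# AdminAPI, and rejoin the failed instance (mysqlsh prompts for the password).
mysqlsh clusteradmin@heat-mysql-1.heat-mysql-endpoints.openstack.svc.cluster.local:3306 --py -e \
  'dba.get_cluster().rejoin_instance("heat-mysql-0.heat-mysql-endpoints.openstack.svc.cluster.local:3306")'
```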

@gboutry gboutry added the bug Something isn't working label May 16, 2024