[Feature]PolarDB-X member reconfiguration support #4

Open
ahjing99 opened this issue Nov 3, 2023 · 2 comments

ahjing99 commented Nov 3, 2023

➜ ~ kbcli version
Kubernetes: v1.27.3-gke.100
KubeBlocks: 0.7.0-beta.18
kbcli: 0.7.0-beta.18

  1. Create PolarDB-X

      `helm repo add kubeblocks-kbcli https://jihulab.com/api/v4/projects/150246/packages/helm/stable`

"kubeblocks-kbcli" already exists with the same configuration, skipping

      `helm repo update kubeblocks-kbcli`

Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "kubeblocks-kbcli" chart repository
Update Complete. ⎈Happy Helming!⎈

      `helm upgrade --install polardbx kubeblocks-kbcli/polardbx --version 0.7.0-beta.18`

Release "polardbx" has been upgraded. Happy Helming!
NAME: polardbx
LAST DEPLOYED: Fri Nov  3 11:57:54 2023
NAMESPACE: default
STATUS: deployed
REVISION: 4
TEST SUITE: None
NOTES:
Thanks for installing PolarDB-X using KubeBlocks!


    `kbcli cluster create polardbx-tjxuol --termination-policy=Halt --monitoring-interval=0 --enable-all-logs=false --cluster-definition=polardbx --cluster-version=polardbx-v1.4.1 --set cpu=500m,memory=1Gi,replicas=3,storage=5Gi --namespace default`

Cluster polardbx-tjxuol created

➜  ~ kbcli cluster describe polardbx-tjxuol
Name: polardbx-tjxuol	 Created Time: Nov 03,2023 11:58 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION           STATUS    TERMINATION-POLICY
default     polardbx             polardbx-v1.4.1   Running   WipeOut

Endpoints:
COMPONENT   MODE        INTERNAL                                             EXTERNAL
gms         ReadWrite   polardbx-tjxuol-gms.default.svc.cluster.local:3306   <none>
                        polardbx-tjxuol-gms.default.svc.cluster.local:9104
dn          ReadWrite   polardbx-tjxuol-dn.default.svc.cluster.local:3306    <none>
cn          ReadWrite   polardbx-tjxuol-cn.default.svc.cluster.local:3306    <none>
                        polardbx-tjxuol-cn.default.svc.cluster.local:9104
cdc         ReadWrite   polardbx-tjxuol-cdc.default.svc.cluster.local:3306   <none>
                        polardbx-tjxuol-cdc.default.svc.cluster.local:9104

Topology:
COMPONENT   INSTANCE                ROLE       STATUS    AZ              NODE                                                CREATED-TIME
cdc         polardbx-tjxuol-cdc-0   <none>     Running   us-central1-c   gke-yijing-default-pool-3e14ea35-klwc/10.128.0.26   Nov 03,2023 11:58 UTC+0800
cn          polardbx-tjxuol-cn-0    <none>     Running   us-central1-c   gke-yijing-default-pool-3e14ea35-klwc/10.128.0.26   Nov 03,2023 11:58 UTC+0800
dn          polardbx-tjxuol-dn-0    follower   Running   us-central1-c   gke-yijing-default-pool-3e14ea35-hqtr/10.128.0.30   Nov 03,2023 11:58 UTC+0800
dn          polardbx-tjxuol-dn-1    leader     Running   us-central1-c   gke-yijing-default-pool-3e14ea35-hxpl/10.128.0.28   Nov 03,2023 11:58 UTC+0800
dn          polardbx-tjxuol-dn-2    follower   Running   us-central1-c   gke-yijing-default-pool-3e14ea35-klwc/10.128.0.26   Nov 03,2023 11:58 UTC+0800
gms         polardbx-tjxuol-gms-0   leader     Running   us-central1-c   gke-yijing-default-pool-3e14ea35-wg54/10.128.0.35   Nov 03,2023 11:58 UTC+0800
gms         polardbx-tjxuol-gms-1   follower   Running   us-central1-c   gke-yijing-default-pool-3e14ea35-wg54/10.128.0.35   Nov 03,2023 11:58 UTC+0800
gms         polardbx-tjxuol-gms-2   follower   Running   us-central1-c   gke-yijing-default-pool-3e14ea35-klwc/10.128.0.26   Nov 03,2023 11:58 UTC+0800

Resources Allocation:
COMPONENT   DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
gms         false       500m / 500m          1Gi / 1Gi               data:5Gi       kb-default-sc
dn          false       1 / 1                1Gi / 1Gi               data:20Gi      kb-default-sc
cn          false       1 / 1                1Gi / 1Gi               data:20Gi      kb-default-sc
cdc         false       1 / 1                1Gi / 1Gi               data:20Gi      kb-default-sc

Images:
COMPONENT   TYPE   IMAGE
gms         gms    polardbx/polardbx-engine-2.0:latest
dn          dn     polardbx/polardbx-engine-2.0:latest
cn          cn     polardbx/polardbx-sql:latest
cdc         cdc    polardbx/polardbx-cdc:latest

Show cluster events: kbcli cluster list-events -n default polardbx-tjxuol
  2. Restart
➜  ~ kbcli cluster restart polardbx-tjxuol
Please type the name again(separate with white space when more than one): polardbx-tjxuol
OpsRequest polardbx-tjxuol-restart-tqb2c created successfully, you can view the progress:
	kbcli cluster describe-ops polardbx-tjxuol-restart-tqb2c -n default

➜  ~ kbcli cluster describe-ops polardbx-tjxuol-restart-tqb2c -n default
Spec:
  Name: polardbx-tjxuol-restart-tqb2c	NameSpace: default	Cluster: polardbx-tjxuol	Type: Restart

Command:
  kbcli cluster restart polardbx-tjxuol --components=gms,dn,cn,cdc --namespace=default

Status:
  Start Time:         Nov 03,2023 12:10 UTC+0800
  Duration:           28m
  Status:             Running
  Progress:           2/8
                      OBJECT-KEY                  STATUS       DURATION    MESSAGE
                      Pod/polardbx-tjxuol-cdc-0   Succeed      3m21s       Successfully restart: Pod/polardbx-tjxuol-cdc-0 in Component: cdc
                      Pod/polardbx-tjxuol-cn-0    Succeed      3m4s        Successfully restart: Pod/polardbx-tjxuol-cn-0 in Component: cn
                      Pod/polardbx-tjxuol-dn-1    Pending      <Unknown>
                      Pod/polardbx-tjxuol-dn-2    Pending      <Unknown>
                      Pod/polardbx-tjxuol-dn-0    Processing   28m         Start to restart: Pod/polardbx-tjxuol-dn-0 in Component: dn
                      Pod/polardbx-tjxuol-gms-0   Pending      <Unknown>
                      Pod/polardbx-tjxuol-gms-2   Pending      <Unknown>
                      Pod/polardbx-tjxuol-gms-1   Processing   28m         Start to restart: Pod/polardbx-tjxuol-gms-1 in Component: gms

Conditions:
LAST-TRANSITION-TIME         TYPE          REASON                         STATUS   MESSAGE
Nov 03,2023 12:10 UTC+0800   Progressing   OpsRequestProgressingStarted   True     Start to process the OpsRequest: polardbx-tjxuol-restart-tqb2c in Cluster: polardbx-tjxuol
Nov 03,2023 12:10 UTC+0800   Validated     ValidateOpsRequestPassed       True     OpsRequest: polardbx-tjxuol-restart-tqb2c is validated
Nov 03,2023 12:10 UTC+0800   Restarting    RestartStarted                 True     Start to restart database in Cluster: polardbx-tjxuol

Warning Events: <none>
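
The progress table above can be condensed into a per-status count, which makes a stalled restart easier to spot when the OpsRequest is polled repeatedly. This is a minimal sketch, not part of kbcli: `summarize_progress` is a hypothetical helper name, and it assumes every progress row starts with `Pod/` and carries the status in the second column, as in the output above.

```shell
# Count rows per STATUS in the progress table printed by
# `kbcli cluster describe-ops`. Reads the table text on stdin.
# Assumption: every progress row starts with "Pod/" and the status is
# the second whitespace-separated field, as in the output above.
summarize_progress() {
  awk '$1 ~ /^Pod\// { count[$2]++ }
       END { for (s in count) print s, count[s] }' | sort
}
```

It could be used as `kbcli cluster describe-ops polardbx-tjxuol-restart-tqb2c -n default | summarize_progress`. A `Processing` count that stays unchanged across runs while `Pending` never drops suggests the rolling restart is blocked on the pods currently being processed.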

➜  ~ k describe sts polardbx-tjxuol-dn
Name:               polardbx-tjxuol-dn
Namespace:          default
CreationTimestamp:  Fri, 03 Nov 2023 11:58:23 +0800
Selector:           app.kubernetes.io/instance=polardbx-tjxuol,app.kubernetes.io/managed-by=kubeblocks,app.kubernetes.io/name=polardbx,apps.kubeblocks.io/component-name=dn
Labels:             app.kubernetes.io/component=dn
                    app.kubernetes.io/instance=polardbx-tjxuol
                    app.kubernetes.io/managed-by=kubeblocks
                    app.kubernetes.io/name=polardbx
                    apps.kubeblocks.io/component-name=dn
                    rsm.workloads.kubeblocks.io/controller-generation=2
Annotations:        config.kubeblocks.io/tpl-polardbx-scripts: polardbx-tjxuol-dn-polardbx-scripts
                    kubeblocks.io/generation: 1
Replicas:           3 desired | 3 total
Update Strategy:    OnDelete
Pods Status:        3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/component=dn
                    app.kubernetes.io/instance=polardbx-tjxuol
                    app.kubernetes.io/managed-by=kubeblocks
                    app.kubernetes.io/name=polardbx
                    app.kubernetes.io/version=polardbx-v1.4.1
                    apps.kubeblocks.io/component-name=dn
                    apps.kubeblocks.io/workload-type=Consensus
  Annotations:      kubeblocks.io/restart: 2023-11-03T04:10:57Z
  Service Account:  kb-polardbx-tjxuol
  Init Containers:
   tools-updater:
    Image:      polardbx/xstore-tools:latest
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/ash
    Args:
      -c
      ./hack/update.sh /target
    Limits:
      cpu:     0
      memory:  0
    Environment Variables from:
      polardbx-tjxuol-dn-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                (v1:metadata.name)
      KB_POD_UID:                 (v1:metadata.uid)
      KB_NAMESPACE:               (v1:metadata.namespace)
      KB_SA_NAME:                 (v1:spec.serviceAccountName)
      KB_NODENAME:                (v1:spec.nodeName)
      KB_HOST_IP:                 (v1:status.hostIP)
      KB_POD_IP:                  (v1:status.podIP)
      KB_POD_IPS:                 (v1:status.podIPs)
      KB_HOSTIP:                  (v1:status.hostIP)
      KB_PODIP:                   (v1:status.podIP)
      KB_PODIPS:                  (v1:status.podIPs)
      KB_CLUSTER_NAME:           polardbx-tjxuol
      KB_COMP_NAME:              dn
      KB_CLUSTER_COMP_NAME:      polardbx-tjxuol-dn
      KB_CLUSTER_UID_POSTFIX_8:  690c6c10
      KB_POD_FQDN:               $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      NODE_NAME:                  (v1:spec.nodeName)
    Mounts:
      /target from xstore-tools (rw)
   role-agent-installer:
    Image:      msoap/shell2http:1.16.0
    Port:       <none>
    Host Port:  <none>
    Command:
      cp
      /app/shell2http
      /role-probe/agent
    Environment:  <none>
    Mounts:
      /role-probe from role-agent (rw)
  Containers:
   engine:
    Image:       polardbx/polardbx-engine-2.0:latest
    Ports:       3306/TCP, 11306/TCP, 31600/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      /scripts/xstore-setup.sh
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Startup:   tcp-socket :mysql delay=20s timeout=30s period=10s #success=1 #failure=60
    Environment Variables from:
      polardbx-tjxuol-dn-env      ConfigMap  Optional: false
      polardbx-tjxuol-dn-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                (v1:metadata.name)
      KB_POD_UID:                 (v1:metadata.uid)
      KB_NAMESPACE:               (v1:metadata.namespace)
      KB_SA_NAME:                 (v1:spec.serviceAccountName)
      KB_NODENAME:                (v1:spec.nodeName)
      KB_HOST_IP:                 (v1:status.hostIP)
      KB_POD_IP:                  (v1:status.podIP)
      KB_POD_IPS:                 (v1:status.podIPs)
      KB_HOSTIP:                  (v1:status.hostIP)
      KB_PODIP:                   (v1:status.podIP)
      KB_PODIPS:                  (v1:status.podIPs)
      KB_CLUSTER_NAME:           polardbx-tjxuol
      KB_COMP_NAME:              dn
      KB_CLUSTER_COMP_NAME:      polardbx-tjxuol-dn
      KB_CLUSTER_UID_POSTFIX_8:  690c6c10
      KB_POD_FQDN:               $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      LANG:                      en_US.utf8
      LC_ALL:                    en_US.utf8
      ENGINE:                    galaxy
      ENGINE_HOME:               /opt/galaxy_engine
      NODE_ROLE:                 candidate
      NODE_IP:                    (v1:status.hostIP)
      NODE_NAME:                  (v1:spec.nodeName)
      POD_IP:                     (v1:status.podIP)
      POD_NAME:                   (v1:metadata.name)
      LIMITS_CPU:                1000 (limits.cpu)
      LIMITS_MEM:                1073741824 (limits.memory)
      PORT_MYSQL:                3306
      PORT_PAXOS:                11306
      PORT_POLARX:               31600
      KB_SERVICE_USER:           polardbx_root
      KB_SERVICE_PASSWORD:       <set to the key 'password' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      RSM_COMPATIBILITY_MODE:    true
    Mounts:
      /data-log/mysql from data-log (rw)
      /data/mysql from data (rw)
      /etc/podinfo from podinfo (rw)
      /scripts/xstore-post-start.sh from scripts (rw,path="xstore-post-start.sh")
      /scripts/xstore-setup.sh from scripts (rw,path="xstore-setup.sh")
      /tools/xstore from xstore-tools (rw)
   exporter:
    Image:      prom/mysqld-exporter:v0.14.0
    Port:       9104/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     0
      memory:  0
    Environment Variables from:
      polardbx-tjxuol-dn-env      ConfigMap  Optional: false
      polardbx-tjxuol-dn-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                (v1:metadata.name)
      KB_POD_UID:                 (v1:metadata.uid)
      KB_NAMESPACE:               (v1:metadata.namespace)
      KB_SA_NAME:                 (v1:spec.serviceAccountName)
      KB_NODENAME:                (v1:spec.nodeName)
      KB_HOST_IP:                 (v1:status.hostIP)
      KB_POD_IP:                  (v1:status.podIP)
      KB_POD_IPS:                 (v1:status.podIPs)
      KB_HOSTIP:                  (v1:status.hostIP)
      KB_PODIP:                   (v1:status.podIP)
      KB_PODIPS:                  (v1:status.podIPs)
      KB_CLUSTER_NAME:           polardbx-tjxuol
      KB_COMP_NAME:              dn
      KB_CLUSTER_COMP_NAME:      polardbx-tjxuol-dn
      KB_CLUSTER_UID_POSTFIX_8:  690c6c10
      KB_POD_FQDN:               $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      MYSQL_MONITOR_USER:        <set to the key 'username' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      MYSQL_MONITOR_PASSWORD:    <set to the key 'password' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      DATA_SOURCE_NAME:          $(MYSQL_MONITOR_USER):$(MYSQL_MONITOR_PASSWORD)@(localhost:3306)/
    Mounts:                      <none>
   kb-role-probe:
    Image:       registry.cn-hangzhou.aliyuncs.com/apecloud/kubeblocks-tools:0.7.0-beta.18
    Ports:       7373/TCP, 50101/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      lorry
      --port
      7373
      --grpcport
      50101
    Readiness:  exec [/bin/grpc_health_probe -addr=:50101] delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:
      KB_RSM_USERNAME:               <set to the key 'username' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      KB_RSM_PASSWORD:               <set to the key 'password' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      KB_RSM_ACTION_SVC_LIST:        [36501]
      KB_SERVICE_USER:               <set to the key 'username' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      KB_SERVICE_PASSWORD:           <set to the key 'password' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      KB_RSM_SERVICE_PORT:           3306
      KB_SERVICE_PORT:               3306
      KB_RSM_ROLE_UPDATE_MECHANISM:  DirectAPIServerEventUpdate
      KB_RSM_ROLE_PROBE_TIMEOUT:     1
      KB_POD_NAME:                    (v1:metadata.name)
      KB_NAMESPACE:                   (v1:metadata.namespace)
      KB_POD_UID:                     (v1:metadata.uid)
      KB_NODENAME:                    (v1:spec.nodeName)
      KB_SERVICE_CHARACTER_TYPE:     custom
    Mounts:                          <none>
   action-0:
    Image:      arey/mysql-client:latest
    Port:       <none>
    Host Port:  <none>
    Command:
      /role-probe/agent
      -port
      36501
      -export-all-vars
      -form
      /role
      mysql -h127.0.0.1 -P3306 -uroot -N -B -e "select role from information_schema.alisql_cluster_local" | xargs echo -n
    Environment:
      KB_RSM_USERNAME:  <set to the key 'username' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      KB_RSM_PASSWORD:  <set to the key 'password' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
    Mounts:
      /role-probe from role-agent (rw)
  Volumes:
   xstore-tools:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
      metadata.annotations['runmode'] -> runmode
      metadata.name -> name
      metadata.namespace -> namespace
   scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      polardbx-tjxuol-dn-polardbx-scripts
    Optional:  false
   data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   data-log:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   role-agent:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Volume Claims:
  Name:          data
  StorageClass:  kb-default-sc
  Labels:        apps.kubeblocks.io/vct-name=data
  Annotations:   <none>
  Capacity:      20Gi
  Access Modes:  [ReadWriteOnce]
Events:
  Type     Reason               Age                From                    Message
  ----     ------               ----               ----                    -------
  Normal   SuccessfulCreate     41m                statefulset-controller  create Claim data-polardbx-tjxuol-dn-0 Pod polardbx-tjxuol-dn-0 in StatefulSet polardbx-tjxuol-dn success
  Normal   SuccessfulCreate     41m                statefulset-controller  create Claim data-polardbx-tjxuol-dn-1 Pod polardbx-tjxuol-dn-1 in StatefulSet polardbx-tjxuol-dn success
  Normal   SuccessfulCreate     41m                statefulset-controller  create Pod polardbx-tjxuol-dn-1 in StatefulSet polardbx-tjxuol-dn successful
  Normal   SuccessfulCreate     41m                statefulset-controller  create Claim data-polardbx-tjxuol-dn-2 Pod polardbx-tjxuol-dn-2 in StatefulSet polardbx-tjxuol-dn success
  Normal   SuccessfulCreate     41m                statefulset-controller  create Pod polardbx-tjxuol-dn-2 in StatefulSet polardbx-tjxuol-dn successful
  Normal   SuccessfulCreate     28m (x2 over 41m)  statefulset-controller  create Pod polardbx-tjxuol-dn-0 in StatefulSet polardbx-tjxuol-dn successful
  Warning  RecreatingFailedPod  28m (x8 over 28m)  statefulset-controller  StatefulSet default/polardbx-tjxuol-dn is recreating failed Pod polardbx-tjxuol-dn-0
  Normal   SuccessfulDelete     28m (x8 over 28m)  statefulset-controller  delete Pod polardbx-tjxuol-dn-0 in StatefulSet polardbx-tjxuol-dn successful
➜  ~
apecloud-bot added the "bug" (Something isn't working) label Nov 3, 2023

free6om commented Nov 3, 2023

It seems something went wrong in the DB container:

2023-11-03 04:12:17,738 - GalaxyEngine - INFO - () start command: /opt/galaxy_engine/bin/mysqld_safe --defaults-file=/data/mysql/conf/my.cnf --loose-pod-name=polardbx-tjxuol-gms-1
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
2023-11-03T04:12:23.302469Z mysqld_safe Logging to '/data/mysql/log/alert.log'.
2023-11-03T04:12:23.414770Z mysqld_safe Starting mysqld daemon with databases from /data/mysql/data
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
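
The log shows the startup script polling until mysqld accepts connections, with no visible upper bound. The actual `xstore-setup.sh` is not shown in this thread, so the following is only a sketch of how such a poll could be bounded so that the failure surfaces as a non-zero exit instead of an endless loop. `wait_until` and its parameters are hypothetical names.

```shell
# Run a check command until it succeeds or the attempt budget is used up.
# Usage: wait_until MAX_ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt failed.
# Note: this is an illustrative helper, not the script used by the image.
wait_until() {
  max_attempts=$1
  sleep_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "wait mysql ready (attempt $attempt/$max_attempts)" >&2
    sleep "$sleep_seconds"
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, `wait_until 60 1 mysql -h127.1 -P3306 -uroot -e 'select 1'` would give up after about a minute. The check command here is an assumption modeled on the address in the error messages above.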


free6om commented Nov 3, 2023

The PolarDB-X gms and dn components currently depend on immutable IP addresses, which means their pods can't be rescheduled yet.
Supporting restart requires member reconfiguration; we will add it in 0.8 or later.
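
To illustrate the difference between a pod IP and an address that survives rescheduling, here is a small sketch that builds a member list from the headless-service names, following the `KB_POD_FQDN` pattern and the `PORT_PAXOS` value (11306) shown in the StatefulSet description above. Whether the PolarDB-X engine accepts hostnames in its member configuration is not confirmed in this thread, and this is not the planned member reconfiguration implementation. `pod_fqdn` and `member_list` are hypothetical helper names.

```shell
# Build the stable headless-service FQDN for a pod, following the
# KB_POD_FQDN pattern: <pod>.<cluster-comp>-headless.<namespace>.svc
pod_fqdn() {
  # $1 = pod name, $2 = cluster-component name, $3 = namespace
  printf '%s.%s-headless.%s.svc' "$1" "$2" "$3"
}

# Print a comma-separated member list for replicas 0..N-1.
# Assumption: pods are named <cluster-comp>-<ordinal>, as in the
# Topology table, and all members listen on the same port.
member_list() {
  # $1 = cluster-component name, $2 = namespace, $3 = replicas, $4 = port
  comp=$1
  ns=$2
  replicas=$3
  port=$4
  i=0
  out=""
  while [ "$i" -lt "$replicas" ]; do
    addr="$(pod_fqdn "$comp-$i" "$comp" "$ns"):$port"
    if [ -z "$out" ]; then
      out=$addr
    else
      out="$out,$addr"
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}
```

Calling `member_list polardbx-tjxuol-dn default 3 11306` prints three `host:port` entries whose names stay the same when a pod is recreated on another node, whereas the pod IPs listed by `kubectl get pods -o wide` can change.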

free6om changed the title from "[BUG]PolarDB-X restart dn/gms always processing" to "[Feature]PolarDB-X member reconfiguration support" Nov 3, 2023
free6om transferred this issue from apecloud/kubeblocks Nov 8, 2023
free6om removed the "bug" (Something isn't working) label Nov 8, 2023
free6om added the "good first issue" (Good for newcomers) label Dec 28, 2023