Percona MySQL XtraDB cluster does not start properly, and node restarts do not work

Date: 2017-03-26 09:19:41

Tags: mysql percona galera


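When I bring up a new Percona cluster as 3 Kubernetes pods, seqno in grastate.dat is set to -1 and never changes. When I delete one pod and watch it restart, expecting it to rejoin the cluster, it sets its initial position to 00000000-0000-0000-0000-000000000000:-1 and tries to connect to itself (its previous IP), maybe because it had been the first pod in the cluster? It then times out on that erroneous connection to itself: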

2017-03-26T08:38:05.374058Z 0 [Note] WSREP: (b7571ff8, 'tcp://') connection to peer 00000000 with addr tcp:// timed out, no messages seen in PT3S




root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno:   -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno:   -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno:   -1
safe_to_bootstrap: 0

At this point I can run mysql -h percona -u wordpress -p and connect, and WordPress works too.

Scenario: I have 3 percona pods

jonathan@ubuntu:~/Projects/k8wp$ kubectl get pods
NAME                         READY     STATUS    RESTARTS   AGE
etcd-0                       1/1       Running   1          12h
etcd-1                       1/1       Running   0          12h
etcd-2                       1/1       Running   3          12h
etcd-3                       1/1       Running   1          12h
percona-0                    1/1       Running   0          8m
percona-1                    1/1       Running   0          57m
percona-2                    1/1       Running   0          57m


root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/gvwstate.dat
my_uuid: b7571ff8-11f8-11e7-bd2d-8b50487e1523
view_id: 3 b7571ff8-11f8-11e7-bd2d-8b50487e1523 3
bootstrap: 0
member: b7571ff8-11f8-11e7-bd2d-8b50487e1523 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0


root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/gvwstate.dat
my_uuid: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a
view_id: 3 bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 4
bootstrap: 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/gvwstate.dat
my_uuid: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a
view_id: 3 bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 4
bootstrap: 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0


2017-03-26T08:37:58.370605Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-03-26T08:37:58.372537Z 0 [Note] WSREP: gcomm: connecting to group 'wordpress-001', peer ''
2017-03-26T08:38:01.373345Z 0 [Note] WSREP: (b7571ff8, 'tcp://') connection to peer 00000000 with addr tcp:// timed out, no messages seen in PT3S
2017-03-26T08:38:01.373682Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-03-26T08:38:01.373750Z 0 [Note] WSREP: view(view_id(NON_PRIM,b7571ff8,5) memb {
} joined {
} left {
} partitioned {
2017-03-26T08:38:01.373838Z 0 [Note] WSREP: gcomm: connected
2017-03-26T08:38:01.373872Z 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2017-03-26T08:38:01.373987Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-03-26T08:38:01.374012Z 0 [Note] WSREP: Opened channel 'wordpress-001'
2017-03-26T08:38:01.374108Z 0 [Note] WSREP: Waiting for SST to complete.
2017-03-26T08:38:01.374417Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2017-03-26T08:38:01.374469Z 0 [Note] WSREP: Flow-control interval: [16, 16]
2017-03-26T08:38:01.374491Z 0 [Note] WSREP: Received NON-PRIMARY.
2017-03-26T08:38:01.374560Z 1 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version -1

The IP it tries to connect to, 10.52.0.26, in 2017-03-26T08:37:58.372537Z 0 [Note] WSREP: gcomm: connecting to group 'wordpress-001', peer '' is actually the pod's previous IP. Here is the list of keys I had in etcd before deleting percona-0:

/ # etcdctl ls --recursive

After kubectl delete pods/percona-0:

/ # etcdctl ls --recursive





2017-03-26T08:37:44.069583Z 0 [Note] WSREP: (bd05a643, 'tcp://') turning message relay requesting on, nonlive peers: tcp:// 
2017-03-26T08:37:45.069756Z 0 [Note] WSREP: (bd05a643, 'tcp://') reconnecting to b7571ff8 (tcp://, attempt 0
2017-03-26T08:37:48.570332Z 0 [Note] WSREP: (bd05a643, 'tcp://') connection to peer 00000000 with addr tcp:// timed out, no messages seen in PT3S
2017-03-26T08:37:49.605089Z 0 [Note] WSREP: evs::proto(bd05a643, GATHER, view_id(REG,b7571ff8,3)) suspecting node: b7571ff8
2017-03-26T08:37:49.605276Z 0 [Note] WSREP: evs::proto(bd05a643, GATHER, view_id(REG,b7571ff8,3)) suspected node without join message, declaring inactive
2017-03-26T08:37:50.104676Z 0 [Note] WSREP: declaring c33d6a73 at tcp:// stable

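New information: I restarted percona-0 again, and this time it somehow came up! After a few attempts I realized the pod has to be restarted twice before it comes up: the first time I delete it, it comes back with the same errors as above; the second time it comes up fine and syncs with the other members. Could this be because it was the first pod in the cluster?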



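Also: if my node crashes and takes all the pods down at once, the pods do not come back at all! I suspect this is because no state is saved to grastate.dat, i.e. seqno stays at -1 even though the global id may have changed. The pods exit with a mysqld shutdown and the following errors: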

jonathan@ubuntu:~/Projects/k8wp$ kubectl logs percona-2 | grep ERROR
2017-03-26T11:20:25.795085Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
2017-03-26T11:20:25.795276Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-26T11:20:25.795544Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'wordpress-001' at 'gcomm://': -110 (Connection timed out)
2017-03-26T11:20:25.795618Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-26T11:20:25.795645Z 0 [ERROR] WSREP: wsrep::connect(gcomm:// failed: 7
2017-03-26T11:20:25.795693Z 0 [ERROR] Aborting
jonathan@ubuntu:~/Projects/k8wp$ kubectl logs percona-1 | grep ERROR
2017-03-26T11:20:27.093780Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
2017-03-26T11:20:27.093977Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-26T11:20:27.094145Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'wordpress-001' at 'gcomm://': -110 (Connection timed out)
2017-03-26T11:20:27.094200Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-26T11:20:27.094227Z 0 [ERROR] WSREP: wsrep::connect(gcomm:// failed: 7
2017-03-26T11:20:27.094247Z 0 [ERROR] Aborting
jonathan@ubuntu:~/Projects/k8wp$ kubectl logs percona-0 | grep ERROR
2017-03-26T11:20:52.040214Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
2017-03-26T11:20:52.040279Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-26T11:20:52.040385Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'wordpress-001' at 'gcomm://': -110 (Connection timed out)
2017-03-26T11:20:52.040437Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-26T11:20:52.040471Z 0 [ERROR] WSREP: wsrep::connect(gcomm:// failed: 7
2017-03-26T11:20:52.040508Z 0 [ERROR] Aborting


root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno:   -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno:   -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno:   -1
safe_to_bootstrap: 0
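
For comparing the saved state across nodes, here is a minimal sketch that prints the seqno recorded in each grastate.dat. The helper name print_seqnos and the scratch file are hypothetical; on the real volume you would pass percona-0/grastate.dat and so on. With every node at -1, as in the dumps above, grastate.dat alone cannot tell you which node is the most advanced.

```shell
#!/bin/sh
# Hypothetical helper: print "<file> <seqno>" for each grastate.dat passed in.
print_seqnos() {
    for f in "$@"; do
        # The seqno line looks like "seqno:   -1"; the value is the second field.
        seqno=$(awk '$1 == "seqno:" { print $2 }' "$f")
        echo "$f $seqno"
    done
}

# Demo on a scratch copy modeled on the dumps above (not a real data directory).
demo_dir=$(mktemp -d)
cat > "$demo_dir/grastate.dat" <<'EOF'
# GALERA saved state
version: 2.1
uuid:    a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno:   -1
safe_to_bootstrap: 0
EOF

print_seqnos "$demo_dir/grastate.dat"
```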


1 Answer:

Answer 0 (score: 0)


sed -i "s|safe_to_bootstrap.*:.*|safe_to_bootstrap:1|1" /var/lib/mysql/grastate.dat;
/ --wsrep-new-cluster;
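
To see what the sed substitution above does without touching a live data directory, here is a minimal sketch that applies it to a scratch copy of grastate.dat. The helper name mark_safe_to_bootstrap and the scratch file are hypothetical. As an assumption, it writes "safe_to_bootstrap: 1" with a space to match the layout of the file shown in the question, and it redirects to a new file instead of using sed -i so it behaves the same on GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical helper: flip safe_to_bootstrap to 1 in the given grastate.dat.
mark_safe_to_bootstrap() {
    f=$1
    # Same pattern as the answer's sed; the output is written to a sibling
    # file and then moved into place.
    sed "s|safe_to_bootstrap.*:.*|safe_to_bootstrap: 1|" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
}

# Demo on a scratch copy modeled on the question's dump (not /var/lib/mysql).
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
# GALERA saved state
version: 2.1
uuid:    a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno:   -1
safe_to_bootstrap: 0
EOF

mark_safe_to_bootstrap "$scratch"
grep '^safe_to_bootstrap' "$scratch"
```

Only the safe_to_bootstrap line changes; uuid and seqno are left as they were.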



[ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)



With Percona XtraDB, the Docker container's entrypoint looks like this:

exec mysqld --user=mysql --wsrep_cluster_name=$CLUSTER_NAME --wsrep_cluster_address="gcomm://$cluster_join" --wsrep_sst_method=xtrabackup-v2 --wsrep_sst_auth="xtrabackup:$XTRABACKUP_PASSWORD" --log-error=${DATADIR}error.log $CMDARG


/ --wsrep-new-cluster
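
As a rough sketch of how an entrypoint could decide when to pass that flag, the hypothetical helper below prints --wsrep-new-cluster only when the peer list is empty. It assumes an empty $cluster_join means discovery found no other members, which is not something the entrypoint above guarantees. If peers are merely unreachable for a moment, bootstrapping anyway would start a second, separate cluster. The peer names in the demo are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print the extra mysqld flag for a given peer list.
bootstrap_flag() {
    peers=$1
    if [ -z "$peers" ]; then
        # No peers known: this node would have to form the cluster itself.
        echo "--wsrep-new-cluster"
    else
        # Peers known: join them through gcomm://$peers, no bootstrap flag.
        echo ""
    fi
}

bootstrap_flag ""                # prints --wsrep-new-cluster
bootstrap_flag "peer-a,peer-b"   # prints an empty line
```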

PS: I tried the above at first, but I got an error saying that, to force a new cluster and bootstrap with this node, I had to set safe_to_bootstrap to 1 in grastate.dat.
