Freshly installed Kubernetes worker node never becomes "Ready"

Date: 2017-04-14 22:00:21

Tags: kubernetes coreos

I've been fighting with a Kubernetes install problem. We spun up a new OpenStack environment, and the scripts that worked in our old environment fail in the new one.

We are using K8s v1.5.4 with these scripts: https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/generic

CoreOS 1298.7.0

The master seems fine. I can deploy pods to it, and it always shows Ready when I run kubectl get nodes.

The worker install script runs, but the node never shows Ready status in kubectl get nodes.

If I run kubectl get nodes --show-labels, I get:

    NAME               STATUS                     AGE       LABELS
    MYIP.118.240.122   Ready,SchedulingDisabled   7m        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=MYIP.118.240.122
    MYIP.118.240.129   NotReady                   5m        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=MYIP.118.240.129

All ports are open on the internal network between the worker and the master.

If I run kubectl describe node MYIP.118.240.129 against the worker, I get:

    (testtest)➜ dev kubectl describe node MYIP.118.240.129
    Name:            MYIP.118.240.129
    Role:
    Labels:          beta.kubernetes.io/arch=amd64
                     beta.kubernetes.io/os=linux
                     kubernetes.io/hostname=MYIP.118.240.129
    Taints:          <none>
    CreationTimestamp:    Fri, 14 Apr 2017 15:27:47 -0600
    Phase:
    Conditions:
      Type            Status    LastHeartbeatTime                 LastTransitionTime                Reason                      Message
      ----            ------    -----------------                 ------------------                ------                      -------
      OutOfDisk       Unknown   Fri, 14 Apr 2017 15:27:47 -0600   Fri, 14 Apr 2017 15:28:29 -0600   NodeStatusUnknown           Kubelet stopped posting node status.
      MemoryPressure  False     Fri, 14 Apr 2017 15:27:47 -0600   Fri, 14 Apr 2017 15:27:47 -0600   KubeletHasSufficientMemory  kubelet has sufficient memory available
      DiskPressure    False     Fri, 14 Apr 2017 15:27:47 -0600   Fri, 14 Apr 2017 15:27:47 -0600   KubeletHasNoDiskPressure    kubelet has no disk pressure
      Ready           Unknown   Fri, 14 Apr 2017 15:27:47 -0600   Fri, 14 Apr 2017 15:28:29 -0600   NodeStatusUnknown           Kubelet stopped posting node status.
    Addresses:       MYIP.118.240.129,MYIP.118.240.129,MYIP.118.240.129
    Capacity:
     alpha.kubernetes.io/nvidia-gpu:    0
     cpu:                               1
     memory:                            2052924Ki
     pods:                              110
    Allocatable:
     alpha.kubernetes.io/nvidia-gpu:    0
     cpu:                               1
     memory:                            2052924Ki
     pods:                              110
    System Info:
     Machine ID:                    efee03ac51c641888MYIP50dfa2a40350d
     System UUID:                   4467C959-37FE-48ED-A263-C36DD0D445F1
     Boot ID:                       50eb5e93-5aed-441b-b3ef-36da1472e4ea
     Kernel Version:                4.9.16-coreos-r1
     OS Image:                      Container Linux by CoreOS 1298.7.0 (Ladybug)
     Operating System:              linux
     Architecture:                  amd64
     Container Runtime Version:     docker://1.12.6
     Kubelet Version:               v1.5.4+coreos.0
     Kube-Proxy Version:            v1.5.4+coreos.0
    ExternalID:      MYIP.118.240.129
    Non-terminated Pods:    (5 in total)
      Namespace     Name                                     CPU Requests    CPU Limits    Memory Requests    Memory Limits
      ---------     ----                                     ------------    ----------    ---------------    -------------
      kube-system   heapster-v1.2.0-216693398-sfz1m          50m (5%)        50m (5%)      90Mi (4%)          90Mi (4%)
      kube-system   kube-dns-782804071-psmfc                 260m (26%)      0 (0%)        140Mi (6%)         220Mi (10%)
      kube-system   kube-dns-autoscaler-2715466192-jmb3h     20m (2%)        0 (0%)        10Mi (0%)          0 (0%)
      kube-system   kube-proxy-MYIP.118.240.129              0 (0%)          0 (0%)        0 (0%)             0 (0%)
      kube-system   kubernetes-dashboard-3543765157-w8zv2    100m (10%)      100m (10%)    50Mi (2%)          50Mi (2%)
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.
      CPU Requests    CPU Limits    Memory Requests    Memory Limits
      ------------    ----------    ---------------    -------------
      430m (43%)      150m (15%)    290Mi (14%)        360Mi (17%)
    Events:
      FirstSeen    LastSeen    Count    From                          SubObjectPath    Type       Reason                     Message
      ---------    --------    -----    ----                          -------------    --------   ------                     -------
      11m          11m         1        {kubelet MYIP.118.240.129}                     Normal     Starting                   Starting kubelet.
      11m          11m         1        {kubelet MYIP.118.240.129}                     Warning    ImageGCFailed              unable to find data for container /
      11m          11m         2        {kubelet MYIP.118.240.129}                     Normal     NodeHasSufficientDisk      Node MYIP.118.240.129 status is now: NodeHasSufficientDisk
      11m          11m         2        {kubelet MYIP.118.240.129}                     Normal     NodeHasSufficientMemory    Node MYIP.118.240.129 status is now: NodeHasSufficientMemory
      11m          11m         2        {kubelet MYIP.118.240.129}                     Normal     NodeHasNoDiskPressure      Node MYIP.118.240.129 status is now: NodeHasNoDiskPressure
    (testtest)➜ dev

If I run docker ps on the worker:

    CONTAINER ID        IMAGE                                      COMMAND                  CREATED             STATUS              PORTS               NAMES
    c25cf12b43f3        quay.io/coreos/hyperkube:v1.5.4_coreos.0   "/hyperkube proxy --m"   4 minutes ago       Up 4 minutes                            k8s_kube-proxy.96aded63_kube-proxy-MYIP.118.240.129_kube-system_23185d6abc4d5c8f11da2ca1943fd398_5ba9628a
    c4d14dfd7d52        gcr.io/google_containers/pause-amd64:3.0   "/pause"                 6 minutes ago       Up 6 minutes                            k8s_POD.d8dbe16c_kube-proxy-MYIP.118.240.129_kube-system_23185d6abc4d5c8f11da2ca1943fd398_e8a1c6d6

As you'll notice in the logs, the worker node can't communicate with the master.... Here is what I get if I ssh into the worker and pull the kubelet logs after the node ran over the weekend:

    Apr 17 20:53:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:53:15.507939    1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
    Apr 17 20:48:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:48:15.484016    1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
    Apr 17 20:43:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:15.405888    1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: W0417 20:43:07.361035    1353 kubelet.go:1497] Deleting mirror pod "kube-proxy-MYIP.118.240.129_kube-system(37537fb7-2159-11e7-b692-fa163e952b1c)" because it is outdated
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.018406    1353 event.go:208] Unable to write event: 'Post https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/events: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer' (may retry after sleeping)
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.017813    1353 reflector.go:188] pkg/kubelet/kubelet.go:386: Failed to list *api.Node: Get https://MYIP.118.240.122:443/api/v1/nodes?fieldSelector=metadata.name%3DMYIP.118.240.129&resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.017711    1353 reflector.go:188] pkg/kubelet/kubelet.go:378: Failed to list *api.Service: Get https://MYIP.118.240.122:443/api/v1/services?resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.016457    1353 kubelet_node_status.go:302] Error updating node status, will retry: error getting node "MYIP.118.240.129": Get https://MYIP.118.240.122:443/api/v1/nodes?fieldSelector=metadata.name%3DMYIP.118.240.129: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.0161MYIP    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/e8ea63b2-2159-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"e8ea63b2-2159-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.016165356 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/e8ea63b2-2159-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "e8ea63b2-2159-11e7-b692-fa163e952b1c" (UID: "e8ea63b2-2159-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.016058    1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015943    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec05331e-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec05331e-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.015913703 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec05331e-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec05331e-2158-11e7-b692-fa163e952b1c" (UID: "ec05331e-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015843    1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015732    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/e8fdcca4-2159-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"e8fdcca4-2159-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.015656131 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/e8fdcca4-2159-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "e8fdcca4-2159-11e7-b692-fa163e952b1c" (UID: "e8fdcca4-2159-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015559    1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015429    1353 reflector.go:188] pkg/kubelet/config/apiserver.go:44: Failed to list *api.Pod: Get https://MYIP.118.240.122:443/api/v1/pods?fieldSelector=spec.nodeName%3DMYIP.118.240.129&resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012918    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec091be8-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec091be8-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.012889039 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec091be8-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec091be8-2158-11e7-b692-fa163e952b1c" (UID: "ec091be8-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012820    1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
    Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012661    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec09da25-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec09da25-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.012630687 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec09da25-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec09da25-2158-11e7-b692-fa163e952b1c" (UID: "ec09da25-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer

This is TLS, so of course I don't think it's even getting as far as authenticating.
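One way to separate a network-level failure from a certificate problem is to attempt the TLS handshake by hand from the worker. A minimal sketch, assuming standard openssl and curl tooling, with the master address taken from the logs above:

    # Attempt a raw TLS handshake against the apiserver from the worker.
    # If this also dies mid-handshake with "connection reset by peer",
    # the problem is below TLS/auth, e.g. at the network layer.
    openssl s_client -connect MYIP.118.240.122:443 </dev/null

    # Same check via curl, skipping certificate validation (-k):
    curl -kv https://MYIP.118.240.122:443/version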

Any suggestions on how to debug this?

Thanks!

2 answers:

Answer 0 (score: 0)

You need to check whether your IP address was added to the SSL generation file (openssl.cnf) on the master. Try recreating your certificates with the IP of your DNS service (10.3.0.1 if you followed the CoreOS guide). Your openssl.cnf should look like this:

 [req]
 req_extensions = v3_req
 distinguished_name = req_distinguished_name
 [req_distinguished_name]
 [ v3_req ]
 basicConstraints = CA:FALSE
 keyUsage = nonRepudiation, digitalSignature, keyEncipherment
 subjectAltName = @alt_names
 [alt_names]
 DNS.1 = kubernetes
 DNS.2 = kubernetes.default
 DNS.3 = kubernetes.default.svc
 DNS.4 = kubernetes.default.svc.cluster.local
 IP.1 = 10.3.0.1
 IP.2 = PRIVATE_MASTER_IP
 IP.3 = PUBLIC_MASTER_IP

You also need to recreate the certificates for the nodes. After that, delete the secret from the namespace so that it is regenerated automatically. Source: CoreOS docs.
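For reference, a minimal sketch of those steps, following the pattern in the CoreOS generic-install docs; the file names (ca.pem, ca-key.pem, apiserver-key.pem) are the defaults from those docs, and the secret name is taken from the question's logs, so adjust both to your setup:

    # Regenerate the apiserver certificate with the SANs from openssl.cnf
    # (file names follow the CoreOS generic-install defaults).
    openssl genrsa -out apiserver-key.pem 2048
    openssl req -new -key apiserver-key.pem -out apiserver.csr \
        -subj "/CN=kube-apiserver" -config openssl.cnf
    openssl x509 -req -in apiserver.csr -CA ca.pem -CAkey ca-key.pem \
        -CAcreateserial -out apiserver.pem -days 365 \
        -extensions v3_req -extfile openssl.cnf

    # Once the new certs are in place, delete the service-account secret
    # so it is regenerated automatically (name from the error logs above):
    kubectl delete secret default-token-93sd7 --namespace=kube-system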

Answer 1 (score: 0)

It turned out the problem was an inconsistent MTU network setting in OpenStack. Packets larger than roughly 1500 bytes were being dropped.
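For anyone hitting the same symptom, a sketch of how an MTU mismatch like this can be confirmed from the worker; the interface name eth0 and the fallback MTU of 1400 are assumptions for illustration (1472 is 1500 minus the 28 bytes of IP and ICMP headers):

    # Send a ping that, with headers, fills a full 1500-byte frame, and
    # forbid fragmentation (-M do). If this fails while smaller payload
    # sizes succeed, packets near 1500 bytes are being dropped.
    ping -c 3 -M do -s 1472 MYIP.118.240.122

    # Inspect the configured MTU (interface name is environment-specific):
    ip link show eth0

    # Workaround sketch: lower the MTU to match what the network delivers.
    sudo ip link set dev eth0 mtu 1400

This also explains why the TLS connections were the first thing to break: handshake packets carrying certificates are among the few that approach full frame size.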