Kubernetes pod crashes because of a network error

Date: 2019-06-29 13:34:19

Tags: kubernetes gcloud

This has already happened twice this week. In the pod description I see:

  Type     Reason           Age              From                                                   Message
  ----     ------           ----             ----                                                   -------
  Warning  NetworkNotReady  2m (x3 over 2m)  kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr  network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
  Normal   SandboxChanged   46s              kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr  Pod sandbox changed, it will be killed and re-created.
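For context, events like the above can be pulled with kubectl; a minimal sketch, where the pod and namespace names are placeholders rather than taken from the question:

  # Hypothetical pod/namespace names; replace with your own.
  kubectl describe pod my-pod -n my-namespace

  # Or list recent events in the namespace, sorted by time:
  kubectl get events -n my-namespace --sort-by=.metadata.creationTimestamp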

I would like to understand what is going on. Everything was working fine, and then this started out of nowhere. I am adding the node description:

 Type     Reason      Age   From                                                          Message
  ----     ------      ----  ----                                                          -------
  Warning  OOMKilling  44m   kernel-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr  Memory cgroup out of memory: Kill process 1560920 (runc:[2:INIT]) score 0 or sacrifice child
Killed process 1560920 (runc:[2:INIT]) total-vm:131144kB, anon-rss:2856kB, file-rss:5564kB, shmem-rss:0kB
  Warning  TaskHung                   31m                kernel-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr   INFO: task dockerd:1883293 blocked for more than 300 seconds.
  Normal   NodeAllocatableEnforced    30m                kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientDisk      30m (x2 over 30m)  kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeHasSufficientDisk
  Normal   NodeHasSufficientMemory    30m (x2 over 30m)  kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure      30m (x2 over 30m)  kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID       30m                kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeHasSufficientPID
  Warning  Rebooted                   30m                kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr has been rebooted, boot id: ecd3db95-4bfc-4df5-85b3-70df05f6fb48
  Normal   Starting                   30m                kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Starting kubelet.
  Normal   NodeNotReady               30m                kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeNotReady
  Normal   NodeReady                  30m                kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeReady
  Normal   Starting                   29m                kube-proxy, gke-iagree-cluster-1-main-pool-5632d628-wgzr       Starting kube-proxy.
  Normal   FrequentKubeletRestart     25m                systemd-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr  Node condition FrequentKubeletRestart is now: False, reason: FrequentKubeletRestart
  Normal   CorruptDockerOverlay2      25m                docker-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr   Node condition CorruptDockerOverlay2 is now: False, reason: CorruptDockerOverlay2
  Normal   UnregisterNetDevice        25m                kernel-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr   Node condition FrequentUnregisterNetDevice is now: False, reason: UnregisterNetDevice
  Normal   FrequentDockerRestart      25m                systemd-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr  Node condition FrequentDockerRestart is now: False, reason: FrequentDockerRestart
  Normal   FrequentContainerdRestart  25m                systemd-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr  Node condition FrequentContainerdRestart is now: False, reason: FrequentContainerdRestart

2 Answers:

Answer 0: (score: 1)

Judging by the error, it looks like your CNI has run out of IPs. When you set up the kubenet CNI for networking, you must have passed a CIDR range, and that range determines how many IPs can be allocated to pods in the cluster.

I am not sure exactly how kubenet maps IPs to pods. If it uses its own virtual network, you need a larger CIDR range; if it takes IPs from the host's network interfaces, you need to pick a machine type with more subnet interfaces (which is how the AWS VPC CNI works).
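To check what ranges are actually in place, something along these lines may help (a sketch; the cluster name is inferred from the node name in the events and the zone is a placeholder):

  # Pod CIDR assigned to the node from the events above
  # (empty output would match the "lack of PodCIDR" message).
  kubectl get node gke-iagree-cluster-1-main-pool-5632d628-wgzr -o jsonpath='{.spec.podCIDR}'

  # Cluster-wide pod IP range on GKE (cluster name and zone are assumptions).
  gcloud container clusters describe iagree-cluster-1 --zone us-central1-a \
    --format='value(clusterIpv4Cidr)'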

Answer 1: (score: 1)

These errors can occur on GKE clusters running 1.11.x because of the following issue: gke-issue

The problem can be resolved by upgrading the GKE cluster and nodes to version 1.12.5-gke.5 or 1.12.7-gke.10.
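If it helps, the upgrade can be done with gcloud; a rough sketch, assuming the cluster name and zone are placeholders and that the node pool is the main-pool visible in the events:

  # Upgrade the control plane first (cluster name and zone are assumptions).
  gcloud container clusters upgrade iagree-cluster-1 --zone us-central1-a \
    --master --cluster-version 1.12.7-gke.10

  # Then upgrade the node pool to the same version.
  gcloud container clusters upgrade iagree-cluster-1 --zone us-central1-a \
    --node-pool main-pool --cluster-version 1.12.7-gke.10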