CoreDNS pods have CrashLoopBackOff or Error state

Date: 2018-10-31 03:20:42

Tags: docker kubernetes kubectl kubeadm coredns

I'm trying to set up a Kubernetes master by issuing the following command:

  

kubeadm init --pod-network-cidr=192.168.0.0/16

  1. Followed by: Installing a pod network add-on (Calico)
  2. Followed by: Master Isolation (see the command sketch after this list)
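For reference, those two steps correspond roughly to the commands below. This is a sketch: the Calico manifest URL is a placeholder, so use the one the Calico docs give for your Kubernetes version.

# pod network add-on (Calico); the manifest URL here is a placeholder
kubectl apply -f https://docs.projectcalico.org/.../calico.yaml

# master isolation: allow pods to be scheduled on the (single) master
kubectl taint nodes --all node-role.kubernetes.io/master-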

Problem: the coredns pods are in CrashLoopBackOff or Error state:

# kubectl get pods -n kube-system
NAME                                       READY   STATUS             RESTARTS   AGE
calico-node-lflwx                          2/2     Running            0          2d
coredns-576cbf47c7-nm7gc                   0/1     CrashLoopBackOff   69         2d
coredns-576cbf47c7-nwcnx                   0/1     CrashLoopBackOff   69         2d
etcd-suey.nknwn.local                      1/1     Running            0          2d
kube-apiserver-suey.nknwn.local            1/1     Running            0          2d
kube-controller-manager-suey.nknwn.local   1/1     Running            0          2d
kube-proxy-xkgdr                           1/1     Running            0          2d
kube-scheduler-suey.nknwn.local            1/1     Running            0          2d
# 

I went through Troubleshooting kubeadm - Kubernetes, but my node isn't running SELinux and my Docker is up to date.

# docker --version
Docker version 18.06.1-ce, build e68fc7a
# 

kubectl describe output:

# kubectl -n kube-system describe pod coredns-576cbf47c7-nwcnx
Name:               coredns-576cbf47c7-nwcnx
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               suey.nknwn.local/192.168.86.81
Start Time:         Sun, 28 Oct 2018 22:39:46 -0400
Labels:             k8s-app=kube-dns
                    pod-template-hash=576cbf47c7
Annotations:        cni.projectcalico.org/podIP: 192.168.0.30/32
Status:             Running
IP:                 192.168.0.30
Controlled By:      ReplicaSet/coredns-576cbf47c7
Containers:
  coredns:
    Container ID:  docker://ec65b8f40c38987961e9ed099dfa2e8bb35699a7f370a2cda0e0d522a0b05e79
    Image:         k8s.gcr.io/coredns:1.2.2
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Wed, 31 Oct 2018 23:28:58 -0400
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Wed, 31 Oct 2018 23:21:35 -0400
      Finished:     Wed, 31 Oct 2018 23:23:54 -0400
    Ready:          True
    Restart Count:  103
    Limits:
      memory:  170Mi
    Requests:
      cpu:     100m
      memory:  70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xvq8b (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            True
  ContainersReady  True
  PodScheduled     True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-xvq8b:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-xvq8b
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                      From                       Message
  ----     ------     ----                     ----                       -------
  Normal   Killing    54m (x10 over 4h19m)     kubelet, suey.nknwn.local  Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy  9m56s (x92 over 4h20m)   kubelet, suey.nknwn.local  Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff    5m4s (x173 over 4h10m)   kubelet, suey.nknwn.local  Back-off restarting failed container
# kubectl -n kube-system describe pod coredns-576cbf47c7-nm7gc
Name:               coredns-576cbf47c7-nm7gc
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               suey.nknwn.local/192.168.86.81
Start Time:         Sun, 28 Oct 2018 22:39:46 -0400
Labels:             k8s-app=kube-dns
                    pod-template-hash=576cbf47c7
Annotations:        cni.projectcalico.org/podIP: 192.168.0.31/32
Status:             Running
IP:                 192.168.0.31
Controlled By:      ReplicaSet/coredns-576cbf47c7
Containers:
  coredns:
    Container ID:  docker://0f2db8d89a4c439763e7293698d6a027a109bf556b806d232093300952a84359
    Image:         k8s.gcr.io/coredns:1.2.2
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Wed, 31 Oct 2018 23:29:11 -0400
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Wed, 31 Oct 2018 23:21:58 -0400
      Finished:     Wed, 31 Oct 2018 23:24:08 -0400
    Ready:          True
    Restart Count:  102
    Limits:
      memory:  170Mi
    Requests:
      cpu:     100m
      memory:  70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xvq8b (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            True
  ContainersReady  True
  PodScheduled     True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-xvq8b:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-xvq8b
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                      From                       Message
  ----     ------     ----                     ----                       -------
  Normal   Killing    44m (x12 over 4h18m)     kubelet, suey.nknwn.local  Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
  Warning  BackOff    4m58s (x170 over 4h9m)   kubelet, suey.nknwn.local  Back-off restarting failed container
  Warning  Unhealthy  8s (x102 over 4h19m)     kubelet, suey.nknwn.local  Liveness probe failed: HTTP probe failed with statuscode: 503
#

kubectl logs output:

# kubectl -n kube-system logs -f coredns-576cbf47c7-nm7gc
E1101 03:31:58.974836       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:31:58.974836       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:31:58.974857       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.975493       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.976732       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.977788       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.976164       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.977415       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.978332       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
2018/11/01 03:33:08 [INFO] SIGTERM: Shutting down servers then terminating
E1101 03:33:31.976864       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:31.978080       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:31.979156       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
#
# kubectl -n kube-system log -f coredns-576cbf47c7-gqdgd
.:53
2018/11/05 04:04:13 [INFO] CoreDNS-1.2.2
2018/11/05 04:04:13 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/11/05 04:04:13 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/11/05 04:04:19 [FATAL] plugin/loop: Seen "HINFO IN 3597544515206064936.6415437575707023337." more than twice, loop detected
# kubectl -n kube-system log -f coredns-576cbf47c7-hhmws
.:53
2018/11/05 04:04:18 [INFO] CoreDNS-1.2.2
2018/11/05 04:04:18 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/11/05 04:04:18 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/11/05 04:04:24 [FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected
#


syslog (host):

  

Nov  4 22:59:36 suey kubelet[1234]: E1104 22:59:36.139538    1234 pod_workers.go:186] Error syncing pod d8146b7e-de57-11e8-a1e2-ec8eb57434c8 ("coredns-576cbf47c7-hhmws_kube-system(d8146b7e-de57-11e8-a1e2-ec8eb57434c8)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 40s restarting failed container=coredns pod=coredns-576cbf47c7-hhmws_kube-system(d8146b7e-de57-11e8-a1e2-ec8eb57434c8)"

Please advise.

4 Answers:

Answer 0 (score: 13):

This error

[FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected

is raised when CoreDNS detects a loop in the resolve configuration, and it is the intended behavior. Typically the kubelet hands the node's resolv.conf to the CoreDNS pod, and if that file points at a local stub resolver (e.g. 127.0.0.1), CoreDNS forwards its upstream queries right back to itself. You are hitting this issue:

https://github.com/kubernetes/kubeadm/issues/1162

https://github.com/coredns/coredns/issues/2087

Hacky solution: disable the CoreDNS loop detection

Edit the CoreDNS configmap:

kubectl -n kube-system edit configmap coredns

Delete or comment out the line with loop, then save and exit.
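For reference, the kubeadm-generated Corefile then looks roughly like this (a sketch based on the default CoreDNS 1.2.x config; your configmap may differ slightly), with only the loop line commented out:

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       upstream
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    # loop   # commented out: disables the loop detection
    cache 30
    reload
    loadbalance
}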

Then delete the CoreDNS pods so that new ones are created with the new config:

kubectl -n kube-system delete pod -l k8s-app=kube-dns

After that, everything should be fine.
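You can watch the replacement pods come up and confirm they stay Running with:

kubectl -n kube-system get pods -l k8s-app=kube-dns -w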

Preferred solution: remove the loop in the DNS configuration

First, check whether you are running systemd-resolved. If you are on Ubuntu 18.04, this is probably the case.

systemctl list-unit-files | grep enabled | grep systemd-resolved
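If it is enabled, the output should contain a line roughly like this (a sketch of the expected output):

systemd-resolved.service                  enabled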

If it is, check which resolv.conf file your cluster uses as a reference:

ps auxww | grep kubelet

You will probably see a line like:

/usr/bin/kubelet ... --resolv-conf=/run/systemd/resolve/resolv.conf

The important part is --resolv-conf: this is how we determine whether the systemd resolv.conf is being used.

If it is the systemd resolv.conf, do the following:

Check the content of /run/systemd/resolve/resolv.conf to see whether it contains a record like:

nameserver 127.0.0.1

If there is a 127.0.0.1 entry, it is the one causing the loop.

To get rid of it, you should not edit that file directly; instead, check the other places that cause it to be generated this way.

Check all the files under /etc/systemd/network, and if you find a record like

DNS=127.0.0.1

delete it. Also check /etc/systemd/resolved.conf and do the same there if needed. Make sure at least one or two DNS servers are configured, such as

DNS=1.1.1.1 1.0.0.1
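For reference, a minimal /etc/systemd/resolved.conf after the cleanup might look like this (a sketch; any reachable upstream resolvers will do):

[Resolve]
# loopback entry (DNS=127.0.0.1) removed; point at real upstream servers
DNS=1.1.1.1 1.0.0.1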

After all of that is done, restart the systemd services for your changes to take effect:

systemctl restart systemd-networkd systemd-resolved

Then, confirm that the DNS=127.0.0.1 entry is no longer reflected in the resolv.conf file:

cat /run/systemd/resolve/resolv.conf
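If the cleanup worked, the output should list only real upstream servers, something like this (a sketch; your servers will differ):

nameserver 1.1.1.1
nameserver 1.0.0.1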

Finally, trigger re-creation of the DNS pods:

kubectl -n kube-system delete pod -l k8s-app=kube-dns

Summary: the fix involves removing what looks like a DNS lookup loop from the host's DNS configuration. The exact steps vary between different resolv.conf managers/implementations.

Answer 1 (score: 1):

For minikube on Ubuntu with the 'none' driver, you can make it work without any other changes by using the following flag:
sudo minikube start --extra-config=kubelet.resolv-conf=/run/systemd/resolve/resolv.conf

See this related issue.

Answer 2 (score: 0):

Here is some shell hackery that automates Utku's answer:

# remove loop from DNS config files
sudo find /etc/systemd/network /etc/systemd/resolved.conf -type f \
    -exec sed -i '/^DNS=127.0.0.1/d' {} +

# if necessary, configure some DNS servers (Cloudflare's public resolvers here)
if ! grep '^DNS=.*' /etc/systemd/resolved.conf; then
    sudo sed -i '$aDNS=1.1.1.1 1.0.0.1' /etc/systemd/resolved.conf
fi

# restart systemd services
sudo systemctl restart systemd-networkd systemd-resolved

# force (re-) creation of the dns pods
kubectl -n kube-system delete pod -l k8s-app=kube-dns

Answer 3 (score: 0):

On Ubuntu 16.04 you may also hit this problem with dnsmasq, which automatically sets up a loopback address. I posted a similar reply here.
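As a rough sketch, assuming dnsmasq is spawned by NetworkManager (the Ubuntu 16.04 desktop default), you can check for and disable that behavior like this:

# see whether NetworkManager is configured to spawn dnsmasq
grep '^dns=' /etc/NetworkManager/NetworkManager.conf

# if it prints dns=dnsmasq, comment the line out and restart NetworkManager
sudo sed -i 's/^dns=dnsmasq/#dns=dnsmasq/' /etc/NetworkManager/NetworkManager.conf
sudo systemctl restart NetworkManager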