Azure AKS Scale-Out

Date: 2021-02-04 08:56:49

Tags: azure kubernetes azure-devops azure-aks azure-vm-scale-set

I have an AKS cluster with 3 nodes and tried to manually scale it from 3 to 4 nodes. The scale-out itself succeeded, but about 20 minutes later all 4 nodes were in the NotReady state and none of the kube-system services were Ready.
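For reference, the manual scale-out step corresponds to an `az aks scale` call. The resource group and cluster names below are placeholders, not taken from the question; the command is printed rather than executed since it needs Azure credentials:

```shell
# Placeholder names -- substitute your own resource group and cluster.
RESOURCE_GROUP=myResourceGroup
CLUSTER=myAKSCluster
NODE_COUNT=4

# On a real subscription you would run the command directly:
#   az aks scale --resource-group "$RESOURCE_GROUP" --name "$CLUSTER" --node-count "$NODE_COUNT"
cmd="az aks scale --resource-group $RESOURCE_GROUP --name $CLUSTER --node-count $NODE_COUNT"
echo "$cmd"
```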

NAME STATUS ROLES AGE VERSION
aks-agentpool-40760006-vmss000000 Ready agent 16m v1.18.14
aks-agentpool-40760006-vmss000001 Ready agent 17m v1.18.14
aks-agentpool-40760006-vmss000002 Ready agent 16m v1.18.14
aks-agentpool-40760006-vmss000003 Ready agent 11m v1.18.14

NAME STATUS ROLES AGE VERSION
aks-agentpool-40760006-vmss000000 NotReady agent 23m v1.18.14
aks-agentpool-40760006-vmss000002 NotReady agent 24m v1.18.14
aks-agentpool-40760006-vmss000003 NotReady agent 19m v1.18.14
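A quick way to triage a listing like the one above is to count nodes whose STATUS column is NotReady. The sketch below works on a captured sample (the second listing from the question) instead of a live cluster; against a real cluster you would pipe `kubectl get nodes` directly into the same `awk`:

```shell
# Sample output, as captured from `kubectl get nodes` above.
nodes='NAME STATUS ROLES AGE VERSION
aks-agentpool-40760006-vmss000000 NotReady agent 23m v1.18.14
aks-agentpool-40760006-vmss000002 NotReady agent 24m v1.18.14
aks-agentpool-40760006-vmss000003 NotReady agent 19m v1.18.14'

# Skip the header row, count lines where the STATUS column is NotReady.
notready=$(echo "$nodes" | awk 'NR > 1 && $2 == "NotReady" { n++ } END { print n + 0 }')
echo "$notready NotReady node(s)"
```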

k get po -n kube-system
NAME                                  READY   STATUS        RESTARTS   AGE
coredns-748cdb7bf4-7frq2              0/1     Pending       0          10m
coredns-748cdb7bf4-vg5nn              0/1     Pending       0          10m
coredns-748cdb7bf4-wrhxs              1/1     Terminating   0          28m
coredns-autoscaler-868b684fd4-2gb8f   0/1     Pending       0          10m
kube-proxy-p6wmv                      1/1     Running       0          28m
kube-proxy-sksz6                      1/1     Running       0          23m
kube-proxy-vpb2g                      1/1     Running       0          28m
metrics-server-58fdc875d5-sbckj       0/1     Pending       0          10m
tunnelfront-5d74798f6b-w6rvn          0/1     Pending       0          10m

The node events show:

Events:
  Type     Reason                   Age                   From              Message
  ----     ------                   ----                  ----              -------
  Normal   Starting                 25m                   kubelet           Starting kubelet.
  Normal   NodeHasSufficientMemory  25m (x2 over 25m)     kubelet           Node aks-agentpool-40760006-vmss000000 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    25m (x2 over 25m)     kubelet           Node aks-agentpool-40760006-vmss000000 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     25m (x2 over 25m)     kubelet           Node aks-agentpool-40760006-vmss000000 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  25m                   kubelet           Updated Node Allocatable limit across pods
  Normal   Starting                 25m                   kube-proxy        Starting kube-proxy.
  Normal   NodeReady                24m                   kubelet           Node aks-agentpool-40760006-vmss000000 status is now: NodeReady
  Warning  FailedToCreateRoute      5m5s                  route_controller  Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 50.264754ms: timed out waiting for the condition
  Warning  FailedToCreateRoute      4m55s                 route_controller  Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 45.945658ms: timed out waiting for the condition
  Warning  FailedToCreateRoute      4m45s                 route_controller  Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 46.180158ms: timed out waiting for the condition
  Warning  FailedToCreateRoute      4m35s                 route_controller  Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 46.550858ms: timed out waiting for the condition
  Warning  FailedToCreateRoute      4m25s                 route_controller  Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 44.74355ms: timed out waiting for the condition
  Warning  FailedToCreateRoute      4m15s                 route_controller  Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 42.428456ms: timed out waiting for the condition
  Warning  FailedToCreateRoute      4m5s                  route_controller  Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 41.664858ms: timed out waiting for the condition
  Warning  FailedToCreateRoute      3m55s                 route_controller  Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 48.456954ms: timed out waiting for the condition
  Warning  FailedToCreateRoute      3m45s                 route_controller  Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 38.611964ms: timed out waiting for the condition
  Warning  FailedToCreateRoute      65s (x16 over 3m35s)  route_controller  (combined from similar events): Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 13.972487ms: timed out waiting for the condition
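Note that every FailedToCreateRoute warning targets the same pod CIDR, which points at the route controller being unable to write to the cluster's route table (typically a permissions or networking issue rather than a transient failure). A small sketch for pulling that CIDR out of captured event text, using one of the lines above as sample input:

```shell
# One captured event line from the node's event log above.
event='Warning  FailedToCreateRoute  5m5s  route_controller  Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 50.264754ms: timed out waiting for the condition'

# Extract the a.b.c.d/len CIDR the route controller failed to program.
cidr=$(echo "$event" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/[0-9]+')
echo "$cidr"
```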

2 Answers:

Answer 0 (score: 0)

You can use the cluster autoscaler option to avoid this kind of situation in the future.

> To keep up with application demands in Azure Kubernetes Service (AKS), you may need to adjust the number of nodes that run your workloads. The cluster autoscaler component watches for pods in your cluster that can't be scheduled because of resource constraints. When issues are detected, the number of nodes in a node pool is increased to meet the application demand. Nodes are also regularly checked for a lack of running pods, with the number of nodes then decreased as needed. This ability to automatically scale the number of nodes in your AKS cluster up or down lets you run an efficient, cost-effective cluster.

You can Update an existing AKS cluster to enable the cluster autoscaler, using your current resource group:

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3
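One thing to check before running the update: the bounds you pick should be internally consistent, and (an assumption worth verifying against the AKS docs) the current node count should fall inside the `[min, max]` range — the answer's example uses `--max-count 3` while the cluster in the question was scaled to 4 nodes. A trivial local sanity check of the bounds:

```shell
# Bounds you intend to pass to `az aks update` (placeholders; MAX_COUNT
# raised to 4 here so the question's scaled-out node count fits inside).
MIN_COUNT=1
MAX_COUNT=4
CURRENT_NODES=4

# Require 1 <= min <= current <= max before invoking the CLI.
if [ "$MIN_COUNT" -ge 1 ] && [ "$MIN_COUNT" -le "$CURRENT_NODES" ] && [ "$CURRENT_NODES" -le "$MAX_COUNT" ]; then
  bounds_ok=yes
else
  bounds_ok=no
fi
echo "bounds_ok=$bounds_ok"
```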

Answer 1 (score: 0)

It looks OK now. It turned out I didn't have the permission needed to scale out the nodes.