Should the metrics-server pod run on the master node or on worker nodes?

Time: 2020-08-07 14:31:19

Tags: kubernetes kubernetes-dashboard kubernetes-metrics

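I'm new to k8s. I'm trying to deploy the dashboard on the master node, and part of the deployment is starting metrics-server. The full documentation can be found here (dashboard / metrics-server).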

My question is about the warning that appears immediately after deployment:

$ kubectl describe pods -n kube-system metrics-server-74d7f54fdc-psz5p
Name:           metrics-server-74d7f54fdc-psz5p
Namespace:      kube-system
Priority:       0
Node:           <none>
Labels:         k8s-app=metrics-server
                pod-template-hash=74d7f54fdc
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/metrics-server-74d7f54fdc
Containers:
  metrics-server:
    Image:      my.repo.net/k8s.gcr.io/metrics-server-amd64:v0.3.6
    Port:       4443/TCP
    Host Port:  0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=4443
    Environment:  <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-d47dm (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  metrics-server-token-d47dm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-server-token-d47dm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/arch=amd64
                 kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  116s (x49 over 66m)  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.

After reading other questions, such as "Node had taints that the pod didn't tolerate error when deploying to Kubernetes cluster" and "1 node(s) had taints that the pod didn't tolerate in kubernetes cluster", I understand why this happens, but I'm confused about whether I should add a toleration like this one to the deployment (https://github.com/kubernetes-sigs/metrics-server/releases/tag/v0.3.7):

tolerations:
  - key: "example-key"
    operator: "Exists"
    effect: "NoSchedule"
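To make the scheduler's decision concrete, here is a minimal Python sketch of how a toleration is matched against a node taint. It is a simplified model written for illustration, based on the matching rules described in the Kubernetes "Taints and Tolerations" documentation, and is not the scheduler's actual code; the names Taint, Toleration, tolerates and can_schedule are hypothetical. It assumes the master taint has the NoSchedule effect that kubeadm applies by default, since the event message only shows the key and an empty value. The sketch shows why the pod's two default NoExecute tolerations do not cover the master taint, and why a toleration keyed on "example-key" would not help either: the key has to be node-role.kubernetes.io/master.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass
class Toleration:
    key: str = ""             # empty key with operator "Exists" tolerates every taint
    operator: str = "Equal"   # "Equal" (default) or "Exists"
    value: str = ""
    effect: str = ""          # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified matching rule for one toleration against one taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # "Exists" ignores the value; only the key (if given) must match.
        return tol.key == "" or tol.key == taint.key
    # "Equal": both key and value must match.
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A node is a hard 'no' if any NoSchedule/NoExecute taint is untolerated."""
    for taint in taints:
        if taint.effect == "PreferNoSchedule":
            continue  # soft preference only, never blocks scheduling
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


# Assumed: the kubeadm master taint, key only, effect NoSchedule.
master_taints = [Taint("node-role.kubernetes.io/master", "", "NoSchedule")]

# The two tolerations listed in the kubectl describe output above.
default_tols = [
    Toleration("node.kubernetes.io/not-ready", "Exists", "", "NoExecute"),
    Toleration("node.kubernetes.io/unreachable", "Exists", "", "NoExecute"),
]

with_default = can_schedule(default_tols, master_taints)
with_example_key = can_schedule(
    default_tols + [Toleration("example-key", "Exists", "", "NoSchedule")],
    master_taints,
)
with_master_key = can_schedule(
    default_tols + [Toleration("node-role.kubernetes.io/master", "Exists", "", "NoSchedule")],
    master_taints,
)
print(with_default, with_example_key, with_master_key)  # False False True
```

In this model only a toleration whose key matches the taint's key (or an empty-key "Exists" toleration, which tolerates everything) lets the pod onto the master.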

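If the master node is supposed to be able to collect metrics by itself, shouldn't this toleration be added by default? If not, then we would have to deploy the UI on all the workers (which makes no sense).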

Maybe someone with more experience can shed some light on this?

1 Answer:

Answer 0 (score: 4)

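Metrics server can be deployed on a worker node; it does not have to run on the master node to collect metrics about the master. Metrics server uses the kube-apiserver to discover the nodes in the cluster and collects metrics from each node's kubelet, so its requirements are:

  1. It must be reachable from the kube-apiserver
  2. Kubelet authorization must be set up correctly (refer to this link)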

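Are there any worker nodes in your cluster? (The event "0/1 nodes are available" suggests the cluster has only one node, the tainted master.) Do any taints apply to those nodes? Also, according to your deployment YAML, the node selector is configured with the following values, so make sure your worker nodes have these 2 labels: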

  • kubernetes.io/arch=amd64
  • kubernetes.io/os=linux

If a node is missing these labels, you can add them with the following command:

kubectl label nodes <node-name> kubernetes.io/arch=amd64 kubernetes.io/os=linux
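The nodeSelector requirement can be illustrated with a small Python sketch: a pod is only eligible for a node whose labels contain every key/value pair in its nodeSelector (extra labels on the node are ignored). The function name matches_node_selector and the worker's label set are hypothetical, chosen only for illustration; the selector values are the two from the describe output above.

```python
from typing import Dict


def matches_node_selector(node_selector: Dict[str, str], node_labels: Dict[str, str]) -> bool:
    """True if every key/value pair of the nodeSelector is present on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# nodeSelector from the metrics-server pod (see "Node-Selectors" in the describe output).
selector = {"kubernetes.io/arch": "amd64", "kubernetes.io/os": "linux"}

# Hypothetical worker node that is missing the arch label.
worker_labels = {"kubernetes.io/hostname": "worker-1", "kubernetes.io/os": "linux"}
before = matches_node_selector(selector, worker_labels)

# Roughly what `kubectl label nodes <node-name> kubernetes.io/arch=amd64` changes.
worker_labels["kubernetes.io/arch"] = "amd64"
after = matches_node_selector(selector, worker_labels)

print(before, after)  # False True
```

A node only becomes a candidate once it satisfies the selector and has no untolerated NoSchedule/NoExecute taints, which is why an untainted, correctly labeled worker is the natural home for metrics-server.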