Kubernetes deployment causes downtime

Time: 2016-05-02 19:28:09

Tags: deployment kubernetes kubernetes-health-check

When I run a deployment, I get downtime. Requests start failing after a variable amount of time (20-40 seconds).

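The readiness check of the entry container (haproxy) starts failing once its preStop hook sends SIGUSR1; the hook then waits 31 seconds and sends SIGTERM. Within that window the pod should be removed from the service, because the readiness check is configured to fail after 2 failed attempts at 5-second intervals.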

How can I see the events for pods being added to and removed from the service, to find out what is causing this?

And is there a way to see events for the readiness checks themselves?

I am using Google Container Engine version 1.2.2 with GCE's network load balancer.

Service:

apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: http
    protocol: TCP
  - name: https
    port: 443
    targetPort: https
    protocol: TCP
  selector:
    app: myapp

Deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        version: 1.0.0-61--66-6
    spec:
      containers:
      - name: myapp
        image: ****  
        resources:
          limits:
            cpu: 100m
            memory: 250Mi
          requests:
            cpu: 10m
            memory: 125Mi
        ports:
        - name: http-direct
          containerPort: 5000
        livenessProbe:
          httpGet:
            path: /status
            port: 5000
          initialDelaySeconds: 30
          timeoutSeconds: 1
        lifecycle:
          preStop:
            exec:
              # SIGTERM triggers a quick exit; gracefully terminate instead
              command: ["sleep 31;"]
      - name: haproxy
        image: travix/haproxy:1.6.2-r0
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 25Mi
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        env:
        - name: "SSL_CERTIFICATE_NAME"
          value: "ssl.pem"
        - name: "OFFLOAD_TO_PORT"
          value: "5000"
        - name: "HEALT_CHECK_PATH"
          value: "/status"
        volumeMounts:
        - name: ssl-certificate
          mountPath: /etc/ssl/private
        livenessProbe:
          httpGet:
            path: /status
            port: 443
            scheme: HTTPS
          initialDelaySeconds: 30
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /readiness
            port: 81
          initialDelaySeconds: 0
          timeoutSeconds: 1
          periodSeconds: 5
          successThreshold: 1
          failureThreshold: 2
        lifecycle:
          preStop:
            exec:
              # SIGTERM triggers a quick exit; gracefully terminate instead
              command: ["kill -USR1 1; sleep 31; kill 1"]
      volumes:
      - name: ssl-certificate
        secret:
          secretName: ssl-c324c2a587ee-20160331
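
One thing that may be worth ruling out in the Deployment above is the form of the two preStop hooks. An exec hook runs its argv directly rather than through a shell, so a single string such as "sleep 31;" is treated as the name of one executable and is unlikely to run as intended. In addition, the time spent in preStop counts against the pod's termination grace period, which defaults to 30 seconds, so a 31-second sleep can be cut short. The fragment below is a minimal sketch of how the haproxy hook could be written under those assumptions; the value 40 is illustrative only, and /bin/sh is assumed to exist in the image.

```yaml
# Sketch only: a fragment of the pod template, not a complete manifest.
spec:
  template:
    spec:
      # Assumption: 40 is an illustrative value; it only needs to exceed
      # the 31 s sleep. The default grace period is 30 s.
      terminationGracePeriodSeconds: 40
      containers:
      - name: haproxy
        lifecycle:
          preStop:
            exec:
              # exec does not invoke a shell, so ";" and chained commands
              # need an explicit one (assumes the image ships /bin/sh).
              command: ["/bin/sh", "-c", "kill -USR1 1; sleep 31; kill 1"]
```

The same shell wrapping would apply to the myapp container's "sleep 31" hook. Whether this explains the observed downtime is not confirmed by the thread; it is only a configuration detail to check alongside the events.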

1 answer:

Answer 0 (score: 1)

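When a probe fails, the prober emits a warning event with the reason Unhealthy and a message of the form xx probe errored: xxx

You should be able to find those events with either kubectl get events or kubectl describe pods -l app=myapp,version=1.0.0-61--66-6 (which filters pods by label).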