在AWS CodeDeploy期间从HAProxy中删除实例

时间:2017-05-15 18:10:34

标签: bash amazon-web-services haproxy aws-code-deploy

我们的应用程序需要使用HAProxy来对流量进行负载平衡和路由(每个AZ一个),ALB和ELB的配置不足以满足我们的需求。通过AWS CodeDeploy部署新代码时,我们希望将要修补的实例置于维护模式(从负载平衡中移除,连接耗尽)。我们修改了默认的CodeDeploy生命周期bash脚本,通过从相关实例向HAProxy发送SSM Run命令,从各自的HAProxy实例中删除实例。目前这种修改不起作用,失败的原因未知。该脚本在逐步手动执行时起作用(至少到当前的故障点)。失败的部分是返回“$ INSTANCE_ID似乎不在具有HAProxy实例的AZ中,跳过注销”的测试,或者上述测试所依赖的$ HAPROXY_ID的设置。该脚本在此之前运行正常,但此时退出,因为它无法找到HAProxy实例ID。

我已经检查了所有看似正确的IAM角色权限/凭据,环境变量和文件权限。通常情况下,我会将更多的日志记录放入脚本中进行调试,但是为了让我们能够实现这一点,部署的时间太少了。

我的问题:有更好的方法吗?我只能猜测我们不是唯一一个在CodeDeploy中使用HAProxy的人,并且必须有一个可靠的方法来实现这一点。以下是当前使用的代码无效。

#!/bin/bash
#
# Copyright 2014 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
#  http://aws.amazon.com/apache2.0
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.

. $(dirname $0)/common_functions.sh

if [[ "$DEPLOYMENT_GROUP_NAME" != "redacted" ]]; then
  msg "ELB Deregistration doesn't need to happen when not on redacted."
  exit
fi

msg "Running AWS CLI with region: $(get_instance_region)"

# get this instance's ID
INSTANCE_ID=$(get_instance_id)
if [ $? != 0 -o -z "$INSTANCE_ID" ]; then
    error_exit "Unable to get this instance's ID; cannot continue."
fi

# Get current time
msg "Started $(basename $0) at $(/bin/date "+%F %T")"
start_sec=$(/bin/date +%s.%N)

msg "Checking if instance $INSTANCE_ID is part of an AutoScaling group"
asg=$(autoscaling_group_name $INSTANCE_ID)
if [ $? == 0 -a -n "${asg}" ]; then
    msg "Found AutoScaling group for instance $INSTANCE_ID: ${asg}"

    msg "Checking that installed CLI version is at least at version required for AutoScaling Standby"
    check_cli_version
    if [ $? != 0 ]; then
        error_exit "CLI must be at least version ${MIN_CLI_X}.${MIN_CLI_Y}.${MIN_CLI_Z} to work with AutoScaling Standby"
    fi

    msg "Attempting to put instance into Standby"
    autoscaling_enter_standby $INSTANCE_ID "${asg}"
    if [ $? != 0 ]; then
        error_exit "Failed to move instance into standby"
    else
        msg "Instance is in standby"
    fi
fi

msg "Instance is not part of an ASG, continuing..."

## Get the instanceID of the HAProxy instance in this AZ and ENVIRONMENT - Will there ever be more than one???

HAPROXY_ID=$(/usr/local/bin/aws ec2 describe-instances --region us-east-1 --filters "Name=availability-zone,Values=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)" "Name=tag:deployment_group,Values=haproxy.$ENVIRONMENT" --output text  | \
grep INSTANCES | \
awk '{print $8}' )

HAPROXY_IP=$(/usr/local/bin/aws ec2 describe-instances --region us-east-1 --filters "Name=availability-zone,Values=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)" "Name=tag:deployment_group,Values=haproxy.$ENVIRONMENT" --output text  | \
grep INSTANCES | \
awk '{print $13}' )

if test -z "$HAPROXY_ID"; then
    msg "$INSTANCE_ID doesn't seem to be in an AZ with a HAProxy instance, skipping deregistration."
    exit
fi

## Put the current instance into MAINT mode with the HAProxy instance via SSM

msg "Deregistering $INSTANCE_ID from HAProxy $HAPROXY_ID"

DEREGCMD="{\"commands\":[\"haproxyctl disable server bk_app_servers/$INSTANCEID\"],\"executionTimeout\":[\"3600\"]}"

/usr/local/bin/aws ssm send-command \
--document-name "AWS-RunShellScript" \
--instance-ids "$HAPROXY_ID" \
--parameters "$DEREGCMD" \
--timeout-seconds 600 \
--output-s3-bucket-name "redacted" \
--output-s3-key-prefix "haproxy-codedeploy/deregister" \
--region us-east-1

if [ $? != 0 ]; then
    error_exit "Failed to send SSM command to deregister instance $INSTANCE_ID from HAProxy $HAPROXY_ID"
fi

## Wait for all connections to drain from instance

SESS_COUNT=$(/usr/bin/curl -s "http://$HAPROXY_IP:<portredacted>/<urlredacted>" | grep $INSTANCEID | awk -F "," '{print $5}')
DRAIN_TIME=60

msg "Initial session count: $SESS_COUNT"

while [[ "$SESS_COUNT" -gt 0 ]]; do
    if [[ "$COUNTER" -gt "$DRAIN_TIME" ]]; then
        msg "Instance failed to drain all connections within $DRAIN_TIME seconds. Continuing to deploy anyway."
        break
    fi
    msg $SESS_COUNT
    sleep 1
    COUNTER=$(($COUNTER + 1))
    SESS_COUNT=$(/usr/bin/curl -s "http://$HAPROXY_IP:<portredacted>/<urlredacted>" | grep $INSTANCEID | awk -F "," '{print $5}')
done

msg "Finished $(basename $0) at $(/bin/date "+%F %T")"

end_sec=$(/bin/date +%s.%N)
elapsed_seconds=$(echo "$end_sec - $start_sec" | /usr/bin/bc)

msg "Elapsed time: $elapsed_seconds"

1 个答案:

答案 0 :(得分:0)

目前,唯一的选择是添加更多日志记录并发出部署以测试此脚本,然后查看部署日志。听起来你不知道它为什么会失败,只有日志可以告诉你。

尝试添加日志记录并查看发生的情况。我们应该按原样执行您的脚本,因此不应该采取任何不同的行动,但如果没有看到日志就很难判断。

祝你好运, -Asaf

相关问题