telegraf - exec plugin - aws ec2 ebs volume info - metric parsing error, reason: [missing fields] or Errors encountered: [invalid number]

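Time: 2017-03-10 01:04:56

Tags: python-2.7 telegraf telegraf-inputs-plugin telegraf-plugins wavefront-telegraf

Machine: CentOS 7.2, Ubuntu 14.04 / 16.xx

Telegraf version: 1.0.1

Python version: 2.7.5

Telegraf supports an INPUT plugin named exec. First, see example 2 in its README doc. I cannot use the JSON format because it only accepts numeric values for metrics. According to the docs: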

If using JSON, only numeric values are parsed and turned into floats. Booleans and strings will be ignored.

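So the idea is simple: in the exec plugin section you specify a script, and it should emit some meaningful information (in JSON or, in my case, the influx data format, because some of my metrics hold non-numeric values) that you want to capture and show in a nice dashboard, such as the Wavefront Dashboard.

Basically, one can use these metrics, their tags, and the sources they come from to find out all sorts of information about memory, CPU, disk, network, and so on, and create alerts from it when something unwanted happens.

OK, I came up with this Python script: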

#!/usr/bin/python

# sudo pip install boto3 if you don't have it on your machine.
import boto3


def generate(key, value):
    """
    Creates a nicely formatted Key(Value) item for output
    """
    return '{}="{}"'.format(key, value)
    #return '{}={}'.format(key, value)


def main():
    ec2 = boto3.resource('ec2', region_name="us-west-2")
    volumes = ec2.volumes.all()

    for vol in volumes:
        # You don't need to wrap everything in `str` unless it is not a string
        # By default most things will come back as a string 
        # unless they are very obviously not (complex, date time, etc)
        # but since we are printing these (and formatting them into strings)
        # the cast to string will be implicit and we don't need to make it 
        # explicit


        # vol is already a fully returned volume you are essentially DOUBLING
        # your API calls when you do this
        #iv = ec2.Volume(vol.id)
        output_parts = [
            # Volume level details
            generate('create_time', vol.create_time),
            generate('availability_zone', vol.availability_zone),
            generate('volume_id', vol.volume_id),
            generate('volume_type', vol.volume_type),
            generate('state', vol.state),
            generate('size', vol.size),
            generate('iops', vol.iops),
            generate('encrypted', vol.encrypted),
            generate('snapshot_id', vol.snapshot_id),
            generate('kms_key_id', vol.kms_key_id),
        ]

        for _ in vol.attachments:
            # Will get any attachments and since it is a list
            # we should write this to handle MULTIPLE attachments
            output_parts.extend([
                generate('InstanceId', _.get('InstanceId')),
                generate('InstanceVolumeState', _.get('State')),
                generate('DeleteOnTermination', _.get('DeleteOnTermination')),
                generate('Device', _.get('Device')),
            ])

        # only process when there are tags to process        
        if vol.tags:
            for _ in vol.tags:
                # Get all of the tags
                output_parts.extend([
                    generate(_.get('Key'), _.get('Value')),
                ])

        # output everything at once.. 
        print ','.join(output_parts)


if __name__ == '__main__':
    main()

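This script talks to AWS EC2 EBS volumes and prints every value it can find (roughly what you see in the AWS EC2 EBS volume console), formatted as comma-separated key=value pairs, which I redirect to a .csv log file. We do not want to run the Python script all the time (AWS API limits / cost factor).

So, once the .csv file is created, I wrote this small shell script, which I set in the exec plugin section of Telegraf.

The shell script /tmp/aws-vol-info.sh, set in Telegraf's exec plugin, is: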

#!/bin/bash

cat /tmp/aws-vol-info.csv

The Telegraf configuration file that uses the exec plugin (/etc/telegraf/telegraf.d/exec-plugin-aws-info.conf):

#--- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec

[[inputs.exec]]
  commands = ["/tmp/aws-vol-info.sh"]

  ## Timeout for each command to complete.
  timeout = "5s"

  # Data format to consume.
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"

  name_suffix = "_telegraf_execplugin"

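I tweaked the .py (the Python script's generate function) to produce the following output formats (.csv file), because I wanted to test how telegraf would handle this data before enabling the config file (/etc/telegraf/telegraf.d/catch-aws-ebs-info.conf) and restarting the telegraf service.


Format 1 (every value wrapped in " double quotes):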

create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_id="vol-058e1d47dgh721121",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"

Testing the telegraf configuration against the telegraf config directory gives me the following error.

Command: $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec

[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 00:37:48 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:37:48Z E! Errors encountered: [ metric parsing error, reason: [invalid field format], buffer: [create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_id="vol-058e1d47dgh721121",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"], index: [372]]
[vagrant@myvagrant ~] $

Format 2 (without any " double quotes):

create_time=2017-01-09 23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90] secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app
Testing Telegraf's exec plugin configuration again gives a parsing error (this time the reason is [invalid value]):

2017/03/10 00:45:01 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:45:01Z E! Errors encountered: [ metric parsing error, reason: [invalid value], buffer: [create_time=2017-01-09 23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90] secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app], index: [63]]

Format 3 (no " double quotes and no space characters in the values; spaces are replaced with the _ character):

create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app

Still not working; another parsing error (this time the reason is [missing fields]):

[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 00:50:30 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:50:30Z E! Errors encountered: [ metric parsing error, reason: [missing fields], buffer: [create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app], index: [476]]

Format 4: if I follow the influx line protocol as described on this page: https://docs.influxdata.com/influxdb/v1.2/write_protocols/line_protocol_tutorial/

awsebs,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1

I get this error:

[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 02:34:30 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T02:34:30Z E! Errors encountered: [ invalid number]

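How can I get rid of this error and get telegraf working with the exec plugin (running the .sh script)?

Other info

The Python script runs once or twice a day (via cron), and telegraf runs every 1 minute (it runs the exec plugin, which runs the .sh script, which cats the .csv file so that telegraf can consume it in the influx data format).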

https://galaxy.ansible.com/wavefrontHQ/wavefront-ansible/

https://github.com/influxdata/telegraf/issues/2525

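1 Answer:

Answer 0: (score: 3)

It looks like the rules are very strict; I should have looked more closely.

The output of whatever program you use must follow the INFLUX LINE PROTOCOL format shown below, along with all the RULES that come with it.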

For example:

weather,location=us-midwest temperature=82 1465839830100400200
  |    -------------------- --------------  |
  |             |             |             |
  |             |             |             |
+-----------+--------+-+---------+-+---------+
|measurement|,tag_set| |field_set| |timestamp|
+-----------+--------+-+---------+-+---------+

You can read more about what the measurement, tags, fields, and the optional timestamp are here: https://docs.influxdata.com/influxdb/v1.2/write_protocols/line_protocol_tutorial/
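To make the layout in the diagram concrete, here is a minimal Python sketch that joins the four parts into one line. The helper name build_line is hypothetical (it is not part of telegraf or InfluxDB), and it assumes the tag and field values passed in are already escaped and quoted correctly; it only illustrates where the comma and the spaces go.

```python
def build_line(measurement, tags, fields, timestamp=None):
    """Join pre-formatted parts into one line-protocol line.

    tags and fields are lists of (key, value) pairs whose values are
    assumed to be escaped/quoted already; this only shows the layout.
    """
    # measurement and tag set are joined by commas, with no spaces
    head = ",".join([measurement] + ["{}={}".format(k, v) for k, v in tags])
    field_set = ",".join("{}={}".format(k, v) for k, v in fields)
    if not field_set:
        raise ValueError("line protocol needs at least one field")
    parts = [head, field_set]
    # the timestamp is optional and is never quoted
    if timestamp is not None:
        parts.append(str(timestamp))
    # a single space separates tag set, field set, and timestamp
    return " ".join(parts)


line = build_line(
    "weather",
    [("location", "us-midwest")],
    [("temperature", "82")],
    1465839830100400200,
)
print(line)
```

With the values from the diagram, this prints the same weather line shown above.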

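Important rules

1) There must be a , and no space between the measurement and the tag set.

2) There must be a space between the tag set and the field set.

3) For tag keys, tag values, and field keys, always use a backslash character \ to escape any character that needs escaping in a measurement name, a tag or field key, or their values.

4) You cannot escape \ with \.

5) Line Protocol handles emojis with no problem :)

6) The TAG / tag set (tags, comma-separated) is OPTIONAL.

7) The FIELD / field set (fields, comma-separated): at least ONE is required per line.

8) The TIMESTAMP (the last value shown in the format) is OPTIONAL.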



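9) The VERY IMPORTANT quoting rules are as follows:

a) Never double or single quote the timestamp. It is not valid Line Protocol. '123123131312313' or "1231313213131" will not work even if the number itself is valid.

b) Never single quote field values (even if they are strings!). That is not valid Line Protocol either, i.e. fieldname='giga' will not work.

c) Do not double or single quote measurement names, tag keys, tag values, or field keys. NOTE: this does include tag values, so be careful.

d) Do not double quote field values that are only floats, integers, or booleans; otherwise InfluxDB will assume those values are strings.

e) Do double quote field values that are strings.

f) And the MOST IMPORTANT one (which will save you from going BALD): if a FIELD value is written without double quotes on one line, i.e. you think it is an integer or float (for example, anyone would say so of the fields size or iops), and on some other line (anywhere in the file that telegraf reads/parses through the exec plugin) the same field holds a non-numeric value (i.e. a string), then you will get the error message Errors encountered: [ invalid number.

So the fix, as a RULE: if any possible value for a FIELD key can be a string, you must wrap that field's value in " on every line. It does not matter that the value is 1, 200, or 1.5 on some lines (for example, iops can be 15) if on other lines it is a string (iops can be None).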
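The quoting rules above can be sketched as one small Python helper. The name format_field and its numeric flag are hypothetical; the idea is simply that a field is written bare only when the caller promises it is numeric on every line, and is double quoted otherwise. Escaping embedded double quotes with a backslash follows the line protocol tutorial linked above.

```python
def format_field(key, value, numeric=False):
    """Format one key=value field for influx line protocol.

    numeric=True  -> value is written bare and must be a real number
    numeric=False -> value is written as a double-quoted string
    """
    if numeric:
        # A bare None or string is what later surfaces as "invalid number",
        # so fail here instead. bool is excluded because it is a subclass
        # of int in Python.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("{} is not numeric: {!r}".format(key, value))
        return "{}={}".format(key, value)
    # Everything else becomes a string field; embedded double quotes
    # are escaped with a backslash.
    text = "{}".format(value).replace('"', '\\"')
    return '{}="{}"'.format(key, text)


print(format_field("size", 8, numeric=True))   # always a number
print(format_field("iops", 100))               # may be None elsewhere, so quoted
print(format_field("iops", None))
```

Because iops is quoted on every line, a volume without an iops value no longer changes the field's type halfway through the file.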

Error message: Errors encountered: [ invalid number

[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 11:13:18 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T11:13:18Z E! Errors encountered: [ invalid number metric parsing error, reason: [invalid field format], buffer: [awsebsvol,host=myvagrant ], index: [25]]

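So, after all this learning, it is clear that I was missing the Influx Line Protocol format in the first place, and its RULES as well!

Now, the output I want my Python script to generate should look like the line below (per the INFLUX LINE PROTOCOL). You can simply change the .sh file and use sed "s/^/awsec2ebs,/" or sed "s/^/awsec2ebs,sourcehost=$(hostname) /" (note the space before the closing sed / character), and then you can put " around the value of any key=value pair. I did change the .py file so that the size field does not use "; iops stays quoted in the final script because it can be None.

Anyway, if the output looks like this: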

awsec2ebs,volume_id=vol-058e1d47dgh721121 create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"

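In the final working solution above, I created a measurement named awsec2ebs, then put a , between this measurement and the tag key volume_id. For the tag value I did NOT use any ' or " quotes. Then I put a space character between the tag set and the field set (I only wanted one tag for now; otherwise you can have more tags, comma-separated, following the rules).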
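As a rough illustration of that layout, the following Python sketch builds the same kind of line from a tag and an already formatted field set. The helper names escape_tag and to_line are hypothetical. The escaping of commas, equals signs, and spaces in tag keys and values follows the line protocol tutorial; the volume ID used here needs no escaping, so it passes through unchanged.

```python
def escape_tag(text):
    """Backslash-escape the characters that are special in tag keys/values."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def to_line(measurement, tag_key, tag_value, field_set):
    """measurement,tag_key=tag_value<space>field_set (no quotes on the tag)."""
    return "{},{}={} {}".format(
        measurement, escape_tag(tag_key), escape_tag(tag_value), field_set
    )


fields = 'state="in-use",size=8,iops="100"'
print(to_line("awsec2ebs", "volume_id", "vol-058e1d47dgh721121", fields))
# A tag value containing a space must be escaped rather than quoted:
print(escape_tag("[company-2b-app90] secondary"))
```

If the Name tag of the volume were used as a line-protocol tag rather than a string field, its space would have to be escaped this way instead of being wrapped in quotes.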

Finally, I ran the command:

$ telegraf --config-directory=/etc/telegraf --test --input-filter=exec

which worked like a charm!

2017/03/10 03:33:54 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
> awsec2ebs_telegraf_execplugin,volume_id=vol-058e1d47dgh721121,host=myvagrant volume_type="gp2",iops="100",kms_key_id="None",role="app",size="8",encrypted="False",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",Name="[company-2b-app90] secondary",snapshot_id="snap-06h1h1b91bh662avn",DeleteOnTermination="True",mirror="secondary",cluster="company",autoscale="true",high_availability="1",create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",state="in-use",Device="/dev/sda1",hostname="company-2b-app90-i-0jjb1boop26f42f50" 1489116835000000000
[vagrant@myvagrant ~] $ echo $?
0

In the example above, size is the only field that is always a numeric value, so it does not need to be wrapped in ", but that is up to you. Recall the MOST IMPORTANT rule above and the error it produces.

So the final Python file is:

#!/usr/bin/python

#Do `sudo pip install boto3` first
import boto3

def generate(key, value, qs, qe):
    """
    Creates a nicely formatted Key(Value) item for output
    """
    return '{}={}{}{}'.format(key, qs, value, qe)

def main():
    ec2 = boto3.resource('ec2', region_name="us-west-2")
    volumes = ec2.volumes.all()

    for vol in volumes:
        # You don't need to wrap everything in `str` unless it is not a string
        # By default most things will come back as a string
        # unless they are very obviously not (complex, date time, etc)
        # but since we are printing these (and formatting them into strings)
        # the cast to string will be implicit and we don't need to make it
        # explicit

        # vol is already a fully returned volume you are essentially DOUBLING
        # your API calls when you do this
        #iv = ec2.Volume(vol.id)
        output_parts = [
            # Volume level details
            generate('volume_id', vol.volume_id, '"', '"'),
            generate('create_time', vol.create_time, '"', '"'),
            generate('availability_zone', vol.availability_zone, '"', '"'),
            generate('volume_type', vol.volume_type, '"', '"'),
            generate('state', vol.state, '"', '"'),
            generate('size', vol.size, '', ''),
            #The following vol.iops variable can be a number or None so you must wrap it with double quotes otherwise "invalid number" error will come.
            generate('iops', vol.iops, '"', '"'),
            generate('encrypted', vol.encrypted, '"', '"'),
            generate('snapshot_id', vol.snapshot_id, '"', '"'),
            generate('kms_key_id', vol.kms_key_id, '"', '"'),
        ]

        for _ in vol.attachments:
            # Will get any attachments and since it is a list
            # we should write this to handle MULTIPLE attachments
            output_parts.extend([
                generate('InstanceId', _.get('InstanceId'), '"', '"'),
                generate('InstanceVolumeState', _.get('State'), '"', '"'),
                generate('DeleteOnTermination', _.get('DeleteOnTermination'), '"', '"'),
                generate('Device', _.get('Device'), '"', '"'),
            ])

        # only process when there are tags to process
        if vol.tags:
            for _ in vol.tags:
                # Get all of the tags
                output_parts.extend([
                    generate(_.get('Key'), _.get('Value'), '"', '"'),
                ])

        # output everything at once..
        print ','.join(output_parts)

if __name__ == '__main__':
    main()

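The final aws-vol-info.sh is:

#!/bin/bash
cat aws-vol-info.csv | sed "s/^/awsebsvol,host=`hostname|head -1|sed "s/[ \t][ \t]*/_/g"` /"

The final telegraf exec plugin config file (/etc/telegraf/telegraf.d/exec-plugin-aws-info.conf; any name ending in .conf will do) is below. Run the test command again and everything works now!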

#--- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec

[[inputs.exec]]
  commands = ["/some/valid/path/where/csvfileexists/aws-vol-info.sh"]

  ## Timeout for each command to complete.
  timeout = "5s"

  # Data format to consume.
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"

  name_suffix = "_telegraf_exec"
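
For readers who would rather keep everything in Python, the prefixing done by the wrapper script can be sketched as follows. This is only an illustrative equivalent, not what the answer actually runs: the function name prefix_lines is hypothetical, and it assumes each input line already holds a valid field set.

```python
import re
import socket


def prefix_lines(lines, measurement, host):
    """Prepend 'measurement,host=<host> ' to every non-empty line."""
    # Replace runs of whitespace in the host name with _, as the sed call does
    safe_host = re.sub(r"\s+", "_", host.strip())
    prefix = "{},host={} ".format(measurement, safe_host)
    return [prefix + line.rstrip("\n") for line in lines if line.strip()]


sample = ['size=8,state="in-use"\n']
for out in prefix_lines(sample, "awsebsvol", socket.gethostname()):
    print(out)
```

In practice the lines would be read from the .csv file instead of the sample list.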