Question

Is it possible to convert XML into an array of objects using Logstash?

This is my sample document:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "Metadata" : "<root><Tags><TagTypeID>1</TagTypeID><TagValue>twitter</TagValue></Tags><Tags><TagTypeID>1</TagTypeID><TagValue>facebook</TagValue></Tags><Tags><TagTypeID>2</TagTypeID><TagValue>usa</TagValue></Tags><Tags><TagTypeID>3</TagTypeID><TagValue>smartphones</TagValue></Tags></root>"
}

Ideally, I would like to output this:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "Metadata" : [
    {
      "TagTypeID" : "1",
      "TagValue" : "twitter"
    },
    {
      "TagTypeID" : "1",
      "TagValue" : "facebook"
    },
    {
      "TagTypeID" : "2",
      "TagValue" : "usa"
    },
    {
      "TagTypeID" : "3",
      "TagValue" : "smartphones"
    }
  ]
}

However, I can't get that to work. I tried using the xml filter:

xml
{
    source => "Metadata"
    target => "Parsed"
}

However, it outputs this:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "@version" : "1",
  "@timestamp" : "2015-10-27T17:21:31.961Z",
  "Parsed" : {
    "Tags" : [
      {
        "TagTypeID" : ["1"],
        "TagValue" : ["twitter"]
      },
      {
        "TagTypeID" : ["1"],
        "TagValue" : ["facebook"]
      },
      {
        "TagTypeID" : ["2"],
        "TagValue" : ["usa"]
      },
      {
        "TagTypeID" : ["3"],
        "TagValue" : ["smartphones"]
      }
    ]
  }
}

I don't want my values stored as arrays (I know there will always be exactly one value there).

I know which fields will come back from my input, so I can map the structure myself; it doesn't need to be dynamic (although that would be nice).

"Allow splitting of lists / arrays into multiple events" seemed useful, but it is poorly documented and I couldn't find any information on how to use that filter for my use case.

"Logstash, split event from an xml file in multiples documents keeping information from root tags" is similar, but not quite what I want to achieve.

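"Logstash: XML to JSON output from array to string" looked useful, but it hard-codes that only the first element of the array is output as a single item (not wrapped in an array). It leaves me with this: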

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "@version" : "1",
  "@timestamp" : "2015-10-27T17:21:31.961Z",
  "Parsed" : {
    "Tags" : [
      {
        "TagTypeID" : "1",
        "TagValue" : "twitter"
      },
      {
        "TagTypeID" : ["1"],
        "TagValue" : ["facebook"]
      },
      {
        "TagTypeID" : ["2"],
        "TagValue" : ["usa"]
      },
      {
        "TagTypeID" : ["3"],
        "TagValue" : ["smartphones"]
      }
    ]
  }
}

Can this be done without creating a custom filter? (I have no experience with Ruby.)
Or am I missing something basic?

Answer 1

Here is one way to do it, using Logstash's built-in ruby filter.

Filter section:

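filter {
    # Parse the XML string in "Metadata" into a nested structure under "Parsed".
    xml {
        source => "Metadata"
        target => "Parsed"
    }

    # Replace each single-element array with its only value.
    # This is the pre-5.0 event['field'] syntax; Logstash 5.0 and later
    # require event.get / event.set instead.
    ruby {  code => "
        event['Parsed']['Tags'].each do |x|
            x.each do |key, value|
                x[key] = value[0]
            end
        end"
    }
}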

Output:

"Parsed":{
  "Tags":[
      {
      "TagTypeID":"1",
      "TagValue":"twitter"
      },
      {
      "TagTypeID":"1",
      "TagValue":"facebook"
      },
      {
      "TagTypeID":"2",
      "TagValue":"usa"
      },
      {
      "TagTypeID":"3",
      "TagValue":"smartphones"
      }
  ]
}

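If I understand you correctly, this is the result you want. You need to specify the XML field in the ruby filter: event['Parsed']['Tags']. Does it need to be more dynamic? Let me know if you need anything else.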
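For anyone who does want the more dynamic variant, here is a minimal sketch in plain Ruby that can be tried outside Logstash. It walks the whole parsed structure instead of naming the Tags field. The helper name unwrap_single_values and the sample hash are my own, purely for illustration; nothing here is part of the Logstash API.

```ruby
require 'json'

# Hypothetical helper (not part of Logstash): recursively replaces every
# one-element array that holds a scalar with the scalar itself.
def unwrap_single_values(node)
  case node
  when Hash
    # Rebuild the hash with each value unwrapped.
    node.each_with_object({}) do |(key, value), out|
      out[key] = unwrap_single_values(value)
    end
  when Array
    items = node.map { |item| unwrap_single_values(item) }
    single_scalar = items.length == 1 &&
                    !items[0].is_a?(Hash) && !items[0].is_a?(Array)
    # Lists of objects (such as Tags) stay lists, even with one entry.
    single_scalar ? items[0] : items
  else
    node
  end
end

# Sample shaped like the xml filter output shown in the question.
parsed = {
  "Tags" => [
    { "TagTypeID" => ["1"], "TagValue" => ["twitter"] },
    { "TagTypeID" => ["2"], "TagValue" => ["usa"] }
  ]
}

flat = unwrap_single_values(parsed)
puts JSON.generate(flat)
# prints {"Tags":[{"TagTypeID":"1","TagValue":"twitter"},{"TagTypeID":"2","TagValue":"usa"}]}
```

Inside a ruby filter you would presumably define the helper in the filter's init option and write the result back to the event yourself; I have not tested that wiring.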

"Can this be done without creating a custom filter? (I have no experience with Ruby.)"

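Well, yes and no. Yes, because this is not really a custom filter but a built-in solution. No, because I tend to say it can't be done without Ruby. I must admit that Ruby seems like an unattractive solution. However, it is a flexible approach, and 5 lines of code shouldn't hurt too much.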

Answer 2

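The latest Logstash version (5.1.1 at the time of writing) has an updated xml filter that includes the force_array option. It is enabled by default. Setting it to false does exactly the same as the ruby filter in the accepted answer.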

Taken from the documentation:

force_array

Value type is boolean

Default value is true

By default the filter will force single elements to be arrays. Setting this to false will prevent storing single elements in arrays.

https://www.elastic.co/guide/en/logstash/current/plugins-filters-xml.html#plugins-filters-xml-force_array
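As a rough sketch of how this might look in a pipeline, assuming the same Metadata source and Parsed target as in the question. The mutate step that moves the tag list back into Metadata is my own addition to get closer to the desired output, and I have not tested it:

```
filter {
    xml {
        source => "Metadata"
        target => "Parsed"
        # Keep single child elements as plain values instead of one-element arrays.
        force_array => false
    }

    # Assumption: overwrite the original XML string with the parsed tag list,
    # then drop the wrapper field.
    mutate {
        rename => { "[Parsed][Tags]" => "Metadata" }
        remove_field => [ "Parsed" ]
    }
}
```

One caveat: with force_array set to false, a document containing only a single Tags element will most likely yield a single object rather than a one-element array, so downstream consumers may need to handle both shapes.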

Logstash: split XML into an array
