Elasticsearch copy_to field not behaving as expected with aggregations

Time: 2015-07-22 04:34:44

Tags: elasticsearch

I have an index mapping with two string fields, field1 and field2, both declared with copy_to pointing to another field called all_fields. all_fields is indexed as "not_analyzed".

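When I create a bucket aggregation on all_fields, I expect distinct buckets whose keys are the values of field1 and field2 concatenated together. Instead, I get separate buckets whose keys are the non-concatenated values of field1 and field2.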

Example:

Mapping:

  {
    "mappings": {
      "myobject": {
        "properties": {
          "field1": {
            "type": "string",
            "index": "analyzed",
            "copy_to": "all_fields"
          },
          "field2": {
            "type": "string",
            "index": "analyzed",
            "copy_to": "all_fields"
          },
          "all_fields": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }

Data:

  {
    "field1": "dinner carrot potato broccoli",
    "field2": "something here"
  }

and:

  {
    "field1": "fish chicken something",
    "field2": "dinner"
  }

Aggregation:

  {
    "aggs": {
      "t": {
        "terms": {
          "field": "all_fields"
        }
      }
    }
  }

Result:

  ...
  "aggregations": {
    "t": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "dinner", "doc_count": 1 },
        { "key": "dinner carrot potato broccoli", "doc_count": 1 },
        { "key": "fish chicken something", "doc_count": 1 },
        { "key": "something here", "doc_count": 1 }
      ]
    }
  }

I was expecting only 2 buckets, one per document, each keyed by the concatenation of that document's field1 and field2 values.

What am I doing wrong?

1 answer:

Answer 0 (score: 2)

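What you are looking for is the concatenation of two strings. copy_to may look like it does this, but it does not. With copy_to you are, conceptually, creating a set of values from field1 and field2, not concatenating them.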
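
To make the difference concrete, here is a minimal pure-Python sketch of how a terms aggregation counts keys in each case. It does not talk to Elasticsearch; the helper names `terms_buckets`, `copy_to_values` and `concatenated_values` are made up for illustration, and the two documents are the ones from the question.

```python
from collections import Counter

# The two documents from the question.
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]


def terms_buckets(docs, values_of):
    """Count, for each distinct key, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        # A document contributes at most once per key, like doc_count.
        counts.update(set(values_of(doc)))
    return dict(counts)


def copy_to_values(doc):
    # Model of copy_to into a not_analyzed field: all_fields becomes a
    # multi-valued field holding each source value as its own term.
    return [doc["field1"], doc["field2"]]


def concatenated_values(doc):
    # Model of the behaviour the question expected: one joined value.
    return [doc["field1"] + " " + doc["field2"]]


copy_to_buckets = terms_buckets(docs, copy_to_values)
concat_buckets = terms_buckets(docs, concatenated_values)

print(len(copy_to_buckets), sorted(copy_to_buckets))
print(len(concat_buckets), sorted(concat_buckets))
```

In this model the copy_to variant yields four keys, matching the four buckets in the question, while the concatenated variant yields one key per document.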

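For your use case, you have two options:

  1. use a _source transformation
  2. perform a script aggregation

I would suggest the _source transformation, as I think it is more efficient than scripting: you pay a small price at indexing time instead of running a heavy script aggregation at search time.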

_source transformation:

    PUT /lastseen
    {
      "mappings": {
        "test": {
          "transform": {
            "script": "ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2']"
          }, 
          "properties": {
            "field1": {
              "type": "string"
            },
            "field2": {
              "type": "string"
            },
            "lastseen": {
              "type": "long"
            },
            "all_fields": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
    

The query:

    GET /lastseen/test/_search
    {
      "aggs": {
        "NAME": {
          "terms": {
            "field": "all_fields",
            "size": 10
          }
        }
      }
    }
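
The reasoning behind preferring the transform can be illustrated with a rough cost model: the transform concatenates once per document at indexing time, whereas a script aggregation evaluates its script for each matching document on every search. The sketch below is plain Python with hypothetical helper names (`concat`, `search_with_transform`, `search_with_script`); it only counts how often the concatenation runs and says nothing about real Elasticsearch timings.

```python
# The two documents from the question.
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]

# How many times the concatenation runs in each phase.
calls = {"index_time": 0, "query_time": 0}


def concat(doc, phase):
    """Join field1 and field2 with a space and record when it happened."""
    calls[phase] += 1
    return doc["field1"] + " " + doc["field2"]


# Transform approach: all_fields is computed once, while indexing.
index = [dict(doc, all_fields=concat(doc, "index_time")) for doc in docs]


def search_with_transform(index):
    # The aggregation just reads the precomputed value.
    return sorted(doc["all_fields"] for doc in index)


def search_with_script(docs):
    # The aggregation recomputes the value for every document it visits.
    return sorted(concat(doc, "query_time") for doc in docs)


# Run the same aggregation three times with each approach.
for _ in range(3):
    keys_from_transform = search_with_transform(index)
    keys_from_script = search_with_script(docs)

print(calls)
```

Both approaches produce the same keys; only the number of concatenations differs, and the gap grows with the number of searches.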
    

For the script aggregation, to make it easier (meaning, using doc['field'].value rather than the more expensive _source.field), add a .raw sub-field to field1 and field2:

    PUT /lastseen
    {
      "mappings": {
        "test": {
          "properties": {
            "field1": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            },
            "field2": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            },
            "lastseen": {
              "type": "long"
            }
          }
        }
      }
    }

The script then uses these .raw sub-fields:

    {
      "aggs": {
        "NAME": {
          "terms": {
            "script": "doc['field1.raw'].value + ' ' + doc['field2.raw'].value",
            "size": 10,
            "lang": "groovy"
          }
        }
      }
    }

Without the .raw sub-fields (which are deliberately made not_analyzed), the script would have to read the values through _source.field-style access instead, which is more expensive.