Filtering aggregations further

Asked: 2017-08-04 19:47:59

Tags: elasticsearch elasticsearch-aggregation

I have a question about aggregations in Elasticsearch. I have a document like the following:

{
  "_index": "products",
  "_type": "product",
  "_id": "ID-12345",
  "_score": 1,
  "_source": {
    "created_at": "2017-08-04T17:56:44.592Z",
    "updated_at": "2017-08-04T17:56:44.592Z",
    "product_information": {
      "sku": "12345",
      "name": "Product Name",
      "price": 25,
      "brand": "Brand Name",
      "url": "URL"
    },
    "product_detail": {
      "description": "Product description text here.",
      "string_facets": [
        {
          "facet_name": "Colour",
          "facet_value": "Grey"
        },
        {
          "facet_name": "Category",
          "facet_value": "Linen"
        },
        {
          "facet_name": "Category",
          "facet_value": "Throws & Blanket"
        },
        {
          "facet_name": "Keyword",
          "facet_value": "Contemporary"
        },
        {
          "facet_name": "Keyword",
          "facet_value": "Sophisticated"
        }
      ]
    }
  }
}

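I'm storing product information such as colour, material, category and keywords in the product_detail.string_facets field. I'd like to use this in aggregations to get colour/material/category/keyword suggestions, but as separate buckets, i.e. one bucket for each string_facet type defined in product_detail.string_facets.facet_name.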

Here is the query I currently have. It returns data, but not what I expected. First the query (this one only tries to get the colours):

{
  "from": 0,
  "size": 12,
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "Rug",
            "fields": ["product_information.name", "product_detail.string_facets.facet_value"]
          }
        },
        {
          "multi_match": {
            "query": "Blue",
            "fields": ["product_information.name", "product_detail.string_facets.facet_name"]
          }
        }
      ],
      "minimum_should_match": "100%"
    }
  },
  "aggs": {
    "suggestions": {
      "filter": { "term": { "product_detail.string_facets.facet_name.keyword": "Colour" }},
      "aggs": {
        "colours": {
          "terms": {
            "field": "product_detail.string_facets.facet_value.keyword",
            "size": 10
          }
        }
      }
    }
  }
}

This gives me the following output:

"aggregations": {
    "suggestions": {
      "doc_count": 21,
      "colours": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 23,
        "buckets": [
          {
            "key": "Rug",
            "doc_count": 21
          },
          {
            "key": "Blue",
            "doc_count": 18
          },
          {
            "key": "Bold",
            "doc_count": 7
          },
          {
            "key": "Modern",
            "doc_count": 6
          },
          {
            "key": "Multi-Coloured",
            "doc_count": 5
          },
          {
            "key": "Contemporary",
            "doc_count": 4
          },
          {
            "key": "Traditional",
            "doc_count": 4
          },
          {
            "key": "White",
            "doc_count": 4
          },
          {
            "key": "Luxurious",
            "doc_count": 3
          },
          {
            "key": "Minimal",
            "doc_count": 3
          }
        ]
      }
    }
  }

It gives me the values of every facet_name, rather than only those of the Colour facet as I expected.

Any help is much appreciated. Elasticsearch looks very powerful, but the documentation is quite daunting!

1 answer:

Answer 0 (score: 0)

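You haven't shown what your mapping looks like, but I assume the product_detail.string_facets field is a plain inner object field, which is why you get these results. With that kind of mapping, Elasticsearch flattens the array into a simple list of field names and values. In your case it becomes: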

{
  "product_detail.string_facets.facet_name": ["Colour", "Category", "Keyword"],
  "product_detail.string_facets.facet_value": ["Grey", "Linen", "Throws & Blanket", "Contemporary", "Sophisticated"]
}

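As you can see, this structure no longer records which facet_value belongs to which facet_name, so Elasticsearch has no way to restrict the aggregation to the values of a single facet.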
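The effect can be imitated in a few lines of Python. This is only a rough model of what happens to an array of inner objects, not Elasticsearch code; the `flatten` helper and the shortened sample data are made up for illustration.

```python
from collections import defaultdict


def flatten(prefix, objects):
    """Roughly mimic how an array of inner objects is indexed:
    one multi-valued field per key, with the pairing between keys lost."""
    fields = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            name = f"{prefix}.{key}"
            if value not in fields[name]:
                fields[name].append(value)
    return dict(fields)


# Hypothetical, shortened version of the facets from the question.
string_facets = [
    {"facet_name": "Colour", "facet_value": "Grey"},
    {"facet_name": "Category", "facet_value": "Linen"},
    {"facet_name": "Keyword", "facet_value": "Contemporary"},
]

flat = flatten("product_detail.string_facets", string_facets)

# A filter on facet_name == "Colour" matches the document as a whole...
matches = "Colour" in flat["product_detail.string_facets.facet_name"]
# ...so a terms aggregation then sees every facet_value of that document.
values_seen = flat["product_detail.string_facets.facet_value"] if matches else []
print(values_seen)
```

Because the filter can only accept or reject the whole document, "Linen" and "Contemporary" end up in the colour bucket alongside "Grey", which is the behaviour described in the question.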

To make this work, the product_detail.string_facets field should be of type nested. The mapping for string_facets should look something like this (note "type": "nested"):

"string_facets": {
    "type": "nested",
    "properties": {
        "facet_name": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        },
        "facet_value": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        }
    }
}
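The type of an existing field cannot be switched from object to nested in place, so in practice this mapping goes into a new index and the documents are reindexed into it. Below is a minimal sketch of how the index-creation body could be assembled. It assumes the `product` mapping type seen in the question's `_type` field; newer Elasticsearch versions have no type level under `mappings`. The helper names `facet_field` and `build_index_body` are hypothetical.

```python
import json


def facet_field():
    """Text field with a keyword sub-field, as in the mapping above."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }


def build_index_body(doc_type="product"):
    """Assemble a create-index body that declares string_facets as nested.

    Assumption: the index still uses mapping types, so the properties
    sit under the type name given in doc_type.
    """
    string_facets = {
        "type": "nested",
        "properties": {
            "facet_name": facet_field(),
            "facet_value": facet_field(),
        },
    }
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    "product_detail": {
                        "properties": {"string_facets": string_facets}
                    }
                }
            }
        }
    }


body = build_index_body()
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body when creating the new index, with whichever client or tool you already use.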

Now I index the following document:

{
    "created_at": "2017-08-04T17:56:44.592Z",
    "updated_at": "2017-08-04T17:56:44.592Z",
    "product_information": {
      "sku": "12345",
      "name": "Rug",
      "price": 25,
      "brand": "Brand Name",
      "url": "URL"
    },
    "product_detail": {
      "description": "Product description text here.",
      "string_facets": [
        {
          "facet_name": "Colour",
          "facet_value": "Blue"
        },
        {
          "facet_name": "Colour",
          "facet_value": "Red"
        },
        {
          "facet_name": "Category",
          "facet_value": "Throws & Blanket"
        },
        {
          "facet_name": "Keyword",
          "facet_value": "Contemporary"
        }
      ]
    }
}

Now, to aggregate the colour suggestions into a separate bucket, you can try this query (I simplified the bool query to fit my document):

{
  "from": 0,
  "size": 12,
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "Rug",
            "fields": ["product_information.name", "product_detail.string_facets.facet_value"]
          }
        }
      ]
    }
  },
  "aggs": {
    "facets": {
      "nested": {
        "path": "product_detail.string_facets"
      },
      "aggs": {
        "suggestions": {
          "filter": { "term": { "product_detail.string_facets.facet_name.keyword": "Colour" }},
          "aggs": {
            "colours": {
              "terms": {
                "field": "product_detail.string_facets.facet_value.keyword",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}
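The question asked for one bucket per facet type, not only for colours. One possible way to get there is to put several filter aggregations side by side under the same nested aggregation. The sketch below generates that "aggs" section; the `build_facet_aggs` helper, the lower-cased aggregation names and the `values` sub-aggregation name are assumptions made for illustration.

```python
import json

PATH = "product_detail.string_facets"


def build_facet_aggs(facet_names, size=10):
    """Build one nested aggregation holding a filter + terms pair per facet name."""
    per_facet = {}
    for name in facet_names:
        per_facet[name.lower()] = {
            # Keep only the nested objects whose facet_name equals `name`.
            "filter": {"term": {f"{PATH}.facet_name.keyword": name}},
            "aggs": {
                "values": {
                    "terms": {
                        "field": f"{PATH}.facet_value.keyword",
                        "size": size,
                    }
                }
            },
        }
    return {"facets": {"nested": {"path": PATH}, "aggs": per_facet}}


aggs = build_facet_aggs(["Colour", "Category", "Keyword"])
print(json.dumps(aggs, indent=2))
```

The resulting dictionary takes the place of the "aggs" section in the query above, and each facet name then comes back as its own bucket list in the response.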

Result:

{
    ...,
    "hits": {
    ...
    },
    "aggregations": {
        "facets": {
            "doc_count": 5,
            "suggestions": {
                "doc_count": 2,
                "colours": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {
                            "key": "Blue",
                            "doc_count": 1
                        },
                        {
                            "key": "Red",
                            "doc_count": 1
                        }
                    ]
                }
            }
        }
    }
}
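To turn a response like this into a plain list of suggestions, something along these lines should do. It is a sketch that assumes the response has already been parsed into a Python dict and uses the aggregation names from the query above (`facets`, `suggestions`, `colours`); the `extract_suggestions` helper is hypothetical.

```python
def extract_suggestions(response, facet_agg="suggestions", values_agg="colours"):
    """Return (value, count) pairs from the nested -> filter -> terms aggregation."""
    facet = response["aggregations"]["facets"][facet_agg]
    return [(b["key"], b["doc_count"]) for b in facet[values_agg]["buckets"]]


# The aggregation part of the result shown above, as a parsed dict.
response = {
    "aggregations": {
        "facets": {
            "doc_count": 5,
            "suggestions": {
                "doc_count": 2,
                "colours": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {"key": "Blue", "doc_count": 1},
                        {"key": "Red", "doc_count": 1},
                    ],
                },
            },
        }
    }
}

suggestions = extract_suggestions(response)
print(suggestions)  # [('Blue', 1), ('Red', 1)]
```

Note that the doc_count values inside a nested aggregation count nested facet objects, not products. If per-product counts are needed, the reverse_nested aggregation is the usual tool for that.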