Search a specific field in nested documents as if they were one document

Time: 2016-01-01 15:30:38

Tags: elasticsearch

I have the following structure:

{
    "mappings": {
        "document": {
            "properties": {
                "title": {
                    "type": "string"
                },
                "paragraphs": {
                    "type": "nested",
                    "properties": {
                        "paragraph": {
                            "type" : "object",
                            "properties" : {
                                "content": { "type": "string"},
                                "number":{"type":"integer"}
                            }
                        }
                    }
                }
            }
        }
    }
}

with these sample documents:

{
    "title":"Dubai seeks cause of massive hotel fire at New Year",
    "paragraphs":[
    {"paragraph": {"number": "1", "content":"Firefighters managed to subdue the blaze, but part of the Address Downtown Hotel is still smouldering."}}, 
    {"paragraph": {"number": "2", "content":"A BBC reporter says a significant fire is still visible on the 20th floor, where the blaze apparently started."}}, 
    {"paragraph": {"number": "3", "content":"The tower was evacuated and 16 people were hurt. But a fireworks show went ahead at the Burj Khalifa tower nearby."}}, 
    {"paragraph": {"number": "4", "content":"The Burj Khalifa is the world's tallest building and an iconic symbol of the United Arab Emirates (UAE)."}}]
}

{
    "title":"Munich not under imminent IS threat",
    "paragraphs":[{"paragraph": {"number": "1", "content":"German officials say there is no sign of any imminent terror attack, after an alert that shut down two Munich railway stations on New Year's Eve."}}]
}

I can currently search within each individual paragraph using:
{ 
    "query": { 
        "nested": { 
            "path": "paragraphs", "query": { 
                "query_string": { 
                    "default_field": "paragraphs.paragraph.content", 
                    "query": "Firefighters AND still" 
                } 
            } 
        }
    }
}

Question: How do I write a query that searches across multiple paragraphs, but only in the content field?

This works, but it searches all fields:

{
  "query": {
    "query_string": {
      "query": "Firefighters AND apparently AND 1"
    }
  }
}

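It matches "Firefighters" in paragraph 1 and "apparently" in paragraph 2, which is what I want. But I do not want "1" to match, because it is not in the content field.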

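Clarification: The first search does what I want on a per-paragraph basis. But I also want to be able to search the whole document (all paragraphs) at times.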

Solution: I added "include_in_parent": true to the nested mapping, as described in https://www.elastic.co/guide/en/elasticsearch/reference/1.7/mapping-nested-type.html
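
For reference, a minimal sketch of what that mapping change might look like, assuming the Elasticsearch 1.x mapping syntax used in the question. Only the "include_in_parent" line is new; everything else is the original mapping. With this setting the nested fields are also indexed as flattened fields on the parent document, so an existing index would most likely need to be reindexed for it to take effect.

```json
{
    "mappings": {
        "document": {
            "properties": {
                "title": {
                    "type": "string"
                },
                "paragraphs": {
                    "type": "nested",
                    "include_in_parent": true,
                    "properties": {
                        "paragraph": {
                            "type": "object",
                            "properties": {
                                "content": { "type": "string" },
                                "number": { "type": "integer" }
                            }
                        }
                    }
                }
            }
        }
    }
}
```

With that in place, a plain (non-nested) query restricted to the content field should be able to match terms that come from different paragraphs, while the nested query keeps working for per-paragraph searches. This is a sketch under the same assumptions, not tested against a live index:

```json
{
    "query": {
        "query_string": {
            "default_field": "paragraphs.paragraph.content",
            "query": "Firefighters AND apparently"
        }
    }
}
```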

1 Answer:

Answer 0: (score: 1)

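The way you are querying is wrong, because nested documents are indexed separately: each paragraph is stored as its own hidden document. See the last paragraph of the nested type documentation.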

Your query

{
  "query": {
    "nested": {
      "path": "paragraphs",
      "query": {
        "query_string": {
          "default_field": "paragraphs.paragraph.content",
          "query": "Firefighters AND apparently"
        }
      }
    }
  }
}

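is looking for both words in the same paragraph, which is why you get no results. You need to query for them separately, each in its own nested clause, like this: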

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "paragraphs",
            "query": {
              "match": {
                "paragraphs.paragraph.content": "firefighters"
              }
            }
          }
        },
        {
          "nested": {
            "path": "paragraphs",
            "query": {
              "match": {
                "paragraphs.paragraph.content": "apparently"
              }
            }
          }
        }
      ]
    }
  }
}

This will give you the correct results.

As a side note, I don't think you need the object datatype inside paragraphs. The following would also work fine:

"paragraphs": {
      "type": "nested",
      "properties": {
          "content": {
              "type": "string"
          },
          "number": {
              "type": "integer"
          }
      }
  }
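
If you go with this flatter mapping, the field paths in your queries get shorter accordingly. A sketch of the same bool query under that assumption: the path "paragraphs.content" simply follows from the mapping fragment above and has not been run against a live index.

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "paragraphs",
            "query": {
              "match": {
                "paragraphs.content": "firefighters"
              }
            }
          }
        },
        {
          "nested": {
            "path": "paragraphs",
            "query": {
              "match": {
                "paragraphs.content": "apparently"
              }
            }
          }
        }
      ]
    }
  }
}
```

The sample documents would then need to drop the inner "paragraph" wrapper as well, so that each array element carries "number" and "content" directly.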

Hope this helps!!